AUTHOR
Jordi Muñoz-Marí
Discovering single classes in remote sensing images with active learning
When dealing with supervised target detection, the acquisition of labeled samples is one of the most critical phases: the samples must be representative of the class of interest, yet they must be found among a vast majority of non-target examples. The efficiency of the search is also an issue, since the samples labeled as background are not used by target detectors such as the support vector data description (SVDD). In this work we propose a competitive and effective approach to identify the most relevant training samples for one-class classification based on the use of an active learning strategy. The SVDD classifier is first trained with insufficient target examples. It is t…
Physics-Aware Machine Learning For Geosciences And Remote Sensing
Machine learning models alone are excellent approximators, but very often do not respect the most elementary laws of physics, like mass or energy conservation, so consistency and confidence are compromised. In this paper we describe the main challenges ahead in the field, and introduce several ways to live in the Physics and machine learning interplay: encoding differential equations from data, constraining data-driven models with physics-priors and dependence constraints, improving parameterizations, emulating physical models, and blending data-driven and process-based models. This is a collective long-term AI agenda towards developing and applying algorithms capable of discovering knowled…
Down-Scaling Modis Vegetation Products with Landsat GAP Filled Surface Reflectance in Google Earth Engine
High spatial resolution vegetation products are fundamental in different fields, such as improving the understanding of crop seasonality at regional scales. Here, two new vegetation products, the Leaf Area Index (LAI) and the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), are downscaled at continental scales. A novel HIghly Scalable Temporal Adaptive Reflectance Fusion Model (HISTARFM) is used to generate the gap-free time series of Landsat surface reflectance data by fusing MODIS and Landsat reflectance for the contiguous United States. An artificial neural network is trained to capture the relationship between the gap-free Landsat surface reflectance and the MODI…
Statistical biophysical parameter retrieval and emulation with Gaussian processes
Earth observation from satellites poses challenging problems where machine learning is being widely adopted as a key player. Perhaps the most challenging scenario that we are facing nowadays is to provide accurate estimates of particular variables of interest characterizing the Earth's surface. This chapter introduces some recent advances in statistical bio-geophysical parameter retrieval from satellite data. In particular, we will focus on Gaussian process regression (GPR), which has excelled in parameter estimation as well as in modeling complex radiative transfer processes. GPR is based on solid Bayesian statistics and generally yields efficient and accurate parameter estimates, a…
Nonlinear Distribution Regression for Remote Sensing Applications
In many remote sensing applications, one wants to estimate variables or parameters of interest from observations. When the target variable is available at a resolution that matches the remote sensing observations, standard algorithms, such as neural networks, random forests, or the Gaussian processes, are readily available to relate the two. However, we often encounter situations where the target variable is only available at the group level, i.e., collectively associated with a number of remotely sensed observations. This problem setting is known in statistics and machine learning as multiple instance learning (MIL) or distribution regression (DR). This article introduces a nonlinear (kern…
Physics-Aware Gaussian Processes for Earth Observation
Earth observation from satellite sensory data poses challenging problems, where machine learning is currently a key player. In recent years, Gaussian Process (GP) regression and other kernel methods have excelled in biophysical parameter estimation tasks from space. GP regression is based on solid Bayesian statistics, and generally yields efficient and accurate parameter estimates. However, GPs are typically used for inverse modeling based on concurrent observations and in situ measurements only. Very often, though, a forward model encoding the well-understood physical relations is available. In this work, we review three GP models that respect and learn the physics of the underlying processes …
Multispectral high resolution sensor fusion for smoothing and gap-filling in the cloud
Remote sensing optical sensors onboard operational satellites cannot have high spectral, spatial and temporal resolutions simultaneously. In addition, clouds and aerosols can adversely affect the signal contaminating the land surface observations. We present a HIghly Scalable Temporal Adaptive Reflectance Fusion Model (HISTARFM) algorithm to combine multispectral images of different sensors to reduce noise and produce monthly gap free high resolution (30 m) observations over land. Our approach uses images from the Landsat (30 m spatial resolution and 16 day revisit cycle) and the MODIS missions, both from Terra and Aqua platforms (500 m spatial resolution and daily revisit cycle). We implem…
Putting the user into the active learning loop : Towards realistic but efficient photointerpretation
In recent years, several studies have been published about the smart definition of training sets using active learning algorithms. However, none of these works considers the contradiction between the active learning methods, which rank the pixels according to their uncertainty, and the confidence of the user in labeling, which is related both to the homogeneity of the pixel context and to the user's knowledge of the scene. In this paper, we propose a two-step procedure based on a filtering scheme to learn the confidence of the user in labeling. This way, candidate training pixels are ranked according both to their uncertainty and to the chances of being labeled correctly by the user. In…
Structured Output SVM for Remote Sensing Image Classification
Traditional kernel classifiers assume independence among the classification outputs. As a consequence, each misclassification receives the same weight in the loss function. Moreover, the kernel function only takes into account the similarity between input values and ignores possible relationships between the classes to be predicted. These assumptions do not hold for most real-life problems, and remote sensing data are no exception. Segmentation of images acquired by airborne or satellite sensors is a very active field of research in which one tries to classify a pixel into a predefined set of classes of interest (e.g. water, grass, trees,…
Randomized kernels for large scale Earth observation applications
Current remote sensing applications of bio-geophysical parameter estimation and image classification have to deal with an unprecedentedly large amount of heterogeneous and complex data sources. New satellite sensors involving a high number of improved time, space and wavelength resolutions give rise to challenging computational problems. Standard physical inversion techniques cannot cope efficiently with this new scenario. Dealing with land cover classification of the new image sources has also turned out to be a complex problem requiring large amounts of memory and processing time. In order to cope with these problems, statistical learning has greatly helped in the last years to develop st…
Generation of global vegetation products from EUMETSAT AVHRR/METOP satellites
We describe the methodology applied for the retrieval of global LAI, FAPAR and FVC from Advanced Very High Resolution Radiometer (AVHRR) onboard the Meteorological-Operational (MetOp) polar orbiting satellites also known as EUMETSAT Polar System (EPS). A novel approach has been developed for the joint retrieval of three parameters (LAI, FVC, and FAPAR) instead of training one model per parameter. The method relies on multi-output Gaussian Processes Regression (GPR) trained over PROSAIL EPS simulations. A sensitivity analysis is performed to assess several sources of uncertainties in retrievals and maximize the positive impact of modeling the noise in training simulations. We describe the ma…
Web Monitoring System and Gateway for Serial Communication PLC
An industrial process requires interacting with the rest of the plant, being able to exchange data with other devices and monitoring systems in order to optimize production, reporting information and providing control capabilities to distant users. The Internet, and especially web browsers, are an excellent tool to provide information for remote users, allowing not only monitoring but also controlling the industrial process as a SCADA software or HMI system. The proposed system does not need specific proprietary software and its associated license costs. In this work, a webserver system is implemented under a Freescale microcontroller, acting as a gateway for a simple PLC with single …
From Signal Processing to Machine Learning
This chapter reviews the main landmarks of signal processing in the 20th century from the perspective of algorithmic developments. It focuses on cross‐fertilization with the field of statistical (machine) learning in the last decades. In the 21st century, model and data assumptions as well as algorithmic constraints are no longer valid, and the field of machine‐learning signal processing has erupted, with many successful stories to tell. The chapter also focuses on digital signal processing (DSP), which deals with the analysis of digitized and discrete sampled signals. Machine learning is a branch of computer science and artificial intelligence that enables computers to learn from data. Mac…
Global Estimation of Soil Moisture Persistence with L and C-Band Microwave Sensors
© 2018 IEEE. Measurements of soil moisture are needed for a better global understanding of the land surface-climate feedbacks at both the local and the global scale. Satellite sensors operating in the low frequency microwave spectrum (from 1 to 10 GHz) have proven to be suitable for soil moisture retrievals. These sensors now cover nearly four decades, thus allowing for global multi-mission climate data records. In this paper, we assess the possibility of using L-band (SMOS) and C-band (AMSR2, ASCAT) remotely sensed soil moisture time series for the global estimation of soil moisture persistence. A multi-output Gaussian process regression model is applied to ensure spatio-temporal coverage of th…
Derivation of global vegetation biophysical parameters from EUMETSAT Polar System
This paper presents the algorithm developed in LSA-SAF (Satellite Application Facility for Land Surface Analysis) for the derivation of global vegetation parameters from the AVHRR (Advanced Very High Resolution Radiometer) sensor on board MetOp (Meteorological–Operational) satellites forming the EUMETSAT (European Organization for the Exploitation of Meteorological Satellites) Polar System (EPS). The suite of LSA-SAF EPS vegetation products includes the leaf area index (LAI), the fractional vegetation cover (FVC), and the fraction of absorbed photosynthetically active radiation (FAPAR). LAI, FAPAR, and FVC characterize the structure and the functioning of vegetation and are key par…
LABCENTER. A remote laboratory system platform
A web system server especially suited for remote laboratories has been developed. Typical e-learning systems do not offer the possibility to perform a remote laboratory where real experiments can be done online, accessing real hardware located at the University facilities. Allowing students to connect to hardware systems remotely provides them with additional knowledge about real devices; very often, real laboratory devices are time or space restricted. The proposed LABCENTER platform is a general framework designed for connecting to remote laboratories. The platform is designed to allow an authorized student to connect to hardware systems. As direct hardware systems allow only a single u…
Statistical Learning for End-to-End Simulations
End-to-end mission performance simulators (E2ES) are suitable tools to accelerate satellite mission development from concept to deployment. One core element of these E2ES is the generation of synthetic scenes that are observed by the various instruments of an Earth Observation mission. The generation of these scenes relies on Radiative Transfer Models (RTMs) for the simulation of light interaction with the Earth surface and atmosphere. However, the execution of advanced RTMs is impractical due to their large computational burden. Classical interpolation and statistical emulation methods of pre-computed Look-Up Tables (LUT) are therefore common practice to generate synthetic scenes in a reasonable…
A Review of Kernel Methods in Remote Sensing Data Analysis
Kernel methods have proven effective in the analysis of images of the Earth acquired by airborne and satellite sensors. Kernel methods provide a consistent and well-founded theoretical framework for developing nonlinear techniques and have useful properties when dealing with a low number of (potentially high dimensional) training samples, the presence of heterogeneous multimodalities, and different noise sources in the data. These properties are particularly appropriate for remote sensing data analysis. In fact, kernel methods have improved results of parametric linear methods and neural networks in applications such as natural resource control, detection and monitoring of anthropic infrastruc…
Cloud screening with combined MERIS and AATSR images
This paper presents a cloud screening algorithm based on ensemble methods that exploits the combined information from both MERIS and AATSR instruments on board ENVISAT in order to improve current cloud masking products for both sensors. The first step is to analyze the synergistic use of MERIS and AATSR images in order to extract some physically-based features increasing the separability of clouds and surface. Then, several artificial neural networks are trained using different sets of input features and different sets of training samples depending on acquisition and surface conditions. Finally, outputs of the trained neural networks are combined at the decision level to construct a more ac…
Fair Kernel Learning
New social and economic activities massively exploit big data and machine learning algorithms to do inference on people’s lives. Applications include automatic curricula evaluation, wage determination, and risk assessment for credits and loans. Recently, many governments and institutions have raised concerns about the lack of fairness, equity and ethics in machine learning to treat these problems. It has been shown that not including sensitive features that bias fairness, such as gender or race, is not enough to mitigate the discrimination when other related features are included. Instead, including fairness in the objective function has been shown to be more efficient.
Cloud detection machine learning algorithms for PROBA-V
This paper presents the development and implementation of a cloud detection algorithm for Proba-V. Accurate and automatic detection of clouds in satellite scenes is a key issue for a wide range of remote sensing applications. With no accurate cloud masking, undetected clouds are one of the most significant sources of error in both sea and land cover biophysical parameter retrieval. The objective of the algorithms presented in this paper is to detect clouds accurately providing a cloud flag per pixel. For this purpose, the method exploits the information of Proba-V using statistical machine learning techniques to identify the clouds present in Proba-V products. The effectiveness of the propo…
Learning main drivers of crop progress and failure in Europe with interpretable machine learning
A wide variety of methods exist nowadays to address the important problem of estimating crop yields from available remote sensing and climate data. Among the different approaches, machine learning (ML) techniques are being increasingly adopted, since they allow exploiting all the information on crop progress and environmental conditions and their relations with crop yield, achieving reliable and accurate estimations. However, interpreting the relationships learned by the ML models, and hence getting insights about the problem, remains a complex and usually unexplored task. Without accountability, confidence and trust in the ML models can be compromised. Here, we develop interpretab…
An Emulator Toolbox to Approximate Radiative Transfer Models with Statistical Learning
Physically-based radiative transfer models (RTMs) help in understanding the processes occurring on the Earth’s surface and their interactions with vegetation and atmosphere. When it comes to studying vegetation properties, RTMs allow us to study light interception by plant canopies and are used in the retrieval of biophysical variables through model inversion. However, advanced RTMs can take a long computational time, which makes them infeasible in many real applications. To overcome this problem, it has been proposed to replace RTMs with so-called emulators. Emulators are statistical models that approximate the functioning of RTMs. Emulators are advantageous in real practice because…
Biophysical parameter retrieval with warped Gaussian processes
This paper focuses on biophysical parameter retrieval based on Gaussian Processes (GPs). Very often an arbitrary transformation is applied to the observed variable (e.g. chlorophyll content) to better pose the problem. This standard practice essentially tries to linearize/uniformize the distribution by applying non-linear link functions like the logarithmic, the exponential or the logistic functions. In this paper, we propose to use a GP model that automatically learns the optimal transformation directly from the data. The so-called warped GP regression (WGPR) presented in [1] models output observations as a parametric nonlinear transformation of a GP. The parameters of such prior model are…
Global Upscaling of the MODIS Land Cover with Google Earth Engine and Landsat Data
Image classification has become one of the most common applications in remote sensing, leading to the creation of a variety of operational thematic maps at multiple spatio-temporal scales. The information contained in these maps summarizes key characteristics related to the physical environment and provides fundamental information about the Earth for vegetation monitoring or land use status over time. However, high spatial resolution land cover maps are usually only produced for specific small regions or in an image tile. We present a general methodology to obtain high spatial resolution land cover maps using Landsat spectral information, the powerful Google Earth Engine platform, and oper…
A Support Vector Machine Signal Estimation Framework
Support vector machines (SVMs) were originally conceived as efficient methods for pattern recognition and classification, and support vector regression (SVR) was subsequently proposed as the SVM implementation for regression and function approximation. Nowadays, SVR and other kernel‐based regression methods have become mature and recognized tools in digital signal processing (DSP). This chapter starts to pave the way to treat all the problems within the field of kernel machines, and presents the fundamentals of a simple framework for tackling estimation problems in DSP using SVMs. It outlines the particular models and approximations defined within the framework. The chapter concludes wit…
Synergistic integration of optical and microwave satellite data for crop yield estimation
Developing accurate models of crop stress, phenology and productivity is of paramount importance, given the increasing need for food. Earth observation (EO) remote sensing data provide a unique source of information to monitor crops in a temporally resolved and spatially explicit way. In this study, we propose the combination of multisensor (optical and microwave) remote sensing data for crop yield estimation and forecasting using two novel approaches. We first propose the lag between Enhanced Vegetation Index (EVI) derived from MODIS and Vegetation Optical Depth (VOD) derived from SMAP as a new joint metric combining the information from the two satellite sensors in a unique feature or des…
Cloud detection on the Google Earth engine platform
The vast amount of data acquired by current high resolution Earth observation satellites poses technical challenges. The Google Earth Engine (GEE) platform provides a framework for developing algorithms and products on top of these data in an easy and scalable manner. In this paper, we take advantage of the GEE platform capabilities to exploit the wealth of information in the temporal dimension by processing a long time series of satellite images. A cloud detection algorithm for Landsat-8, which uses previous images of the same location to detect clouds, is implemented and tested on the GEE platform.
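To make the idea of using previous images of the same location more concrete, here is a minimal sketch in plain Python. It is not the algorithm from the paper and it does not use the GEE API: the median background, the brightness threshold of 0.15, the nested-list image layout, and the names `flag_cloudy` and `cloud_mask` are all assumptions chosen for illustration.

```python
from statistics import median


def flag_cloudy(previous_values, current_value, threshold=0.15):
    """Flag a pixel as cloudy when it is much brighter than its own history.

    The temporal background is the median of earlier reflectance values
    at the same location (an assumption made for this sketch).
    """
    background = median(previous_values)
    return (current_value - background) > threshold


def cloud_mask(previous_images, current_image, threshold=0.15):
    """Build a boolean cloud mask for one image from a stack of earlier images.

    Images are nested lists indexed as image[row][col], all of the same size.
    """
    mask = []
    for r, row_values in enumerate(current_image):
        mask_row = []
        for c, value in enumerate(row_values):
            # Collect the history of this pixel across the previous images.
            history = [img[r][c] for img in previous_images]
            mask_row.append(flag_cloudy(history, value, threshold))
        mask.append(mask_row)
    return mask


# Toy data: three earlier clear-sky images and one new image with a bright pixel.
previous = [
    [[0.10, 0.12], [0.08, 0.11]],
    [[0.11, 0.13], [0.09, 0.10]],
    [[0.09, 0.12], [0.08, 0.12]],
]
current = [[0.10, 0.60], [0.09, 0.11]]
mask = cloud_mask(previous, current)
```

In this toy case only the top-right pixel departs strongly from its temporal background, so it is the only one flagged. A real detector would also have to handle snow, cloud shadows, and genuine land cover change, none of which this sketch attempts.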
Introduction to Digital Signal Processing
Signal processing deals with the representation, transformation, and manipulation of signals and the information they contain. Typical examples include extracting the pure signals from a mixture observation (a field commonly known as deconvolution) or particular signal (frequency) components from noisy observations (generally known as filtering). This chapter outlines the basics of signal processing and then introduces the more advanced concepts of time‐frequency and time‐scale representations, as well as emerging fields of compressed sensing and multidimensional signal processing. When moving to multidimensional signal processing, a modern approach is taken from the point of view of statis…
Active Learning Methods for Efficient Hybrid Biophysical Variable Retrieval
Kernel-based machine learning regression algorithms (MLRAs) are potentially powerful methods for implementation in operational biophysical variable retrieval schemes. However, they face difficulties in coping with large training data sets. With the increasing amount of optical remote sensing data made available for analysis and the possibility of using a large amount of simulated data from radiative transfer models (RTMs) to train kernel MLRAs, efficient data reduction techniques will need to be implemented. Active learning (AL) methods make it possible to select the most informative samples in a data set. This letter introduces six AL methods for achieving optimized biophysical variable estimat…
Support Vector Machine and Kernel Classification Algorithms
This chapter introduces the basics of support vector machines (SVMs) and other kernel classifiers for pattern recognition and detection. It starts by introducing the main elements and concepts underlying the successful binary SVM. Next, it introduces more advanced topics in SVM classification, including large margin filtering (LMF), semi-supervised learning (SSL), active learning, and large‐scale classification using SVMs. The LMF method performs both signal filtering and classification simultaneously by learning the most appropriate filters. SSL with SVMs exploits the information contained in both labeled and unlabeled e…
Biophysical parameter estimation with adaptive Gaussian Processes
We evaluate Gaussian Processes (GPs) for the estimation of biophysical parameters from acquired multispectral data. The standard GP formulation is used, and all hyperparameters (kernel parameters and noise variance) are optimized by maximizing the marginal likelihood. This gives rise to a fully-adaptive GP to data characteristics, both in terms of signal and noise properties. The good numerical results in the estimation of oceanic chlorophyll concentration and leaf membrane state confirm GPs as adequate, alternative non-parametric methods for biophysical parameter estimation. GPs are also analyzed by scrutinizing the predictive variance, the estimated noise variance, and the relevance of ea…
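The hyperparameter selection mentioned in this abstract can be illustrated with a short sketch that evaluates the GP log marginal likelihood, log p(y | X) = -0.5 yᵀK⁻¹y - 0.5 log|K| - (n/2) log 2π, where K includes the noise variance on its diagonal. The squared-exponential kernel, the coarse grid search (used here instead of a gradient-based optimizer), the one-dimensional toy data, and all function names are assumptions for illustration, not the paper's implementation.

```python
import math


def rbf_kernel(xa, xb, length_scale, signal_var):
    """Squared-exponential kernel matrix between two lists of scalars."""
    return [[signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)
             for b in xb] for a in xa]


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def log_marginal_likelihood(x, y, length_scale, signal_var, noise_var):
    """GP log marginal likelihood with K = K_rbf + noise_var * I."""
    n = len(x)
    K = rbf_kernel(x, x, length_scale, signal_var)
    for i in range(n):
        K[i][i] += noise_var
    L = cholesky(K)
    z = solve_lower(L, y)  # z = L^{-1} y, so y^T K^{-1} y = z^T z
    quad = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * logdet - 0.5 * n * math.log(2.0 * math.pi)


def fit_by_grid_search(x, y, length_scales, signal_vars, noise_vars):
    """Return (best_lml, length_scale, signal_var, noise_var) over the grid."""
    best = None
    for ls in length_scales:
        for sv in signal_vars:
            for nv in noise_vars:
                lml = log_marginal_likelihood(x, y, ls, sv, nv)
                if best is None or lml > best[0]:
                    best = (lml, ls, sv, nv)
    return best


# Toy data: a smooth signal plus a small deterministic perturbation.
x_train = [0.1 * i for i in range(30)]
y_train = [math.sin(2.0 * xi) + 0.05 * math.cos(37.0 * xi) for xi in x_train]
best_lml, best_ls, best_sv, best_nv = fit_by_grid_search(
    x_train, y_train,
    length_scales=[0.1, 0.3, 1.0, 3.0],
    signal_vars=[0.5, 1.0, 2.0],
    noise_vars=[1e-3, 1e-2, 1e-1],
)
```

Because the noise variance is one of the searched hyperparameters, the selected model adapts to both signal and noise properties of the data. In practice a gradient-based optimizer over continuous hyperparameters would usually replace the grid.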
Hyperspectral dimensionality reduction for biophysical variable statistical retrieval
Current and upcoming airborne and spaceborne imaging spectrometers lead to vast hyperspectral data streams. This scenario calls for automated and optimized spectral dimensionality reduction techniques to enable fast and efficient hyperspectral data processing, such as inferring vegetation properties. In preparation of next generation biophysical variable retrieval methods applicable to hyperspectral data, we present the evaluation of 11 dimensionality reduction (DR) methods in combination with advanced machine learning regression algorithms (MLRAs) for statistical variable retrieval. Two unique hyperspectral datasets were analyzed on the predictive power of DR + MLRA methods to ret…
Adaptive Kernel Learning for Signal Processing
Adaptive filtering is a central topic in digital signal processing (DSP). By applying linear adaptive filtering principles in the kernel feature space, powerful nonlinear adaptive filtering algorithms can be obtained. This chapter introduces the wide topic of adaptive signal processing, and explores the emerging field of kernel adaptive filtering (KAF). In many signal processing applications, the problem of signal estimation is addressed. Probabilistic models have proven to be very useful in this context. The chapter discusses two families of kernel adaptive filters, namely kernel least mean squares (KLMS) and kernel recursive least‐squares (KRLS) algorithms. In order to design a practical …
Estrategia de enseñanza y aprendizaje de programación basada en la idea de ’hackathon’
[EN] The acquisition of programming and data analysis skills in higher education is increasingly necessary in all areas of Science and Engineering. In this paper we present a methodology for motivating the learning of programming, mainly focused on the development of machine learning algorithms. This methodology is based on the hackathon idea and has different levels. The basic level is a competition proposed in an improvised way during class. The second level is a scheduled hackathon held within the classroom environment, using learning management systems such as Moodle. The last level consists of participation in an exte…
Operational cloud screening service for Sentinel-2 image time series
This paper deals with the development and implementation of a cloud screening algorithm for image time series, with the focus on the forthcoming Sentinel-2 satellites to be launched under the ESA Copernicus Programme. The proposed methodology is based on kernel ridge regression and exploits the temporal information to detect anomalous changes that correspond to cloud covers. The huge data volumes to be processed when dealing with high temporal, spatial, and spectral resolution datasets motivate the implementation of the algorithm within distributed computer resources. In consequence, an operational cloud screening service has been specifically designed and implemented in the frame of the Se…
A Real time locating system for local fleet management
Local locating systems are necessary where a large number of units are handled. In this case, we propose a management system for parking spaces for vehicles that are distributed among eight different-sized lots on a 1.8 million square meter premises. In order to quickly locate each particular vehicle in this area, a new real time location system (RTLS) has been successfully installed. The vehicle location system Moby-R automatically monitors every vehicle movement and tracks its current parked position. All the vehicles are given a data carrier which is attached to the inside rear-view mirror with a special hanger, precisely locating the vehicle within approximately five meters.
Kernel-Based Framework for Multitemporal and Multisource Remote Sensing Data Classification and Change Detection
The multitemporal classification of remote sensing images is a challenging problem, in which the efficient combination of different sources of information (e.g., temporal, contextual, or multisensor) can improve the results. In this paper, we present a general framework based on kernel methods for the integration of heterogeneous sources of information. Using the theoretical principles in this framework, three main contributions are presented. First, a novel family of kernel-based methods for multitemporal classification of remote sensing images is presented. The second contribution is the development of nonlinear kernel classifiers for the well-known difference and ratioing change detectio…
Cloud masking and removal in remote sensing image time series
Automatic cloud masking of Earth observation images is one of the first required steps in optical remote sensing data processing since the operational use and product generation from satellite image time series might be hampered by undetected clouds. The high temporal revisit of current and forthcoming missions and the scarcity of labeled data force us to cast cloud screening as an unsupervised change detection problem in the temporal domain. We introduce a cloud screening method based on detecting abrupt changes along the time dimension. The main assumption is that image time series follow smooth variations over land (background) and abrupt changes will be mainly due to the presence of clo…
Machine learning information fusion in Earth observation: A comprehensive review of methods, applications and data sources
This paper reviews the most important information fusion data-driven algorithms based on Machine Learning (ML) techniques for problems in Earth observation. Nowadays we observe and model the Earth with a wealth of observations, from a plethora of different sensors, measuring states, fluxes, processes and variables, at unprecedented spatial and temporal resolutions. Earth observation is well equipped with remote sensing systems, mounted on satellites and airborne platforms, but it also involves in-situ observations, numerical models and social media data streams, among other data sources. Data-driven approaches, and ML techniques in particular, are the natural choice to extract significant i…
Retrieval of oceanic chlorophyll concentration with relevance vector machines
In this communication, we evaluate the performance of the relevance vector machine (RVM) for the estimation of biophysical parameters from remote sensing data. For illustration purposes, we focus on the estimation of chlorophyll-a concentrations from remote sensing reflectance just above the ocean surface. A variety of bio-optical algorithms have been developed to relate measurements of ocean radiance to in situ concentrations of phytoplankton pigments, and ultimately most of these algorithms demonstrate the potential of quantifying chlorophyll-a concentrations accurately from multispectral satellite ocean color data. Both satellite-derived data and in situ measurements are subject…
Nonlinear statistical retrieval of surface emissivity from IASI data
Emissivity is one of the most important parameters to improve the determination of the troposphere properties (thermodynamic properties, aerosols and trace gases concentration) and it is essential to estimate the radiative budget. With the second generation of infrared sounders, we can estimate emissivity spectra at high spectral resolution, which gives us a global view and long-term monitoring of continental surfaces. Statistically, this is an ill-posed retrieval problem, with as many output variables as inputs. We here propose nonlinear multi-output statistical regression based on kernel methods to estimate spectral emissivity given the radiances. Kernel methods can cope with high-dimensi…
Experimental Sentinel-2 LAI estimation using parametric, non-parametric and physical retrieval methods – A comparison
Given the forthcoming availability of Sentinel-2 (S2) images, this paper provides a systematic comparison of retrieval accuracy and processing speed of a multitude of parametric, non-parametric and physically-based retrieval methods using simulated S2 data. An experimental field dataset (SPARC), collected at the agricultural site of Barrax (Spain), was used to evaluate different retrieval methods on their ability to estimate leaf area index (LAI). With regard to parametric methods, all possible band combinations for several two-band and three-band index formulations and a linear regression fitting function have been evaluated. From a set of over ten thousand indices evaluated, the …
Quantifying uncertainty in high resolution biophysical variable retrieval with machine learning
The estimation of biophysical variables is at the core of remote sensing science, allowing a close monitoring of crops and forests. Deriving temporally resolved and spatially explicit maps of parameters of interest has been the subject of intense research. However, deriving products from optical sensors is typically hampered by cloud contamination and the trade-off between spatial and temporal resolutions. In this work we rely on the HIghly Scalable Temporal Adaptive Reflectance Fusion Model (HISTARFM) algorithm to generate long gap-free time series of Landsat surface reflectance data by fusing MODIS and Landsat reflectances. An artificial neural network is trained on PROSAIL inversion to p…
Fusing optical and SAR time series for LAI gap filling with multioutput Gaussian processes
The availability of satellite optical information is often hampered by the natural presence of clouds, which can be problematic for many applications. Persistent clouds over agricultural fields can mask key stages of crop growth, leading to unreliable yield predictions. Synthetic Aperture Radar (SAR) provides all-weather imagery which can potentially overcome this limitation, but given its high and distinct sensitivity to different surface properties, the fusion of SAR and optical data still remains an open challenge. In this work, we propose the use of Multi-Output Gaussian Process (MOGP) regression, a machine learning technique that learns automatically the statistical relationships among…
Fair Kernel Learning
New social and economic activities massively exploit big data and machine learning algorithms to do inference on people's lives. Applications include automatic curricula evaluation, wage determination, and risk assessment for credits and loans. Recently, many governments and institutions have raised concerns about the lack of fairness, equity and ethics in machine learning to treat these problems. It has been shown that not including sensitive features that bias fairness, such as gender or race, is not enough to mitigate the discrimination when other related features are included. Instead, including fairness in the objective function has been shown to be more efficient. We present novel fai…
Machine Learning Methods for Spatial and Temporal Parameter Estimation
Monitoring vegetation with satellite remote sensing is of paramount relevance to understand the status and health of our planet. Accurate and constant monitoring of the biosphere has large societal, economic, and environmental implications, given the increasing demand for biofuels and food by the world population. The current democratization of machine learning, big data, and high processing capabilities allows us to undertake such an endeavor decisively. This chapter proposes three novel machine learning approaches to exploit spatial, temporal, multi-sensor, and large-scale data characteristics. We show (1) the application of multi-output Gaussian processes for gap-filling time series of…
Gap Filling of Biophysical Parameter Time Series with Multi-Output Gaussian Processes
In this work we evaluate multi-output (MO) Gaussian Process (GP) models based on the linear model of coregionalization (LMC) for estimation of biophysical parameter variables under a gap filling setup. In particular, we focus on LAI and fAPAR over rice areas. We show how this problem cannot be solved with standard single-output (SO) GP models, and how the proposed MO-GP models are able to successfully predict these variables even in high missing data regimes, by implicitly performing an across-domain information transfer.
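The cross-output information transfer described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the simplest LMC form (a single shared RBF latent kernel scaled by a coregionalization matrix `B`), fixed rather than learned hyperparameters, and synthetic sine-wave signals standing in for LAI and fAPAR. All function and variable names are hypothetical.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def lmc_kernel(x1, out1, x2, out2, B, lengthscale=1.0):
    """LMC covariance with one latent kernel: B[i, j] * k(x, x')."""
    return B[np.ix_(out1, out2)] * rbf(x1, x2, lengthscale)

def lmc_predict(x_tr, out_tr, y_tr, x_te, out_te, B, lengthscale=1.0, noise=1e-2):
    """GP posterior mean at test inputs for the requested outputs."""
    K = lmc_kernel(x_tr, out_tr, x_tr, out_tr, B, lengthscale)
    K = K + noise * np.eye(len(x_tr))
    Ks = lmc_kernel(x_te, out_te, x_tr, out_tr, B, lengthscale)
    return Ks @ np.linalg.solve(K, y_tr)

# Synthetic example (assumption): two correlated signals on a shared time axis.
t = np.linspace(0.0, 10.0, 50)
y0 = np.sin(t)          # output 0, has a gap
y1 = 0.8 * np.sin(t)    # output 1, fully observed
gap = (t > 4.0) & (t < 7.0)

x_tr = np.concatenate([t[~gap], t])
out_tr = np.concatenate([np.zeros((~gap).sum(), dtype=int),
                         np.ones(len(t), dtype=int)])
y_tr = np.concatenate([y0[~gap], y1])
x_te = t[gap]
out_te = np.zeros(gap.sum(), dtype=int)

# Coregionalization matrix B = w w^T + small diagonal term (fixed by hand here).
w = np.array([1.0, 0.8])
B_mo = np.outer(w, w) + 1e-3 * np.eye(2)
# Dropping the off-diagonal terms decouples the outputs (single-output GPs).
B_so = np.diag(np.diag(B_mo))

pred_mo = lmc_predict(x_tr, out_tr, y_tr, x_te, out_te, B_mo)
pred_so = lmc_predict(x_tr, out_tr, y_tr, x_te, out_te, B_so)
err_mo = np.abs(pred_mo - y0[gap]).mean()
err_so = np.abs(pred_so - y0[gap]).mean()
print(f"mean abs error in gap: multi-output={err_mo:.3f}, single-output={err_so:.3f}")
```

With a diagonal `B` the second output contributes nothing to the prediction, so comparing the two errors shows how much of the gap filling comes from the coupling between outputs. In practice `B`, the lengthscale, and the noise level would be estimated from data, for example by maximizing the marginal likelihood.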
Inferring causation from time series in earth system sciences
The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In large-scale complex dynamical systems such as the Earth system, real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated data opens up the use of novel data-driven causal methods beyond the commonly adopted correlation techniques. Here, we give an overview of causal inference frameworks and identify promising generic application cases common in Earth system sciences and beyond. We discuss challenges and initiate the benchmark platform causeme.net to close the gap between method users and developers.
HyperLabelMe: A Web Platform for Benchmarking Remote-Sensing Image Classifiers
HyperLabelMe is a web platform that allows the automatic benchmarking of remote-sensing image classifiers. To demonstrate this platform's attributes, we collected and harmonized a large data set of labeled multispectral and hyperspectral images with different numbers of classes, dimensionality, noise sources, and levels. The registered user can download training data pairs (spectra and land cover/use labels) and submit the predictions for unseen testing spectra. The system then evaluates the accuracy and robustness of the classifier, and it reports different scores as well as a ranked list of the best methods and users. The system is modular, scalable, and ever-growing in data sets and clas…
Support vector machines for nonlinear kernel ARMA system identification
Nonlinear system identification based on support vector machines (SVM) has been usually addressed by means of the standard SVM regression (SVR), which can be seen as an implicit nonlinear autoregressive and moving average (ARMA) model in some reproducing kernel Hilbert space (RKHS). The proposal of this letter is twofold. First, the explicit consideration of an ARMA model in an RKHS (SVM-ARMA 2k) is proposed. We show that stating the ARMA equations in an RKHS leads to solving the regularized normal equations in that RKHS, in terms of the autocorrelation and cross correlation of the (nonlinearly) transformed input and output discrete time processes. Second, a general class of SVM-based syste…
Learning Structures in Earth Observation Data with Gaussian Processes
Gaussian Processes (GPs) have experienced tremendous success in geoscience in general and in bio-geophysical parameter retrieval in particular in recent years. GPs constitute a solid Bayesian framework to formulate many function approximation problems consistently. This paper reviews the main theoretical GP developments in the field. We review new algorithms that respect the signal and noise characteristics, that provide feature rankings automatically, and that allow applicability of associated uncertainty intervals to transport GP models in space and time. All these developments are illustrated in the field of geoscience and remote sensing at local and global scales through a set of illustrative exa…
Global Cropland Yield Monitoring with Gaussian Processes
Agriculture monitoring, and in particular food security, requires near real-time information on crop growing conditions for early detection of possible production deficits. In this work, we propose the use of Gaussian processes (GPs) together with in-situ, EO and ERA-Interim climate reanalysis data for crop yield forecasting. Country-level agricultural survey data from FAOSTAT are used for quantitative assessment. The study is conducted in the framework of the ASAP (Anomaly hot Spots of Agricultural Production) early warning decision support system of the European Commission, which aims at providing timely information about possible crop production anomalies worldwide. After grouping count…
Remote sensing data for crop yield in CONUS
I) SUMMARY This database contains harmonized time series for the study of crop yields using remote sensing data and meteorological data. We collected information on soybean, corn, and wheat yields (t/ha) over the CONUS (contiguous US) from USDA-NASS for years 2015–2018 at a county level, and collocated time series for the following variables: Enhanced Vegetation Index (EVI) from the MODIS satellite (MOD13C1 v6 product); Soil Moisture (SM) from the SMAP satellite through the MT-DCA algorithm; Vegetation Optical Depth (VOD) from the SMAP satellite through the MT-DCA algorithm; Maximum temperature (TMAX) from Daymet v3; Precipitation (PRCP) from Daymet v3. II) CONTACT For questions, please email Laura Mart&iacut…