AUTHOR
Valero Laparra
Spatial noise-aware temperature retrieval from infrared sounder data
In this paper we present a combined strategy for the retrieval of atmospheric profiles from infrared sounders. The approach considers the spatial information and a noise-dependent dimensionality reduction approach. The extracted features are fed into a canonical linear regression. We compare Principal Component Analysis (PCA) and Minimum Noise Fraction (MNF) for dimensionality reduction, and study the compactness and information content of the extracted features. Assessment of the results is done on a big dataset covering many spatial and temporal situations. PCA is widely used for these purposes but our analysis shows that one can gain significant improvements of the error rates when using…
Disentangling Derivatives, Uncertainty and Error in Gaussian Process Models
Gaussian Processes (GPs) are a class of kernel methods that have been shown to be very useful in geoscience applications. They are widely used because they are simple, flexible and provide very accurate estimates for nonlinear problems, especially in parameter retrieval. In addition to a predictive mean function, GPs come equipped with a useful property: the predictive variance function, which provides confidence intervals for the predictions. The GP formulation usually assumes that there is no input noise in the training and testing points, only in the observations. However, this is often not the case in Earth observation problems where an accurate assessment of the instrument error is usually a…
PerceptNet: A Human Visual System Inspired Neural Network for Estimating Perceptual Distance
Traditionally, the vision community has devised algorithms to estimate the distance between an original image and images that have been subject to perturbations. Inspiration was usually taken from the human visual perceptual system and how the system processes different perturbations in order to replicate to what extent it determines our ability to judge image quality. While recent works have presented deep neural networks trained to predict human perceptual quality, very few borrow any intuitions from the human visual system. To address this, we present PerceptNet, a convolutional neural network where the architecture has been chosen to reflect the structure and various stages in the human…
Visual Cortex Performs a Sort of Non-linear ICA
Here, the standard V1 cortex model optimized to reproduce image distortion psychophysics is shown to have nice statistical properties, e.g. approximate factorization of the PDF of natural images. These results confirm the efficient encoding hypothesis that aims to explain the organization of biological sensors by information theory arguments.
Kernel Anomalous Change Detection for Remote Sensing Imagery
Anomalous change detection (ACD) is an important problem in remote sensing image processing. Detecting not only pervasive but also anomalous or extreme changes has many applications for which methodologies are available. This paper introduces a nonlinear extension of a full family of anomalous change detectors. In particular, we focus on algorithms that utilize Gaussian and elliptically contoured (EC) distributions and extend them to their nonlinear counterparts based on the theory of reproducing kernel Hilbert spaces. We illustrate the performance of the kernel methods introduced in both pervasive and ACD problems with real and simulated changes in multispectral and hyperspectral imagery w…
Statistical biophysical parameter retrieval and emulation with Gaussian processes
Earth observation from satellites poses challenging problems where machine learning is being widely adopted as a key player. Perhaps the most challenging scenario that we are facing nowadays is to provide accurate estimates of particular variables of interest characterizing the Earth's surface. This chapter introduces some recent advances in statistical bio-geophysical parameter retrieval from satellite data. In particular, we will focus on Gaussian process regression (GPR) that has excelled in parameter estimation as well as in modeling complex radiative transfer processes. GPR is based on solid Bayesian statistics and generally yields efficient and accurate parameter estimates, a…
Improved Statistically Based Retrievals via Spatial-Spectral Data Compression for IASI Data
In this paper, we analyze the effect of spatial and spectral compression on the performance of statistically based retrieval. Although the quality of the information is not completely preserved during the coding process, experiments reveal that a certain amount of compression may yield a positive impact on the accuracy of retrievals. We unveil two strategies, both with interesting benefits: either to apply a very high compression, which still maintains the same retrieval performance as that obtained for uncompressed data; or to apply a moderate to high compression, which improves the performance. As a second contribution of this paper, we focus on the origins of these benefits. On the one…
Physics-Aware Gaussian Processes for Earth Observation
Earth observation from satellite sensory data poses challenging problems, where machine learning is currently a key player. In recent years, Gaussian Process (GP) regression and other kernel methods have excelled in biophysical parameter estimation tasks from space. GP regression is based on solid Bayesian statistics, and generally yields efficient and accurate parameter estimates. However, GPs are typically used for inverse modeling based on concurrent observations and in situ measurements only. Very often a forward model encoding the well-understood physical relations is available though. In this work, we review three GP models that respect and learn the physics of the underlying processes …
PRINCIPAL POLYNOMIAL ANALYSIS
© 2014 World Scientific Publishing Company. This paper presents a new framework for manifold learning based on a sequence of principal polynomials that capture the possibly nonlinear nature of the data. The proposed Principal Polynomial Analysis (PPA) generalizes PCA by modeling the directions of maximal variance by means of curves instead of straight lines. Contrary to previous approaches, PPA reduces to performing simple univariate regressions, which makes it computationally feasible and robust. Moreover, PPA shows a number of interesting analytical properties. First, PPA is a volume-preserving map, which in turn guarantees the existence of the inverse. Second, such an inverse can be obtained…
Randomized kernels for large scale Earth observation applications
Current remote sensing applications of bio-geophysical parameter estimation and image classification have to deal with an unprecedentedly large amount of heterogeneous and complex data sources. New satellite sensors involving a high number of improved time, space and wavelength resolutions give rise to challenging computational problems. Standard physical inversion techniques cannot cope efficiently with this new scenario. Dealing with land cover classification of the new image sources has also turned out to be a complex problem requiring large amounts of memory and processing time. In order to cope with these problems, statistical learning has greatly helped in recent years to develop st…
Retrieval of Physical Parameters With Deep Structured Kernel Regression
PCA Gaussianization for image processing
The estimation of high-dimensional probability density functions (PDFs) is not an easy task for many image processing applications. The linear models assumed by widely used transforms are often quite restrictive to describe the PDF of natural images. In fact, additional non-linear processing is needed to overcome the limitations of the model. On the contrary, the class of techniques collectively known as projection pursuit, which solve the high-dimensional problem by sequential univariate solutions, may be applied to very general PDFs (e.g. iterative Gaussianization procedures). However, the associated computational cost has prevented their extensive use in image processing. In this work, w…
A Survey on Gaussian Processes for Earth-Observation Data Analysis: A Comprehensive Investigation
Gaussian processes (GPs) have experienced tremendous success in biogeophysical parameter retrieval in the last few years. GPs constitute a solid Bayesian framework to consistently formulate many function approximation problems. This article reviews the main theoretical GP developments in the field, considering new algorithms that respect signal and noise characteristics, extract knowledge via automatic relevance kernels to yield feature rankings automatically, and allow applicability of associated uncertainty intervals to transport GP models in space and time that can be used to uncover causal relations between variables and can encode physically meaningful prior knowledge via radiative tra…
Generation of global vegetation products from EUMETSAT AVHRR/METOP satellites
We describe the methodology applied for the retrieval of global LAI, FAPAR and FVC from Advanced Very High Resolution Radiometer (AVHRR) onboard the Meteorological-Operational (MetOp) polar orbiting satellites also known as EUMETSAT Polar System (EPS). A novel approach has been developed for the joint retrieval of three parameters (LAI, FVC, and FAPAR) instead of training one model per parameter. The method relies on multi-output Gaussian Processes Regression (GPR) trained over PROSAIL EPS simulations. A sensitivity analysis is performed to assess several sources of uncertainties in retrievals and maximize the positive impact of modeling the noise in training simulations. We describe the ma…
Estimating biophysical variable dependences with kernels
This paper introduces a nonlinear measure of dependence between random variables in the context of remote sensing data analysis. The Hilbert-Schmidt Independence Criterion (HSIC) is a kernel method for evaluating statistical dependence. HSIC is based on computing the Hilbert-Schmidt norm of the cross-covariance operator of mapped samples in the corresponding Hilbert spaces. The HSIC empirical estimator is very easy to compute and has good theoretical and practical properties. We exploit the capabilities of HSIC to explain nonlinear dependences in two remote sensing problems: temperature estimation and chlorophyll concentration prediction from spectra. Results show that, when the relationshi…
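The empirical HSIC estimator mentioned above can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes RBF kernels with hand-picked widths (`sigma_x`, `sigma_y`), the common biased normalization by (n - 1)^2, and synthetic toy data; all function and variable names are hypothetical.

```python
import numpy as np

def rbf_kernel(x, sigma):
    """RBF (Gaussian) kernel matrix of a sample; x is (n,) or (n, d)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def hsic(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    K = rbf_kernel(x, sigma_x)
    L = rbf_kernel(y, sigma_y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy data (assumption): a nonlinear, uncorrelated-looking dependence
# versus an independent variable.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=300)
y_dep = x**2 + 0.1 * rng.normal(size=300)
y_ind = rng.normal(size=300)

hsic_dep = hsic(x, y_dep)
hsic_ind = hsic(x, y_ind)
```

In this toy setting the quadratic relation has near-zero linear correlation, yet its HSIC value should be clearly larger than the one for the independent pair. In practice the kernel widths would need to be chosen with some care, for example with a median-distance heuristic.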
Optimizing Kernel Ridge Regression for Remote Sensing Problems
Kernel methods have been very successful in remote sensing problems because of their ability to deal with high dimensional non-linear data. However, they are computationally expensive to train when a large number of samples is used. In this context, while the amount of available remote sensing data has constantly increased, the size of training sets in kernel methods is usually restricted to a few thousand samples. In this work, we modified the kernel ridge regression (KRR) training procedure to deal with large scale datasets. In addition, the basis functions in the reproducing kernel Hilbert space are defined as parameters to be also optimized during the training process. This extends the n…
Gaussianizing the Earth: Multidimensional Information Measures for Earth Data Analysis
Information theory is an excellent framework for analyzing Earth system data because it allows us to characterize uncertainty and redundancy, and is universally interpretable. However, accurately estimating information content is challenging because spatio-temporal data is high-dimensional, heterogeneous and has non-linear characteristics. In this paper, we apply multivariate Gaussianization for probability density estimation which is robust to dimensionality, comes with statistical guarantees, and is easy to apply. In addition, this methodology allows us to estimate information-theoretic measures to characterize multivariate densities: information, entropy, total correlation, and mutual in…
Unsupervised Anomaly and Change Detection With Multivariate Gaussianization
Anomaly detection (AD) is a field of intense research in remote sensing (RS) image processing. Identifying low probability events in RS images is a challenging problem given the high dimensionality of the data, especially when no (or little) information about the anomaly is available a priori. While plenty of methods are available, the vast majority of them do not scale well to large datasets and require the choice of some (very often critical) hyperparameters. Therefore, unsupervised and computationally efficient detection methods become strictly necessary, especially now with the data deluge problem. In this article, we propose an unsupervised method for detecting anomalies and changes …
Visual aftereffects and sensory nonlinearities from a single statistical framework
When adapted to a particular scenery our senses may fool us: colors are misinterpreted, certain spatial patterns seem to fade out, and static objects appear to move in reverse. A mere empirical description of the mechanisms tuned to color, texture, and motion may tell us where these visual illusions come from. However, such empirical models of gain control do not explain why these mechanisms work in this apparently dysfunctional manner. Current normative explanations of aftereffects based on scene statistics derive gain changes by (1) invoking decorrelation and linear manifold matching/equalization, or (2) using nonlinear divisive normalization obtained from parametric scene models. These p…
Psychophysically Tuned Divisive Normalization Approximately Factorizes the PDF of Natural Images
The conventional approach in computational neuroscience in favor of the efficient coding hypothesis goes from image statistics to perception. It has been argued that the behavior of the early stages of biological visual processing (e.g., spatial frequency analyzers and their nonlinearities) may be obtained from image samples and the efficient coding hypothesis using no psychophysical or physiological information. In this work we address the same issue in the opposite direction: from perception to image statistics. We show that psychophysically fitted image representation in V1 has appealing statistical properties, for example, approximate PDF factorization and substantial mutual informatio…
Regression Wavelet Analysis for Lossless Coding of Remote-Sensing Data
A novel wavelet-based scheme to increase coefficient independence in hyperspectral images is introduced for lossless coding. The proposed regression wavelet analysis (RWA) uses multivariate regression to exploit the relationships among wavelet-transformed components. It builds on our previous nonlinear schemes that estimate each coefficient from neighbor coefficients. Specifically, RWA performs a pyramidal estimation in the wavelet domain, thus reducing the statistical relations in the residuals and the energy of the representation compared to existing wavelet-based schemes. We propose three regression models to address the issues concerning estimation accuracy, component scalability, and c…
Domain Adaptation of Landsat-8 and Proba-V Data Using Generative Adversarial Networks for Cloud Detection
Training machine learning algorithms for new satellites requires collecting new data. This is a critical drawback for most remote sensing applications and especially for cloud detection. A sensible strategy to mitigate this problem is to exploit available data from a similar sensor, which involves transforming this data to resemble the new sensor data. However, even taking into account the technical characteristics of both sensors to transform the images, statistical differences between data distributions still remain. This results in a poor performance of the methods trained on one sensor and applied to the new one. In this work, we propose to use the generative adversarial networks (G…
Nonlinear data description with Principal Polynomial Analysis
Principal Component Analysis (PCA) has been widely used for manifold description and dimensionality reduction. Performance of PCA is however hampered when data exhibits nonlinear feature relations. In this work, we propose a new framework for manifold learning based on the use of a sequence of Principal Polynomials that capture the possibly nonlinear nature of the data. The proposed Principal Polynomial Analysis (PPA) is shown to generalize PCA. Unlike recently proposed nonlinear methods (e.g. spectral/kernel methods and projection pursuit techniques, neural networks), PPA features are easily interpretable and the method leads to a fully invertible transform, which is a desirable property…
Consistent Regression of Biophysical Parameters with Kernel Methods
This paper introduces a novel statistical regression framework that allows the incorporation of consistency constraints. A linear and nonlinear (kernel-based) formulation are introduced, and both imply closed-form analytical solutions. The models exploit all the information from a set of drivers while being maximally independent of a set of auxiliary, protected variables. We successfully illustrate the performance in the estimation of chlorophyll content.
Statistical atmospheric parameter retrieval largely benefits from spatial-spectral image compression
The infrared atmospheric sounding interferometer (IASI) flies on board the Metop satellite series, which is part of the EUMETSAT Polar System. Products obtained from IASI data represent a significant improvement in the accuracy and quality of the measurements used for meteorological models. Notably, the IASI collects rich spectral information to derive temperature and moisture profiles, among other relevant trace gases, essential for atmospheric forecasts and for the understanding of weather. Here, we investigate the impact of near-lossless and lossy compression on IASI L1C data when statistical retrieval algorithms are later applied. We search for those compression ratios that yield…
Kernel-based retrieval of atmospheric profiles from IASI data
This paper proposes the use of kernel ridge regression (KRR) to derive surface and atmospheric properties from hyperspectral infrared sounding spectra. We focus on the retrieval of temperature and humidity atmospheric profiles from Infrared Atmospheric Sounding Interferometer (MetOp-IASI) data, and provide confidence maps on the predictions. In addition, we propose a scheme for the identification of anomalies by supervised classification of discrepancies with the ECMWF estimates. For the retrieval, we observed that KRR clearly outperformed linear regression. Looking at the confidence maps, we observed that big discrepancies are mainly due to the presence of clouds and low emissivities in de…
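A minimal sketch of multi-output kernel ridge regression in its standard closed form, alpha = (K + lambda I)^{-1} Y, may help make the idea concrete. Everything here is an assumption for illustration: the RBF kernel, the hyperparameters `sigma` and `lam`, and the synthetic one-dimensional data standing in for spectra and profiles; none of it is taken from the paper or from IASI data.

```python
import numpy as np

def rbf(a, b, sigma):
    """RBF kernel between rows of a (n, d) and b (m, d)."""
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def krr_fit(X, Y, sigma=1.0, lam=1e-3):
    """Solve (K + lam * I) alpha = Y; one weight column per output."""
    K = rbf(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), Y)

def krr_predict(X_train, alpha, X_new, sigma=1.0):
    return rbf(X_new, X_train, sigma) @ alpha

def target(X):
    """Hypothetical nonlinear two-output target function."""
    return np.hstack([np.sin(2 * X), np.cos(X) ** 2])

rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(200, 1))
X_test = rng.uniform(-3, 3, size=(200, 1))
Y_train = target(X_train) + 0.05 * rng.normal(size=(200, 2))
Y_test = target(X_test)

alpha = krr_fit(X_train, Y_train, sigma=0.5, lam=1e-2)
Y_krr = krr_predict(X_train, alpha, X_test, sigma=0.5)
krr_rmse = np.sqrt(np.mean((Y_krr - Y_test) ** 2))

# Linear least-squares baseline with an intercept term.
A_train = np.hstack([X_train, np.ones((200, 1))])
A_test = np.hstack([X_test, np.ones((200, 1))])
coef, *_ = np.linalg.lstsq(A_train, Y_train, rcond=None)
lin_rmse = np.sqrt(np.mean((A_test @ coef - Y_test) ** 2))
```

On this deliberately nonlinear toy problem the kernel model should reach a much lower test error than the linear baseline. That only illustrates the mechanism; it says nothing about the size of the gain on real sounder data.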
Information Theory in Density Destructors
Density destructors are differentiable and invertible transforms that map multivariate PDFs of arbitrary structure (low entropy) into non-structured PDFs (maximum entropy). Multivariate Gaussianization and multivariate equalization are specific examples of this family, which break down the complexity of the original PDF through a set of elementary transforms that progressively remove the structure of the data. We demonstrate how this property of density destructive flows is connected to classical information theory, and how density destructors can be used to get more accurate estimates of information theoretic quantities. Experiments with total correlation and mutual information in multivari…
Derivation of global vegetation biophysical parameters from EUMETSAT Polar System
This paper presents the algorithm developed in LSA-SAF (Satellite Application Facility for Land Surface Analysis) for the derivation of global vegetation parameters from the AVHRR (Advanced Very High Resolution Radiometer) sensor on board MetOp (Meteorological–Operational) satellites forming the EUMETSAT (European Organization for the Exploitation of Meteorological Satellites) Polar System (EPS). The suite of LSA-SAF EPS vegetation products includes the leaf area index (LAI), the fractional vegetation cover (FVC), and the fraction of absorbed photosynthetically active radiation (FAPAR). LAI, FAPAR, and FVC characterize the structure and the functioning of vegetation and are key par…
Perceptual image quality assessment using a normalized Laplacian pyramid
Combined dynamics of the 500–600 nm leaf absorption and chlorophyll fluorescence changes in vivo: Evidence for the multifunctional energy quenching role of xanthophylls
Carotenoids (Cars) regulate the energy flow towards the reaction centres in a versatile way whereby the switch between energy harvesting and dissipation is strongly modulated by the operation of the xanthophyll cycles. However, the cascade of molecular mechanisms during the change from light harvesting to energy dissipation remains spectrally poorly understood. By characterizing the in vivo absorbance changes (Delta A) of leaves from four species in the 500-600 nm range through a Gaussian decomposition, while measuring passively simultaneous Chla fluorescence (F) changes, we present a direct observation of the quick antenna adjustments during a 3-min dark-to-high-light induction. Underlying…
Computing variations of entropy and redundancy under nonlinear mappings not preserving the signal dimension: quantifying the efficiency of V1 cortex
In computational neuroscience, the Efficient Coding Hypothesis argues that the neural organization comes from the optimization of information-theoretic goals [Barlow Proc.Nat.Phys.Lab.59]. A way to confirm this requires the analysis of the statistical performance of biological systems that have not been statistically optimized [Renart et al. Science10, Malo&Laparra Neur.Comp.10, Foster JOSA18, Gomez-Villa&Malo J.Neurophysiol.19]. However, when analyzing the information-theoretic performance, cortical magnification in the retina-cortex pathway poses a theoretical problem. Cortical magnification stands for the increase of the signal dimensionality [Cowey&Rolls Exp. Brain Res.74]. Conventional mo…
Predicting perceptual distortion sensitivity with gain control models of LGN
A Review of Kernel Methods in Remote Sensing Data Analysis
Kernel methods have proven effective in the analysis of images of the Earth acquired by airborne and satellite sensors. Kernel methods provide a consistent and well-founded theoretical framework for developing nonlinear techniques and have useful properties when dealing with a low number of (potentially high dimensional) training samples, the presence of heterogeneous multimodalities, and different noise sources in the data. These properties are particularly appropriate for remote sensing data analysis. In fact, kernel methods have improved results of parametric linear methods and neural networks in applications such as natural resource control, detection and monitoring of anthropic infrastruc…
Kernel methods and their derivatives: Concept and perspectives for the earth system sciences.
Kernel methods are powerful machine learning techniques which implement generic non-linear functions to solve complex tasks in a simple way. They have a solid mathematical background and exhibit excellent performance in practice. However, kernel machines are still considered black-box models as the feature mapping is not directly accessible and difficult to interpret. The aim of this work is to show that it is indeed possible to interpret the functions learned by various kernel methods in an intuitive way despite their complexity. Specifically, we show that derivatives of these functions have a simple mathematical formulation, are easy to compute, and can be applied to many different problems. We n…
Fair Kernel Learning
New social and economic activities massively exploit big data and machine learning algorithms to do inference on people’s lives. Applications include automatic curricula evaluation, wage determination, and risk assessment for credits and loans. Recently, many governments and institutions have raised concerns about the lack of fairness, equity and ethics in machine learning to treat these problems. It has been shown that not including sensitive features that bias fairness, such as gender or race, is not enough to mitigate the discrimination when other related features are included. Instead, including fairness in the objective function has been shown to be more efficient.
Eigen-Distortions of Hierarchical Representations
We develop a method for comparing hierarchical image representations in terms of their ability to explain perceptual sensitivity in humans. Specifically, we utilize Fisher information to establish a model-derived prediction of sensitivity to local perturbations of an image. For a given image, we compute the eigenvectors of the Fisher information matrix with largest and smallest eigenvalues, corresponding to the model-predicted most- and least-noticeable image distortions, respectively. For human subjects, we then measure the amount of each distortion that can be reliably detected when added to the image. We use this method to test the ability of a variety of representations to mimic human p…
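The eigenvector computation described above can be illustrated with a toy model. This sketch rests on assumptions that are not stated in the abstract: the model response is deterministic with additive white Gaussian noise, so the Fisher information reduces to F = J^T J with J the Jacobian of the response; the "model" is a hypothetical one-layer `tanh` network; and the Jacobian is obtained by finite differences.

```python
import numpy as np

def model(x, W):
    """Hypothetical toy representation: one linear layer plus tanh."""
    return np.tanh(W @ x)

def jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f at x."""
    y0 = f(x)
    J = np.empty((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))
x0 = rng.normal(size=4)  # stands in for a (tiny) image

J = jacobian(lambda x: model(x, W), x0)
F = J.T @ J  # Fisher information under the Gaussian-noise assumption

# eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(F)
e_min, e_max = eigvecs[:, 0], eigvecs[:, -1]

# Response change for equal-norm perturbations along both directions.
step = 1e-3
resp_max = np.linalg.norm(model(x0 + step * e_max, W) - model(x0, W))
resp_min = np.linalg.norm(model(x0 + step * e_min, W) - model(x0, W))
```

Under these assumptions, `e_max` is the perturbation the model predicts to be most noticeable and `e_min` the least noticeable, which is why `resp_max` should exceed `resp_min` for the same perturbation size.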
Dimensionality reduction via regression on hyperspectral infrared sounding data
This paper introduces a new method for dimensionality reduction via regression (DRR). The method generalizes Principal Component Analysis (PCA) in such a way that reduces the variance of the PCA scores. In order to do so, DRR relies on a deflationary process in which a non-linear regression reduces the redundancy between the PC scores. Unlike other nonlinear dimensionality reduction methods, DRR is easy to apply, it has out-of-sample extension, it is invertible, and the learned transformation is volume-preserving. These properties make the method useful for a wide range of applications, especially in very high dimensional data in general, and for hyperspectral image processing in particular…
Cross-Sensor Adversarial Domain Adaptation of Landsat-8 and Proba-V images for Cloud Detection
The number of Earth observation satellites carrying optical sensors with similar characteristics is constantly growing. Despite their similarities and the potential synergies among them, derived satellite products are often developed for each sensor independently. Differences in retrieved radiances lead to significant drops in accuracy, which hampers knowledge and information sharing across sensors. This is particularly harmful for machine learning algorithms, since gathering new ground truth data to train models for each sensor is costly and requires experienced manpower. In this work, we propose a domain adaptation transformation to reduce the statistical differences between images of two…
Principal polynomial analysis for remote sensing data processing
Inspired by the concept of Principal Curves, in this paper, we define Principal Polynomials as a non-linear generalization of Principal Components to overcome the conditional mean independence restriction of PCA. Principal Polynomials deform the straight Principal Components by minimizing the regression error (or variance) in the corresponding orthogonal subspaces. We propose to use a projection on a series of these polynomials to set a new nonlinear data representation: the Principal Polynomial Analysis (PPA). We prove that the dimensionality reduction error in PPA is always lower than in PCA. Lower truncation error and increased independence suggest that unsupervised PPA features can be b…
Enforcing Perceptual Consistency on Generative Adversarial Networks by Using the Normalised Laplacian Pyramid Distance
In recent years there has been a growing interest in image generation through deep learning. While an important part of the evaluation of the generated images usually involves visual inspection, the inclusion of human perception as a factor in the training process is often overlooked. In this paper we propose an alternative perceptual regulariser for image-to-image translation using conditional generative adversarial networks (cGANs). To do so automatically (avoiding visual inspection), we use the Normalised Laplacian Pyramid Distance (NLPD) to measure the perceptual similarity between the generated image and the original image. The NLPD is based on the principle of normalising the value of…
Lossless coding of hyperspectral images with principal polynomial analysis
The transform in image coding aims to remove redundancy among data coefficients so that they can be independently coded, and to capture most of the image information in few coefficients. While the second goal ensures that discarding coefficients will not lead to large errors, the first goal ensures that simple (point-wise) coding schemes can be applied to the retained coefficients with optimal results. Principal Component Analysis (PCA) provides the best independence and data compaction for Gaussian sources. Yet, non-linear generalizations of PCA may provide better performance for more realistic non-Gaussian sources. Principal Polynomial Analysis (PPA) generalizes PCA by removing the non-li…
End-to-end Optimized Image Compression
We describe an image compression method, consisting of a nonlinear analysis transformation, a uniform quantizer, and a nonlinear synthesis transformation. The transforms are constructed in three successive stages of convolutional linear filters and nonlinear activation functions. Unlike most convolutional neural networks, the joint nonlinearity is chosen to implement a form of local gain control, inspired by those used to model biological neurons. Using a variant of stochastic gradient descent, we jointly optimize the entire model for rate-distortion performance over a database of training images, introducing a continuous proxy for the discontinuous loss function arising from the quantizer.…
Divisive normalization image quality metric revisited.
Structural similarity metrics and information-theory-based metrics have been proposed as completely different alternatives to the traditional metrics based on error visibility and human vision models. Three basic criticisms were raised against the traditional error visibility approach: (1) it is based on near-threshold performance, (2) its geometric meaning may be limited, and (3) stationary pooling strategies may not be statistically justified. These criticisms and the good performance of structural and information-theory-based metrics have popularized the idea of their superiority over the error visibility approach. In this work we experimentally or analytically show that the above critic…
Perceptually Optimized Image Rendering
We develop a framework for rendering photographic images by directly optimizing their perceptual similarity to the original visual scene. Specifically, over the set of all images that can be rendered on a given display, we minimize the normalized Laplacian pyramid distance (NLPD), a measure of perceptual dissimilarity that is derived from a simple model of the early stages of the human visual system. When rendering images acquired with a higher dynamic range than that of the display, we find that the optimization boosts the contrast of low-contrast features without introducing significant artifacts, yielding results of comparable visual quality to current state-of-the-art methods, but witho…
Physics-aware Gaussian processes in remote sensing
Earth observation from satellite sensory data poses challenging problems, where machine learning is currently a key player. In recent years, Gaussian Process (GP) regression has excelled in biophysical parameter estimation tasks from airborne and satellite observations. GP regression is based on solid Bayesian statistics, and generally yields efficient and accurate parameter estimates. However, GPs are typically used for inverse modeling based on concurrent observations and in situ measurements only. Very often a forward model encoding the well-understood physical relations between the state vector and the radiance observations is available though and could be useful to improve pre…
Information Theory Measures via Multidimensional Gaussianization
Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems. It has several desirable properties for real world applications: it naturally deals with multivariate data, it can handle heterogeneous data types, and the measures can be interpreted in physical units. However, it has not been adopted by a wider audience because obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality. Here we propose an indirect way of computing information based on a multivariate Gaussianization transform. Our proposal mitigates the difficulty of multivariate density estimation by reducing it to a composi…
Spatio-Chromatic Adaptation via Higher-Order Canonical Correlation Analysis of Natural Images
Independent component and canonical correlation analysis are two general-purpose statistical methods with wide applicability. In neuroscience, independent component analysis of chromatic natural images explains the spatio-chromatic structure of primary cortical receptive fields in terms of properties of the visual environment. Canonical correlation analysis similarly explains chromatic adaptation to different illuminations. But, as we show in this paper, neither of the two methods generalizes well to explain both spatio-chromatic processing and adaptation at the same time. We propose a statistical method which combines the desirable properties of independent component and canonical correlat…
Accounting for Input Noise in Gaussian Process Parameter Retrieval
Gaussian processes (GPs) are a class of kernel methods that have been shown to be very useful in geoscience and remote sensing applications for parameter retrieval, model inversion, and emulation. They are widely used because they are simple, flexible, and provide accurate estimates. GPs are based on a Bayesian statistical framework which provides a posterior probability function for each estimation. Therefore, besides the usual prediction (given in this case by the mean function), GPs come equipped with the possibility to obtain a predictive variance (i.e., error bars, confidence intervals) for each prediction. Unfortunately, the GP formulation usually assumes that there is no noise in the inpu…
Complex-Valued Independent Component Analysis of Natural Images
Linear independent component analysis (ICA) learns simple cell receptive fields from natural images. Here, we show that linear complex-valued ICA learns complex cell properties from Fourier-transformed natural images, i.e. two Gabor-like filters with quadrature-phase relationship. Conventional methods for complex-valued ICA assume that the phases of the output signals have uniform distribution. We show here that for natural images the phase distributions are, however, often far from uniform. We thus relax the uniformity assumption and model also the phase of the sources in complex-valued ICA. Compared to the original complex ICA model, the new model provides a better fit to the data, and leads…
Nonlinearities and Adaptation of Color Vision from Sequential Principal Curves Analysis
Mechanisms of human color vision are characterized by two phenomenological aspects: the system is nonlinear and adaptive to changing environments. Conventional attempts to derive these features from statistics use separate arguments for each aspect. The few statistical explanations that do consider both phenomena simultaneously follow parametric formulations based on empirical models. Therefore, it may be argued that the behavior does not come directly from the color statistics but from the convenient functional form adopted. In addition, many times the whole statistical analysis is based on simplified databases that disregard relevant physical effects in the input signal, as, for instance…
Large-scale random features for kernel regression
Kernel methods constitute a family of powerful machine learning algorithms, which have found wide use in remote sensing and geosciences. However, kernel methods are still not widely adopted because of the high computational cost when dealing with large scale problems, such as the inversion of radiative transfer models. This paper introduces the method of random kitchen sinks (RKS) for fast statistical retrieval of bio-geo-physical parameters. The RKS method allows one to approximate a kernel matrix with a set of random bases sampled from the Fourier domain. We extend their use to other bases, such as wavelets, stumps, and Walsh expansions. We show that kernel regression is now possible for data…
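The kernel approximation described in this abstract can be illustrated with a minimal sketch. It assumes an RBF kernel and cosine bases with Gaussian-sampled frequencies (standard random Fourier features); the function name `rks_features`, the value of `gamma`, and the synthetic data are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def rks_features(X, n_features, gamma, rng):
    """Map X (n_samples, n_dims) to random Fourier features whose inner
    products approximate the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    n_dims = X.shape[1]
    # Frequencies are drawn from the Fourier transform of the RBF kernel,
    # which is a zero-mean Gaussian with variance 2 * gamma per dimension.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_dims, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
gamma = 0.1

# Approximate kernel matrix from explicit random features.
Z = rks_features(X, n_features=2000, gamma=gamma, rng=rng)
K_approx = Z @ Z.T

# Exact RBF kernel matrix for comparison.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-gamma * sq_dists)

mean_abs_err = np.abs(K_approx - K_exact).mean()
print(f"mean absolute kernel error: {mean_abs_err:.4f}")
```

Because the features are explicit, a linear regression on `Z` scales with the number of random features rather than with the number of training samples, which is the usual motivation for this kind of approximation.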
Efficient Kernel Cook's Distance for Remote Sensing Anomalous Change Detection
Detecting anomalous changes in remote sensing images is a challenging problem, for which many approaches and techniques have been presented so far. We rely on the standard field of multivariate statistics of diagnostic measures, which are concerned with the characterization of distributions, detection of anomalies, extreme events, and changes. One useful tool to detect multivariate anomalies is the celebrated Cook's distance. Instead of assuming a linear relationship, we present a novel kernelized version of the Cook's distance to address anomalous change detection in remote sensing images. Due to the large computational burden involved in the direct kernelization, and the lack of out-…
Transfer Learning with Convolutional Networks for Atmospheric Parameter Retrieval
The Infrared Atmospheric Sounding Interferometer (IASI) on board the MetOp satellite series provides important measurements for Numerical Weather Prediction (NWP). Retrieving accurate atmospheric parameters from the raw data provided by IASI is a large challenge, but necessary in order to use the data in NWP models. The performance of statistical models is compromised because of the extremely high spectral dimensionality and the high number of variables to be predicted simultaneously across the atmospheric column. All this poses a challenge for selecting and studying optimal models and processing schemes. Earlier work has shown that non-linear models such as kernel methods and neural networks perform w…
Statistical retrieval of atmospheric profiles with deep convolutional neural networks
Infrared atmospheric sounders, such as IASI, provide an unprecedented source of information for atmosphere monitoring and weather forecasting. Sensors provide rich spectral information that allows retrieval of temperature and moisture profiles. From a statistical point of view, the challenge is immense: on the one hand, “underdetermination” is commonplace as regression needs to work on high dimensional input and output spaces; on the other hand, redundancy is present in all dimensions (spatial, spectral and temporal). On top of this, several noise sources are encountered in the data. In this paper, we present for the first time the use of convolutional neural networks for the retr…
Nonlinear statistical retrieval of surface emissivity from IASI data
Emissivity is one of the most important parameters to improve the determination of the troposphere properties (thermodynamic properties, aerosols and trace gases concentration) and it is essential to estimate the radiative budget. With the second generation of infrared sounders, we can estimate emissivity spectra at high spectral resolution, which gives us a global view and long-term monitoring of continental surfaces. Statistically, this is an ill-posed retrieval problem, with as many output variables as inputs. We here propose nonlinear multi-output statistical regression based on kernel methods to estimate spectral emissivity given the radiances. Kernel methods can cope with high-dimensi…
Encoding Invariances in Remote Sensing Image Classification With SVM
This letter introduces a simple method for including invariances in support-vector-machine (SVM) remote sensing image classification. We design explicit invariant SVMs to deal with the particular characteristics of remote sensing images. The problem of including data invariances can be viewed as a problem of encoding prior knowledge, which translates into incorporating informative support vectors (SVs) that better describe the classification problem. The proposed method essentially generates new (synthetic) SVs from those obtained by training a standard SVM with the available labeled samples. Then, original and transformed SVs are used for training the virtual SVM introduced in this letter. W…
Fair Kernel Learning
New social and economic activities massively exploit big data and machine learning algorithms to do inference on people's lives. Applications include automatic curricula evaluation, wage determination, and risk assessment for credits and loans. Recently, many governments and institutions have raised concerns about the lack of fairness, equity and ethics in machine learning to treat these problems. It has been shown that not including sensitive features that bias fairness, such as gender or race, is not enough to mitigate the discrimination when other related features are included. Instead, including fairness in the objective function has been shown to be more efficient. We present novel fai…
Transferring deep learning models for cloud detection between Landsat-8 and Proba-V
Accurate cloud detection algorithms are mandatory to analyze the large streams of data coming from the different optical Earth observation satellites. Deep learning (DL) based cloud detection schemes provide very accurate cloud detection models. However, training these models for a given sensor requires large datasets of manually labeled samples, which are very costly or even impossible to create when the satellite has not been launched yet. In this work, we present an approach that exploits manually labeled datasets from one satellite to train deep learning models for cloud detection that can be applied (or transferred) to other satellites. We take into account the physical proper…
Including invariances in SVM remote sensing image classification
This paper introduces a simple method to include invariances in support vector machine (SVM) for remote sensing image classification. We rely on the concept of virtual support vectors, by which the SVM is trained with both the selected support vectors and synthetic examples encoding the invariance of interest. The algorithm is very simple and effective, as demonstrated in two particularly interesting examples: invariance to the presence of shadows and to rotations in patch-based image segmentation. The improved accuracy (around +6% both in OA and Cohen's κ statistic), along with the simplicity of the approach, encourages its use and extension to encode other invariances and other remote sensin…
Disentangling the Link Between Image Statistics and Human Perception
In the 1950s Horace Barlow and Fred Attneave suggested a connection between sensory systems and how they are adapted to the environment: early vision evolved to maximise the information it conveys about incoming signals. Following Shannon's definition, this information was described using the probability of the images taken from natural scenes. Previously, direct accurate predictions of image probabilities were not possible due to computational limitations. Despite the exploration of this idea being indirect, mainly based on oversimplified models of the image density or on system design methods, these methods had success in reproducing a wide range of physiological and psychophysical phenom…
Probabilistic cross-validation estimators for Gaussian process regression
Gaussian Processes (GPs) are state-of-the-art tools for regression. Inference of GP hyperparameters is typically done by maximizing the marginal log-likelihood (ML). If the data truly follows the GP model, using the ML approach is optimal and computationally efficient. Unfortunately, very often this is not the case and suboptimal results are obtained in terms of prediction error. Alternative procedures such as cross-validation (CV) schemes are often employed instead, but they usually incur high computational costs. We propose a probabilistic version of CV (PCV) based on two different model pieces in order to reduce the dependence on a specific model choice. PCV presents the benefits from both…
Dimensionality Reduction via Regression in Hyperspectral Imagery
This paper introduces a new unsupervised method for dimensionality reduction via regression (DRR). The algorithm belongs to the family of invertible transforms that generalize Principal Component Analysis (PCA) by using curvilinear instead of linear features. DRR identifies the nonlinear features through multivariate regression to ensure the reduction in redundancy between the PCA coefficients, the reduction of the variance of the scores, and the reduction in the reconstruction error. More importantly, unlike other nonlinear dimensionality reduction methods, the invertibility, volume-preservation, and straightforward out-of-sample extension make DRR interpretable and easy to apply. The pro…
HyperLabelMe: A Web Platform for Benchmarking Remote-Sensing Image Classifiers
HyperLabelMe is a web platform that allows the automatic benchmarking of remote-sensing image classifiers. To demonstrate this platform's attributes, we collected and harmonized a large data set of labeled multispectral and hyperspectral images with different numbers of classes, dimensionality, noise sources, and levels. The registered user can download training data pairs (spectra and land cover/use labels) and submit the predictions for unseen testing spectra. The system then evaluates the accuracy and robustness of the classifier, and it reports different scores as well as a ranked list of the best methods and users. The system is modular, scalable, and ever-growing in data sets and clas…
Learning Structures in Earth Observation Data with Gaussian Processes
Gaussian Processes (GPs) have experienced tremendous success in geoscience in general and for bio-geophysical parameter retrieval in the last years. GPs constitute a solid Bayesian framework to formulate many function approximation problems consistently. This paper reviews the main theoretical GP developments in the field. We review new algorithms that respect the signal and noise characteristics, that provide feature rankings automatically, and that allow applicability of associated uncertainty intervals to transport GP models in space and time. All these developments are illustrated in the field of geoscience and remote sensing at local and global scales through a set of illustrative exa…
IASI dataset v1
The Infrared Atmospheric Sounding Interferometer (IASI) on board the MetOp satellite series measures the infrared spectrum with high resolution. The instrument has a ground footprint resolution of 12 km at nadir and a spectral resolution of 0.25 cm⁻¹ in the spectrum between 645 cm⁻¹ and 2760 cm⁻¹. This results in 8461 spectral samples covering a 2200 km scan swath with 60 points per line. IASI is an ideal instrument for monitoring different physical/chemical parameters in the atmosphere, e.g. temperature, humidity and trace gases such as ozone. Energy from different altitudes returns a different spectral shift. In this way atmospheric profiles can be obtained and these provide important …
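The count of 8461 spectral samples follows from the spectral range and sampling step quoted in the description. A small check, assuming uniform sampling with both endpoints included (the helper name `n_spectral_samples` is hypothetical):

```python
def n_spectral_samples(start_cm1, stop_cm1, step_cm1):
    """Number of uniformly spaced samples from start to stop, both inclusive."""
    return round((stop_cm1 - start_cm1) / step_cm1) + 1

# Range 645 to 2760 cm^-1 sampled every 0.25 cm^-1, as stated above.
n_channels = n_spectral_samples(645.0, 2760.0, 0.25)
print(n_channels)  # 8461
```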