

Nonlinear Distribution Regression for Remote Sensing Applications

Jordi Muñoz-Marí, Jose E. Adsuara, Gustau Camps-Valls, Maria Piles, Anna Mateo-Sanchis, Adrian Perez-Suay

subject

Signal Processing (eess.SP); FOS: Computer and information sciences; Computer Science - Machine Learning; Artificial neural network; Remote sensing application; Computer science; 0211 other engineering and technologies; 02 engineering and technology; Least squares; Random forest; Machine Learning (cs.LG); Kernel (linear algebra); Kernel (statistics); FOS: Electrical engineering, electronic engineering, information engineering; General Earth and Planetary Sciences; Electrical Engineering and Systems Science - Signal Processing; Electrical and Electronic Engineering; Gaussian process; Algorithm; 021101 geological & geomatics engineering; Curse of dimensionality

description

In many remote sensing applications, one wants to estimate variables or parameters of interest from observations. When the target variable is available at a resolution that matches the remote sensing observations, standard algorithms, such as neural networks, random forests, or Gaussian processes, are readily available to relate the two. However, we often encounter situations where the target variable is only available at the group level, i.e., collectively associated with a number of remotely sensed observations. This problem setting is known in statistics and machine learning as multiple instance learning (MIL) or distribution regression (DR). This article introduces a nonlinear (kernel-based) method for DR that addresses this problem without making any assumption about the statistics of the grouped data. The presented formulation embeds the distributions in reproducing kernel Hilbert spaces and performs standard least squares regression on the empirical means therein. A flexible version that handles multisource data of different dimensionality and sample sizes is also presented and evaluated. It allows working at the native spatial resolution of each sensor, avoiding the need for matchup procedures. Because the approach is computationally costly, we introduce an efficient version based on random Fourier features that copes with millions of points and groups. Experiments on real data involve the Soil Moisture Active Passive (SMAP) vegetation optical depth (VOD) data for the estimation of crop production in the U.S. Corn Belt and the Moderate Resolution Imaging Spectroradiometer (MODIS) and Multi-angle Imaging SpectroRadiometer (MISR) reflectances for the estimation of aerosol optical depth (AOD). The method is evaluated exhaustively against naive (linear and nonlinear) approaches based on input-space means as well as previously presented methods for MIL. We provide the source code of our methods at http://isp.uv.es/code/dr.html.
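The pipeline described above (embed each group of observations as an empirical mean in feature space, then run regularized least squares on those means) can be illustrated with a minimal sketch. This is not the authors' released code: the function names (rff_map, embed_bags, fit_ridge), the RBF bandwidth sigma, the number of random features D, the regularizer lam, and the synthetic data are all assumptions chosen for illustration. The random Fourier features here approximate a Gaussian (RBF) kernel.

```python
import numpy as np

def rff_map(X, W, b):
    """Random Fourier features approximating an RBF kernel (hypothetical helper)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def embed_bags(bags, W, b):
    """Empirical mean embedding of each bag; returns one row per bag."""
    return np.vstack([rff_map(B, W, b).mean(axis=0) for B in bags])

def fit_ridge(Z, y, lam=1e-3):
    """Regularized least squares on the mean embeddings."""
    D = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

rng = np.random.default_rng(0)
d, D, sigma = 2, 200, 1.0                      # assumed input dim, feature count, bandwidth
W = rng.normal(scale=1.0 / sigma, size=(d, D))  # spectral samples of the RBF kernel
b = rng.uniform(0.0, 2.0 * np.pi, size=D)       # random phases

def make_bags(n_bags):
    """Synthetic data: each bag is a point cloud of varying size with one group-level label."""
    bags, y = [], []
    for _ in range(n_bags):
        mu = rng.uniform(-1.0, 1.0, size=d)
        n = rng.integers(20, 60)                # bags may have different sample sizes
        bags.append(mu + 0.3 * rng.normal(size=(n, d)))
        y.append(np.sin(2.0 * mu[0]) + mu[1] ** 2)
    return bags, np.array(y)

train_bags, y_train = make_bags(300)
test_bags, y_test = make_bags(100)

Z_train = embed_bags(train_bags, W, b)          # one embedding per group
w = fit_ridge(Z_train, y_train)
y_pred = embed_bags(test_bags, W, b) @ w

rmse = float(np.sqrt(np.mean((y_pred - y_test) ** 2)))
baseline = float(np.std(y_test))                # error of always predicting the test mean
print(f"RMSE: {rmse:.3f}  (predict-the-mean baseline: {baseline:.3f})")
```

Because every bag is reduced to a fixed-length vector of D averaged features, the cost of the regression step depends on D rather than on the total number of points, which is the motivation for the random-feature approximation in this sketch.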

DOI: 10.1109/tgrs.2019.2931085 (http://dx.doi.org/10.1109/tgrs.2019.2931085)