
RESEARCH PRODUCT

Intelligent Sampling for Vegetation Nitrogen Mapping Based on Hybrid Machine Learning Algorithms

Katja Berger, Juan Pablo Rivera-Caicedo, Jochem Verrelst

subject

Training set; Mean squared error; Active learning (machine learning); Data stream mining; Computer science; Frame (networking); 0211 other engineering and technologies; Sampling (statistics); 02 engineering and technology; Vegetation; 15. Life on land; Geotechnical Engineering and Engineering Geology; Article; Euclidean distance; Data mining; Electrical and Electronic Engineering; Gaussian process; 021101 geological & geomatics engineering

description

Upcoming satellite imaging spectroscopy missions will deliver spatiotemporally explicit data streams that can be exploited for mapping vegetation properties, such as nitrogen (N) content. Within retrieval workflows for real-time mapping over agricultural regions, such crop-specific information products need to be derived precisely and rapidly. To allow fast processing, intelligent sampling schemes for training databases should be incorporated to establish efficient machine learning (ML) models. In this study, we implemented active learning (AL) heuristics using kernel ridge regression (KRR) to minimize and optimize a training database for variational heteroscedastic Gaussian process regression (VHGPR) to estimate aboveground N content. Several uncertainty and diversity criteria were applied to a lookup table (LUT) composed of aboveground N content and the corresponding hyperspectral reflectance simulated by the PROSAIL-PRO model. The best-performing AL criterion was Euclidean distance-based diversity (EBD), which reduced the LUT training data set by 81% (50 initial samples plus 141 samples selected from a pool of 1000 samples). This reduced LUT was used for training VHGPR, which is not only a competitive algorithm but also provides uncertainty estimates. Validation against in situ N reference data gave a root-mean-square error (RMSE) of 1.84 g/m² and a coefficient of determination (R²) of 0.92. Mapping aboveground N content over an agricultural region yielded reliable estimates and meaningful associated uncertainties. These promising results encourage the transfer of such hybrid workflows across space and time within the frame of future operational N monitoring from satellite imaging spectroscopy data.
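
The Euclidean distance-based diversity idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ebd_select`, the random stand-in "spectra", and the five-band dimensionality are all assumptions made for illustration. The sketch also omits the step in which a KRR model is retrained to judge whether each added sample improves accuracy; it only shows the diversity criterion itself, that is, greedily adding the pool sample that lies farthest from everything already in the training set.

```python
import math
import random


def ebd_select(train, pool, n_add):
    """Greedily pick n_add pool samples that are farthest (Euclidean) from the training set.

    Hypothetical helper for illustration only. train and pool are sequences of
    equal-length numeric tuples (e.g. reflectance vectors).
    """
    train = list(train)
    pool = list(pool)
    # Distance from each pool sample to its nearest training sample.
    min_dist = [min(math.dist(p, t) for t in train) for p in pool]
    chosen = []
    for _ in range(min(n_add, len(pool))):
        # The most "diverse" candidate is the one with the largest nearest-neighbour distance.
        i = max(range(len(pool)), key=min_dist.__getitem__)
        new = pool.pop(i)
        min_dist.pop(i)
        chosen.append(new)
        train.append(new)
        # Only the newly added sample can shrink the remaining nearest-neighbour distances.
        min_dist = [min(d, math.dist(p, new)) for d, p in zip(min_dist, pool)]
    return chosen


# Toy data: 1000 random 5-band vectors standing in for simulated spectra (assumption).
random.seed(0)
lut = [tuple(random.random() for _ in range(5)) for _ in range(1000)]
initial, candidates = lut[:50], lut[50:]

selected = ebd_select(initial, candidates, n_add=141)
reduced_size = len(initial) + len(selected)   # 50 + 141 = 191
reduction = 1 - reduced_size / len(lut)       # fraction of the LUT that is dropped
print(f"{reduced_size} samples kept, reduction of {reduction:.0%}")  # 191 samples kept, reduction of 81%
```

The sample counts (50 initial, 141 added, 1000 in total) are taken from the description; treating the 50 initial samples as part of the 1000 is an assumption, under which 191 retained samples correspond to the reported reduction of about 81%.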

10.1109/lgrs.2020.3014676
https://europepmc.org/articles/PMC7613344/