0000000000026253
AUTHOR
Hervé Cardot
Variance estimation and asymptotic confidence bands for the mean estimator of sampled functional data with high entropy unequal probability sampling designs
For fixed-size sampling designs with high entropy, it is well known that the variance of the Horvitz-Thompson estimator can be approximated by the Hájek formula. The appeal of this asymptotic variance approximation is that it only involves the first-order inclusion probabilities of the statistical units. We extend this variance formula when the variable under study is functional and we prove, under general conditions on the regularity of the individual trajectories and the sampling design, that we can get a uniformly convergent estimator of the variance function of the Horvitz-Thompson estimator of the mean function. Rates of convergence to the true variance function are given for the re…
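As an illustration of the kind of approximation this abstract refers to, here is a minimal sketch of a Hájek-type variance approximation applied pointwise to discretized trajectories. It uses one common textbook form of the formula (without a finite-population correction factor), which may differ in detail from the version studied in the paper. The function name, the `(N, T)` array layout and the choice to normalize by `N**2` are assumptions made for this example.

```python
import numpy as np

def hajek_variance_function(Y, pi):
    """Hajek-type approximation to the variance function of the
    Horvitz-Thompson estimator of the mean curve.

    Y  : (N, T) array, one discretized trajectory per population unit
         (hypothetical layout chosen for this sketch).
    pi : (N,) first-order inclusion probabilities, with 0 < pi_k < 1.
    """
    Y = np.asarray(Y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    N = Y.shape[0]
    a = pi * (1.0 - pi)                          # pi_k * (1 - pi_k)
    Z = Y / pi[:, None]                          # expanded values y_k(t) / pi_k
    R = (a[:, None] * Z).sum(axis=0) / a.sum()   # weighted centre R(t)
    var_total = (a[:, None] * (Z - R) ** 2).sum(axis=0)
    return var_total / N**2                      # variance of the mean, not the total
```

Only the first-order inclusion probabilities enter the computation, which is the practical point made in the abstract. For simple random sampling without replacement, this form equals the exact variance multiplied by (N - 1) / N.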
Functional Linear Regression
This article presents a selected bibliography on functional linear regression (FLR) and highlights the key contributions from both applied and theoretical points of view. It first defines FLR in the case of a scalar response and shows how the model can also be extended to the case of a functional response. It then considers two kinds of estimation procedures for the slope parameter: projection-based estimators in which regularization is performed through dimension reduction, such as functional principal component regression, and penalized least squares estimators, which are defined through a penalized least squares minimization problem. The article proceeds by discussing the main asympt…
Partially time invariant panel data regression
When dealing with panel data, considering the variation over time of the variable of interest makes it possible to remove potential individual effects. Even though the outcome variable has a continuous distribution, its variation over time can be equal to zero with a strictly positive probability, and thus its distribution is a mixture of a mass at zero and a continuous distribution. We introduce a parametric statistical model based on conditional mixtures, build estimators for the parameters related to the conditional probability of no variation and to the conditional expectation related to the continuous part of the distribution, and derive their asymptotic consistency and normality under a specif…
Confidence bands for Horvitz-Thompson estimators using sampled noisy functional data
When collections of functional data are too large to be exhaustively observed, survey sampling techniques provide an effective way to estimate global quantities such as the population mean function. Assuming functional data are collected from a finite population according to a probabilistic sampling scheme, with the measurements being discrete in time and noisy, we propose to first smooth the sampled trajectories with local polynomials and then estimate the mean function with a Horvitz-Thompson estimator. Under mild conditions on the population size, observation times, regularity of the trajectories, sampling scheme, and smoothing bandwidth, we prove a Central Limit theorem in the space of …
Assessing Spillover Effects of Spatial Policies with Semiparametric Zero-Inflated Models and Random Forests
The aim of this work is to estimate the variation over time of the spatial spillover effects of a public policy designed to boost rural development in France over the period 1993–2002. At a micro data level, it is often observed that the dependent variable, such as local employment in a municipality, does not vary over time, so that we face a kind of zero-inflated phenomenon that cannot be handled by a classical continuous response model or by propensity score approaches. We consider two recent nonparametric techniques that are able to deal with this estimation issue. The first approach consists in fitting two generalized additive models to estimate both the probability of no variati…
Modeling TDS data and segmenting consumers thanks to a mixture of semi-Markov processes
Méthodes multivariées combinant ondelettes et analyse en composantes principales pour le débruitage de données issues de spectrométrie de masse
The identification of new diagnostic or prognostic biomarkers is one of the major goals of clinical research. High-throughput technologies such as mass spectrometry are promising for identifying such markers. Starting from a blood or tumor sample, for example, this technology renders the protein profile of individuals in the form of spectra. The biological signal observed in the spectra is masked by several sources of technical variability, which a preliminary pre-processing phase should remove. The classical method for removing random measurement noise from this signal …
Conditional Bias Robust Estimation of the Total of Curve Data by Sampling in a Finite Population: An Illustration on Electricity Load Curves
For marketing or power grid management purposes, many studies based on the analysis of total electricity consumption curves of groups of customers are now carried out by electricity companies. Aggregated totals or mean load curves are estimated using individual curves measured on a fine time grid and collected according to some sampling design. Due to the skewness of the distribution of electricity consumption, these samples often contain outlying curves which may have a strong impact on the usual estimation procedures. We introduce several robust estimators of the total consumption curve which are not sensitive to such outlying curves. These estimators are based on the conditio…
Improving spatial temperature estimates by resort to time autoregressive processes
Temperature estimation methods usually involve regression followed by kriging of residuals (residual kriging). Despite the performance of such models, there is invariably a residual which is not necessarily unpredictable because it may still be correlated in time. We set out to analyse such residuals using autoregressive processes. It is shown that the optimal period varies depending on whether it is identified by functions of the form resd = f(resd−1, resd−2, ..., resd−p) or by partial correlations. Autoregressive processes significantly improve estimates, which are evaluated by cross-validation. Finally, the two following points are discussed: (1) the assumptions of the autor…
Monitoring elevation variations in leaf phenology of deciduous broadleaf forests from SPOT/VEGETATION time-series
In mountain forest ecosystems where elevation gradients are prominent, temperature gradient-based phenological variability can be high. However, there are few studies that assess the capability of remote sensing observations to monitor ecosystem phenology along elevation gradients, despite their relevance under climate change. We investigated the potential of medium resolution remotely sensed data to monitor the elevation variations in the seasonal dynamics of a temperate deciduous broadleaf forested ecosystem. Further, we explored the impact of elevation on the onset of spring leafing. This study was based on the analysis of multi-annual time-series of VEGETATION da…
Types of climates on continental France, a spatial construction
Climate is an important element in the life of territories because it shapes the behavior and decisions of individuals and social groups, as well as those of all living species and ecosystems. As such, differentiating space according to climates and the capabilities that result from them is a field that deserves renewed research attention, taking advantage of modern information processing tools. With this objective in mind, the authors propose a spatial approach to defining climates. Starting from station measurements of precipitation and temperature made available by Météo-France, a set of 14 variables integrating a time series…
La consommation d'énergie des ménages en France
The report analyzes households' domestic energy expenditure using INSEE's Logement (housing) surveys: a panel data econometric study (chapter 1), energy vulnerability (explaining the cold felt inside the dwelling, chapter 2), energy expenditure, energy consumption and CO2 emissions (chapter 3), and territorialized CO2 emissions from dwellings and from commuting (home to work or school) (chapter 4). The role of winter climate is taken into account in each chapter.
Multivariate denoising methods combining wavelets and principal component analysis for mass spectrometry data
The identification of new diagnostic or prognostic biomarkers is one of the main aims of clinical cancer research. In recent years, there has been a growing interest in using mass spectrometry for the detection of such biomarkers. The MS signal resulting from MALDI-TOF measurements is contaminated by different sources of technical variations that can be removed by a prior pre-processing step. In particular, denoising makes it possible to remove the random noise contained in the signal. Wavelet methodology associated with thresholding is usually used for this purpose. In this study, we adapted two multivariate denoising methods that combine wavelets and PCA to MS data. The objective was to o…
In reply to Alber and Söhn.
The price of climate: French consumer preferences reveal spatial and individual inequalities.
We use the hedonic price method to study consumer preferences for climate (temperature, very hot or cold days, and rainfall) in France, a temperate country with varied climates. Data are for (i) individual attributes and prices of houses and workers and (ii) climate attributes interpolated from weather stations. We show that French households value warmer temperatures while very hot days are a nuisance. Such climatic amenities are attributes of consumers’ utility function; nevertheless, global warming assessments by economists, such as the Stern Review Report (2006), ignore these climatic preferences. The social welfare assessment is changed when the direct consumption of c…
A fast and recursive algorithm for clustering large datasets with k-medians
Clustering large samples of high-dimensional data with fast algorithms is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967), who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. Particular attention is paid to the averaged versions, which…
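To make the idea of a recursive stochastic gradient algorithm for the $k$-medians criterion concrete, here is a minimal sketch under stated assumptions: each new observation is assigned to its nearest center, and only that center moves by a small step along the unit vector pointing to the observation. The step sequence `c / m**alpha` (with `m` the number of points assigned so far to the cluster), the default values of `c` and `alpha`, and the function name are illustrative choices, not the tuning studied in the paper.

```python
import numpy as np

def recursive_kmedians(X, init_centers, c=1.0, alpha=0.75):
    """One pass of a stochastic-gradient k-medians sketch with averaging.

    X            : (n, d) array of observations, processed in order.
    init_centers : (k, d) initial centers (supplied by the caller).
    c, alpha     : hypothetical step-size parameters, step = c / m**alpha.
    Returns the last iterates and their running averages, both (k, d).
    """
    centers = np.array(init_centers, dtype=float)
    averaged = centers.copy()
    counts = np.zeros(len(centers), dtype=int)
    for x in np.asarray(X, dtype=float):
        dists = np.linalg.norm(centers - x, axis=1)
        j = int(np.argmin(dists))            # nearest center wins
        counts[j] += 1
        if dists[j] > 0:
            step = c / counts[j] ** alpha
            # gradient step for the L1-type loss: move along the unit vector
            centers[j] += step * (x - centers[j]) / dists[j]
        # running average of the iterates of cluster j (initial value included)
        averaged[j] += (centers[j] - averaged[j]) / (counts[j] + 1)
    return centers, averaged
```

Each observation is touched once and then discarded, so memory use does not grow with the sample size. The averaged iterates are usually smoother than the raw ones.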
Modeling temporal treatment effects with zero inflated semi-parametric regression models: The case of local development policies in France
A semi-parametric approach is proposed to estimate the variation over time of the effects of two distinct public policies designed to boost rural development in France over a similar period of time. At a micro data level, it is often observed that the dependent variable, such as local employment, does not vary over time, so that we face a kind of zero-inflated phenomenon that cannot be handled by a continuous response model. We introduce a conditional mixture model which combines a mass at zero and a continuous response. The suggested zero-inflated semi-parametric statistical approach relies on the flexibility and modularity of additive models with the abi…
Recursive estimation of the conditional geometric median in Hilbert spaces
A recursive estimator of the conditional geometric median in Hilbert spaces is studied. It is based on a stochastic gradient algorithm whose aim is to minimize a weighted L1 criterion and is consequently well adapted for robust online estimation. The weights are controlled by a kernel function and an associated bandwidth. Almost sure convergence and L2 rates of convergence are proved under general conditions on the conditional distribution as well as the sequence of descent steps of the algorithm and the sequence of bandwidths. Asymptotic normality is also proved for the averaged version of the algorithm with an optimal rate of convergence. A simulation study confirm…
Temperature interpolation by local information ; the example of France
Methods of interpolation, whether based on regressions or on kriging, are global methods in which all the available data for a given study area are used. But the quality of results is affected when the study area is spatially very heterogeneous. To overcome this difficulty, a method of local interpolation is proposed and tested here with temperature in France. Starting from a set of weather stations spread across the country and digitized as 250 m-sided cells, the method consists in modelling local spatial variations in temperature by considering each point of the grid and the n weather stations that are its nearest neighbours. The procedure entails a series of steps…
Hypothesis testing for Panels of Semi-Markov Processes with parametric sojourn time distributions
This work deals with the asymptotic properties of maximum likelihood estimators for semi-Markov processes with parametric sojourn time distributions. It is motivated by the comparison, via a two-sample test procedure, of the distribution of two panels of qualitative trajectories modeled by semi-Markov processes and observed over a random number of transitions. Considering first one panel of growing size, we derive, under classical conditions, the convergence in probability of the estimators of the transition probabilities and the parameters of the sojourn time distributions as well as their asymptotic normality. We then consider panels of semi-Markov processes drawn from two different popul…
Horvitz-Thompson estimators for functional data: asymptotic confidence bands and optimal allocation for stratified sampling
When dealing with very large datasets of functional data, survey sampling approaches are useful in order to obtain estimators of simple functional quantities without having to store all the data. We propose here a Horvitz-Thompson estimator of the mean trajectory. In the context of a superpopulation framework, we prove under mild regularity conditions that we obtain uniformly consistent estimators of the mean function and of its variance function. With additional assumptions on the sampling design we state a functional Central Limit Theorem and deduce asymptotic confidence bands. Stratified sampling is studied in detail, and we also obtain a functional version of the usual optimal …
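As a small illustration of the Horvitz-Thompson estimator of a mean trajectory, the sketch below weights each sampled curve by the inverse of its inclusion probability and divides by the population size. The function name and the `(n, T)` layout of discretized curves are assumptions made for this example, and the population size `N` is taken as known.

```python
import numpy as np

def ht_mean_curve(Y_sample, pi_sample, N):
    """Horvitz-Thompson estimate of the population mean curve.

    Y_sample  : (n, T) array, sampled trajectories on a common time grid
                (hypothetical layout for this sketch).
    pi_sample : (n,) first-order inclusion probabilities of the sampled units.
    N         : population size, assumed known.
    """
    Y_sample = np.asarray(Y_sample, dtype=float)
    pi_sample = np.asarray(pi_sample, dtype=float)
    # each sampled curve is weighted by 1 / pi_k, then the total is divided by N
    return (Y_sample / pi_sample[:, None]).sum(axis=0) / N
```

With equal inclusion probabilities n / N, the estimate reduces to the ordinary sample mean curve.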
Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis
The geometric median covariation matrix is a robust multivariate indicator of dispersion which can be extended without any difficulty to functional data. We define estimators, based on recursive algorithms, that can be simply updated at each new observation and are able to deal rapidly with large samples of high dimensional data without being obliged to store all the data in memory. Asymptotic convergence properties of the recursive algorithms are studied under weak conditions. The computation of the principal components can also be performed online and this approach can be useful for online outlier detection. A simulation study clearly shows that this robust indicat…
A multiple-response chi-square framework for the analysis of Free-Comment and Check-All-That-Apply data
Free-Comment (FC) and Check-All-That-Apply (CATA) provide a contingency table containing citation counts of descriptors by products. The analyses performed on this table are most often related to the chi-square statistic. However, such practices are not well suited because they consider experimental units as being the citations (one descriptor for one product by one subject) while the evaluations (vector of citations for one product by one subject) should be considered instead. This results in incorrect expected frequencies under the null hypothesis of independence between products and descriptors and thus in an incorrect chi-square statistic. Thus, analyses related …
Semiparametric models with functional responses in a survey sampling setting : model assisted estimation of electricity consumption curve
This work adopts a survey sampling approach when the goal is to estimate the mean curve of a large database of functional data. When storage capacities are limited, selecting a small fraction of the observations with survey sampling techniques is an interesting alternative to compression techniques. We propose here to take into account real-valued or multivariate auxiliary information, obtained at low cost for the whole population, with a semiparametric model-assisted approach, in order to improve Horvitz-Thompson estimators of the mean curve. First, we estimate the components…
Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?
Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable to perform the PCA of streaming data and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely, perturbation techniques, incremental methods and stochastic optimisation, and compares the most widely employed techniques in terms of statistical a…
Variance Estimation and Asymptotic Confidence Bands for the Mean Estimator of Sampled Functional Data with High Entropy Unequal Probability Sampling Designs
For fixed-size sampling designs with high entropy, it is well known that the variance of the Horvitz-Thompson estimator can be approximated by the Hájek formula. The appeal of this asymptotic variance approximation is that it only involves the first-order inclusion probabilities of the statistical units. We extend this variance formula when the variable under study is functional and we prove, under general conditions on the regularity of the individual trajectories and the sampling design, that it asymptotically provides a uniformly consistent estimator of the variance function of the Horvitz-Thompson estimator of the mean function. Rates of convergence to the true variance function are gi…
Uniform convergence and asymptotic confidence bands for model-assisted estimators of the mean of sampled functional data
When the study variable is functional and storage capacities are limited or transmission costs are high, selecting a small fraction of the observations with survey sampling techniques is an interesting alternative to signal compression techniques, particularly when the goal is the estimation of simple quantities such as means or totals. We extend, in this functional framework, model-assisted estimators with linear regression models that can take account of auxiliary variables whose totals over the population are known. We first show, under weak hypotheses on the sampling design and the regularity of the trajectories, that the estimator of the mean function as well as its variance estimator …
Le prix hédoniste du climat en France
We study the hedonic price of climate attributes through their capitalization in housing rents or purchase prices, as well as in wages. A database was built for the whole of France by interpolating Météo France climate data. These variables are included in hedonic price models estimated from INSEE's Logement (housing) surveys (1988 to 2002). The results give the hedonic prices of intrinsic housing attributes, of location within the urban system, and of climate variables. For the latter, highly significant prices are obtained, in particular for summer temperature (positive …
Estimation of total electricity consumption curves of small areas by sampling in a finite population
Many studies carried out in the French electricity company EDF are based on the analysis of the total electricity consumption curves of groups of customers. These aggregated electricity consumption curves are estimated by using samples of thousands of curves measured at a small time step and collected according to a sampling design. Small area estimation is common in survey sampling. It is often addressed by using implicit or explicit domain models between the variable of interest and the auxiliary variables. The goal here is to estimate totals of electricity consumption curves over domains or areas. Three approaches are compared: the first one consists in modeling th…
Interpolation par régressions locales : application aux précipitations en France
Interpolation by local regressions applied to precipitation in France. Two interpolation methods are presented. The global method applies a regression to the complete data set, whereas the local method applies a regression separately to many local sets of recording stations. The procedure is applied to precipitation amounts in France. Local interpolation gives better results than global interpolation. The local model clearly shows the contrast between crest lines and the nearby plains below them, which are drier. It also yields other indicators from the regressions, whose interpretation provides keys for understanding how …
Semiparametric Models with Functional Responses in a Model Assisted Survey Sampling Setting : Model Assisted Estimation of Electricity Consumption Curves
This work adopts a survey sampling point of view to estimate the mean curve of large databases of functional data. When storage capacities are limited, selecting a small fraction of the observations with survey techniques is an interesting alternative to signal compression techniques. We propose here to take account of real or multivariate auxiliary information available at a low cost for the whole population, with semiparametric model-assisted approaches, in order to improve the accuracy of Horvitz-Thompson estimators of the mean curve. We first estimate the functional principal components from a design-based point of view in order to reduce the dimension of the signals and then propose s…
Thresholding projection estimators in functional linear models
We consider the problem of estimating the regression function in functional linear regression models by proposing a new type of projection estimator that combines dimension reduction and thresholding. The introduction of a threshold rule makes it possible to obtain consistency under broad assumptions as well as minimax rates of convergence under additional regularity hypotheses. We also consider the particular case of Sobolev spaces generated by the trigonometric basis, which makes it easy to obtain the mean squared error of prediction as well as estimators of the derivatives of the regression function. We prove these estimators are minimax, and rates of convergence are given for some particular cases.
Estimating with kernel smoothers the mean of functional data in a finite population setting. A note on variance estimation in presence of partially observed trajectories
In the near future, millions of load curves measuring the electricity consumption of French households in small time grids (probably half hours) will be available. All these collected load curves represent a huge amount of information which could be exploited using survey sampling techniques. In particular, the total consumption of a specific customer group (for example all the customers of an electricity supplier) could be estimated using unequal probability random sampling methods. Unfortunately, data collection may undergo technical problems resulting in missing values. In this paper we study a new estimation method for the mean curve in the presence of missing values which consists in…
Comparison of classification methods that combine clinical data and high-dimensional mass spectrometry data
Background: The identification of new diagnostic or prognostic biomarkers is one of the main aims of clinical cancer research. Technologies like mass spectrometry are commonly used in proteomic research. Mass spectrometry signals show the proteomic profiles of the individuals under study at a given time. These profiles correspond to the recording of a large number of proteins, much larger than the number of individuals. These variables supplement classical clinical variables. The objective of this study is to evaluate and compare the predictive ability of new and existing models combining mass spectrometry data and classical clinical variables. This study was co…
Estimation of total electricity consumption curves by sampling in a finite population when some trajectories are partially unobserved
Millions of smart meters that are able to collect individual load curves, that is, electricity consumption time series, of residential and business customers at fine scale time grids are now deployed by electricity companies all around the world. It may be complex and costly to transmit and exploit such a large quantity of information, therefore it can be relevant to use survey sampling techniques to estimate mean load curves of specific groups of customers. Data collection, like every mass process, may undergo technical problems at every point of the metering and collection chain resulting in missing values. We consider imputation approaches (linear interpolation, k…
Le prix du climat et l'attrait du littoral en France
One often hears of the extra costs of living on the coast, and sometimes of those of mountain tourist sites, although no economic evaluation has been carried out in France to verify these claims. The attractiveness of these regions is undeniable, judging by the population migrations they experience. France's coastal strips and mountain areas are characterized by marked climatic specificities, by tourist activity that is diffuse or concentrated (sometimes very strongly so) and, perhaps, by aspects that cannot be reduced to the former and that would constitute geographical attributes specific to certain coastal and mountain locations: …
Functional Data Analysis with R and Matlab by RAMSAY, J. O., HOOKER, G., and GRAVES, S.
Stochastic Approximation for Multivariate and Functional Median
We propose a very simple algorithm to estimate the geometric median, also called spatial median, of multivariate (Small (1990)) or functional data (Gervini (2008)) when the sample size is large. A simple and fast iterative approach based on the Robbins-Monro algorithm (Duflo (1997)) as well as its averaged version (Polyak and Juditsky (1992)) are shown to be effective for large samples of high-dimensional data. They are very fast and only require O(Nd) elementary operations, where N is the sample size and d is the dimension of data. The averaged approach is shown to be more effective and less sensitive to the tuning parameter. The ability of this new estimator to estimate accurately …
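The Robbins-Monro iteration and its averaged version described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the step sequence `c / n**alpha`, the default values of `c` and `alpha`, the initialization at the first observation and the function name are all assumptions made for the example.

```python
import numpy as np

def averaged_geometric_median(X, c=1.0, alpha=0.75):
    """Averaged Robbins-Monro sketch for the geometric median.

    X        : (N, d) array of observations, processed once in order.
    c, alpha : hypothetical step-size parameters, step = c / n**alpha
               with 1/2 < alpha < 1.
    Returns the averaged iterate, an array of shape (d,).
    """
    X = np.asarray(X, dtype=float)
    z = X[0].copy()          # current Robbins-Monro iterate
    z_bar = z.copy()         # running average of the iterates
    for n, x in enumerate(X[1:], start=1):
        diff = x - z
        norm = np.linalg.norm(diff)
        if norm > 0:
            # move along the unit vector pointing to the new observation
            z = z + (c / n ** alpha) * diff / norm
        z_bar = z_bar + (z - z_bar) / (n + 1)
    return z_bar
```

Each observation costs O(d) operations, so a single pass over the sample costs O(Nd), in line with the complexity quoted in the abstract.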
Estimating finite mixtures of semi-Markov chains: an application to the segmentation of temporal sensory data
In food science, it is of great interest to obtain information about the temporal perception of foods in order to create new products, to modify existing products or more generally to understand the mechanisms of perception. Temporal dominance of sensations is a technique to measure temporal perception which consists in sequentially choosing attributes describing a food product during tasting. This work introduces new statistical models based on finite mixtures of semi-Markov chains to describe data collected with the temporal dominance of sensations protocol, allowing different temporal perceptions for the same product within a population. The identifiability of the parameters of such mixtur…
Densité des points de mesure, types et limite des modèles d'interpolation
Interpolation par recherche d’information locale
Interpolation methods, whether based on regression or on kriging, are global in the sense that they use all the data available for a given territory. When the territory is spatially very heterogeneous, however, the quality of the results suffers. To overcome this difficulty, a local interpolation method is proposed and applied, as a test, to temperatures across France. Starting from a set of measurement stations spread over this space, digitized at 250 m, the method consists in modelling local spatial variations in temperature by considering each point of the grid and the n nearest neighbouring stations that surround it…
Functional Data Analysis in NTCP Modeling: A New Method to Explore the Radiation Dose-Volume Effects
Purpose/Objective(s) To describe a novel method to explore radiation dose-volume effects. Functional data analysis is used to investigate the information contained in differential dose-volume histograms. The method is applied to the normal tissue complication probability modeling of rectal bleeding (RB) for patients irradiated in the prostatic bed by 3-dimensional conformal radiation therapy. Methods and Materials Kernel density estimation was used to estimate the individual probability density functions from each of the 141 rectum differential dose-volume histograms. Functional principal component analysis was performed on the estimated probability density functions to explore the variatio…
Interpolation by local information
Interpolation methods, whether based on regression or on kriging, are global in the sense that they use all the data available for a given territory. When the territory is spatially very heterogeneous, however, the quality of the results suffers. To overcome this difficulty, a local interpolation method is proposed and applied, as a test, to temperatures across France. Starting from a set of measurement stations spread over this space, digitized at 250 m, the method consists in modelling local spatial variations in temperature by considering each point of the grid and the n nearest neighbouring stations that surround it…
Non-parametric approaches to the impact of Holstein heifer growth from birth to insemination on their dairy performance at lactation one
Parametric approaches have been used widely to model animal growth and study the impact of growth profile on performance. Individual variation is often not considered in such approaches. However, non-parametric modelling allows this. Such an approach, based on spline functions, was used to study the importance of growth profiles from age 0 to 15 months (i.e. insemination) on milk yield and composition in primiparous cows. A dataset of 447 heifers was used for analysis of growth performance; 296 of them were also used to study impact on lactation. All of them originated from a French experimental herd and were born between 1986 and 2006. Clustering methods were also tested. Comparison…
Modeling temporal dominance of sensations data with stochastic processes
Varying-coefficient functional linear regression models
This article considers a generalization of functional linear regression in which an additional real variable smoothly influences the functional coefficient. We thus define a varying-coefficient regression model for functional data. We propose two estimators based, respectively, on conditional functional principal regression and on local penalized regression splines, and prove their pointwise consistency. Using one-day-ahead prediction of ozone concentration in the city of Toulouse, we check the ability of such nonlinear functional approaches to produce competitive estimations.
Functional Principal Components Analysis with Survey Data
This work aims at performing Functional Principal Components Analysis (FPCA) with Horvitz-Thompson estimators when the observations are curves collected with survey sampling techniques. FPCA relies on the estimation of the eigenelements of the covariance operator, which can be seen as nonlinear functionals. Adapting to our functional context the linearization technique based on the influence function developed by Deville (1999), we prove that these estimators are asymptotically design unbiased and convergent. Under mild assumptions, asymptotic variances are derived for the FPCA estimators and convergent estimators of them are proposed. Our approach is illustrated with a simulation study and we…
Properties of Design-Based Functional Principal Components Analysis.
This work aims at performing Functional Principal Components Analysis (FPCA) with Horvitz-Thompson estimators when the observations are curves collected with survey sampling techniques. One important motivation for this study is that FPCA is a dimension reduction tool which is the first step to develop model assisted approaches that can take auxiliary information into account. FPCA relies on the estimation of the eigenelements of the covariance operator which can be seen as nonlinear functionals. Adapting to our functional context the linearization technique based on the influence function developed by Deville (1999), we prove that these estimators are asymptotically design unbiased and con…
Density of measurement points, types and limits of interpolation models (Densité des points de mesure, types et limites des modèles d'interpolation)
Climatology uses interpolation methods to estimate climate data from Météo France records; these data are needed for various purposes (such as estimating the hedonic price of climate). Several methods are studied here, the choice depending on the density of the recording points. The examples of temperature (1495 weather stations in metropolitan France) and sunshine (111 stations, located at points with singular characteristics) are considered. Kriging, regressions using all the stations, the same regressions with global kriging, and finally local regressions (which use only…
The price of climate : revealed preferences of French consumers
Using the hedonic price method, we study consumers' preferences for climate (temperature, very hot or cold days, and rainfall) in France, a temperate country with varied climates. The data are, on the one hand, individual attributes and prices of houses and workers and, on the other hand, climate attributes interpolated from weather stations. We show that French households put a positive value on warmer temperatures, while very hot days are a nuisance. Such climatic amenities are attributes of consumers' utility function; nevertheless, global warming assessments by economists, such as the Stern Review (2006), ignore these climatic preferences. The social welfare assessm…
Varying-time random effects models for longitudinal data: unmixing and temporal interpolation of remote-sensing data
Remote sensing is a helpful tool for crop monitoring or vegetation-growth estimation at a country or regional scale. However, satellite images generally involve a compromise between the time frequency of observations and their resolution (i.e. pixel size). When high temporal resolution is required, we have to work with information based on kilometric pixels, named mixed pixels, that represent aggregated responses of multiple land covers. Disaggregation, or unmixing, is then necessary to downscale from the square kilometer to the local dynamic of each theme (crop, wood, meadows, etc.). Assuming the land use is known, that is to say the proportion of each theme within each m…