
AUTHOR

Mohamed Chaouch

Design-based estimation for geometric quantiles with application to outlier detection

Geometric quantiles are investigated using data collected from a complex survey. Geometric quantiles are an extension of univariate quantiles in a multivariate set-up that uses the geometry of multivariate data clouds. A very important application of geometric quantiles is the detection of outliers in multivariate data by means of quantile contours. A design-based estimator of geometric quantiles is constructed and used to compute quantile contours in order to detect outliers in both multivariate data and survey sampling set-ups. An algorithm for computing geometric quantile estimates is also developed. Under broad assumptions, the asymptotic variance of the quantile estimator is derived an…

research product

Contribution à l'estimation non paramétrique des quantiles géométriques et à l'analyse des données fonctionnelles

In this dissertation we study nonparametric geometric quantile estimation, conditional geometric quantile estimation and functional data analysis. First, we are interested in the definition of geometric quantiles. Different simulations show that the Transformation-Retransformation technique should be used to estimate geometric quantiles when the distribution is not spherical. A real-data study shows that data are better modeled by geometric quantiles than by marginal ones, especially when the variables that make up the random vector are correlated. Then we estimate geometric quantiles when data are obtained by survey sampling techniques. First, we propose an unbiased estimator, then using lineari…

research product

Stochastic Approximation for Multivariate and Functional Median

We propose a very simple algorithm to estimate the geometric median, also called the spatial median, of multivariate (Small (1990)) or functional data (Gervini (2008)) when the sample size is large. A simple and fast iterative approach based on the Robbins-Monro algorithm (Duflo (1997)) as well as its averaged version (Polyak and Juditsky (1992)) are shown to be effective for large samples of high-dimensional data. They are very fast and only require O(Nd) elementary operations, where N is the sample size and d is the dimension of the data. The averaged approach is shown to be more effective and less sensitive to the tuning parameter. The ability of this new estimator to estimate accurately …
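The recursion described in this abstract can be illustrated with a minimal sketch. It assumes the usual Robbins-Monro update for the geometric median, in which the iterate moves a step of length `c / n**alpha` along the unit vector pointing toward each new observation, followed by a running average of the iterates. The function name, the step-size form and the default values of `c` and `alpha` are illustrative assumptions, not taken from the paper.

```python
import math


def averaged_rm_median(samples, c=1.0, alpha=0.75):
    """One-pass averaged Robbins-Monro estimate of the geometric median.

    `c` and `alpha` are hypothetical tuning parameters (step = c / n**alpha).
    Each observation is visited once, so the cost is O(N d).
    """
    it = iter(samples)
    z = list(next(it))   # current Robbins-Monro iterate, started at the first point
    z_bar = list(z)      # running average of the iterates
    for n, x in enumerate(it, start=1):
        diff = [xi - zi for xi, zi in zip(x, z)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:
            step = c / n ** alpha
            # move along the unit vector pointing from z toward x
            z = [zi + step * d / norm for zi, d in zip(z, diff)]
        # update the average of the n + 1 iterates seen so far
        z_bar = [zb + (zi - zb) / (n + 1) for zb, zi in zip(z_bar, z)]
    return z_bar
```

For a distribution that is symmetric about a point, the geometric median coincides with that point, which gives a simple sanity check on simulated data.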

research product

Using Complex Surveys to Estimate the L1-Median of a Functional Variable: Application to Electricity Load Curves

Mean profiles are widely used as indicators of the electricity consumption habits of customers. Currently, Électricité de France (EDF) estimates class load profiles by using the point-wise mean function. Unfortunately, it is well known that the mean is highly sensitive to the presence of outliers, such as one or more consumers with unusually high levels of consumption. In this paper, we propose an alternative to the mean profile: the L1-median profile, which is more robust. When dealing with large datasets of functional data (load curves for example), survey sampling approaches are useful for estimating the median profile and avoid storing all of the data. We propose here estimators of the median trajec…

research product

Quantiles géométriques et sondage

In this work, we are interested in estimating the geometric quantile for data obtained from a survey sampling design. We give a design-based estimator of the geometric quantile, together with an iterative method for computing it from the sample data. Under general conditions, we derive the asymptotic variance of the quantile estimator and propose a consistent estimator of this variance. The good behavior of the geometric quantile estimator is verified in a simulation study.

research product

Estimation des quantiles géométriques conditionnels et non conditionnels


research product

Functional Principal Components Analysis with Survey Data

This work aims at performing Functional Principal Components Analysis (FPCA) with Horvitz-Thompson estimators when the observations are curves collected with survey sampling techniques. FPCA relies on estimates of the eigenelements of the covariance operator, which can be seen as nonlinear functionals. Adapting to our functional context the linearization technique based on the influence function developed by Deville (1999), we prove that these estimators are asymptotically design unbiased and convergent. Under mild assumptions, asymptotic variances are derived for the FPCA estimators and convergent estimators of them are proposed. Our approach is illustrated with a simulation study and we…
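As a rough illustration of the idea, the sketch below computes a design-weighted mean curve and the leading eigenelement of the weighted covariance for curves observed on a common grid. Several choices are assumptions rather than the authors' method: the weights `1 / pi_k` are normalized by their sum instead of a known population size, the inner product is a plain sum over grid points (unit grid spacing), and the eigenelement is obtained by power iteration. The function name is hypothetical.

```python
import math


def ht_first_component(curves, incl_probs, n_iter=200):
    """Weighted mean curve and leading eigenpair of the weighted covariance.

    Sketch only: weights are 1 / pi_k normalized by their sum, and the
    first eigenfunction is found by power iteration on the discretized
    covariance operator (never formed explicitly).
    """
    weights = [1.0 / p for p in incl_probs]
    total_w = sum(weights)
    n_grid = len(curves[0])
    mean = [sum(w * y[j] for w, y in zip(weights, curves)) / total_w
            for j in range(n_grid)]
    centered = [[yj - mj for yj, mj in zip(y, mean)] for y in curves]
    # starting vector; it must not be orthogonal to the leading eigenvector
    v = [1.0 / math.sqrt(n_grid)] * n_grid
    eigval = 0.0
    for _ in range(n_iter):
        gv = [0.0] * n_grid
        for w, c in zip(weights, centered):
            score = sum(cj * vj for cj, vj in zip(c, v))   # <c_k, v>
            for j in range(n_grid):
                gv[j] += w * score * c[j] / total_w
        eigval = math.sqrt(sum(g * g for g in gv))          # ||Gamma v||
        if eigval == 0.0:
            break
        v = [g / eigval for g in gv]
    return mean, eigval, v
```

The returned eigenvector is defined only up to sign, so comparisons with a reference shape should use the absolute value of the inner product.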

research product

Properties of Design-Based Functional Principal Components Analysis.

This work aims at performing Functional Principal Components Analysis (FPCA) with Horvitz-Thompson estimators when the observations are curves collected with survey sampling techniques. One important motivation for this study is that FPCA is a dimension reduction tool which is the first step to develop model assisted approaches that can take auxiliary information into account. FPCA relies on the estimation of the eigenelements of the covariance operator which can be seen as nonlinear functionals. Adapting to our functional context the linearization technique based on the influence function developed by Deville (1999), we prove that these estimators are asymptotically design unbiased and con…

research product

Using complex surveys to estimate the $L_1$-median of a functional variable: application to electricity load curves

Mean profiles are widely used as indicators of the electricity consumption habits of customers. Currently, at Électricité de France (EDF), class load profiles are estimated using the point-wise mean function. Unfortunately, it is well known that the mean is highly sensitive to the presence of outliers, such as one or more consumers with unusually high levels of consumption. In this paper, we propose an alternative to the mean profile: the $L_1$-median profile, which is more robust. When dealing with large datasets of functional data (load curves for example), survey sampling approaches are useful for estimating the median profile without storing the whole data set. We propose here estimators of…

research product