6533b870fe1ef96bd12d0361

RESEARCH PRODUCT

Robust estimation of mean electricity consumption curves by sampling for small areas in presence of missing values

Anne De Moliner

subject

Linear mixed modelsSmall area estimationMissing dataRegression treesEstimation sur petits domaines[MATH.MATH-GM] Mathematics [math]/General Mathematics [math.GM]Estimateurs à noyauModèles linéaires mixtesRandom forestsBiais conditionnelsFunctional dataSurvey sampling[MATH.MATH-GM]Mathematics [math]/General Mathematics [math.GM]RobustesseDonnées fonctionnellesPlus proches voisinsForêts aléatoiresConditional biasKernel estimatorsNearest neighboursSondageDonnées manquantesRobustnessArbres de régression

description

In this thesis, we address the problem of robust estimation of mean or total electricity consumption curves by sampling in a finite population for the entire population and for small areas. We are also interested in estimating mean curves by sampling in presence of partially missing trajectories.Indeed, many studies carried out in the French electricity company EDF, for marketing or power grid management purposes, are based on the analysis of mean or total electricity consumption curves at a fine time scale, for different groups of clients sharing some common characteristics.Because of privacy issues and financial costs, it is not possible to measure the electricity consumption curve of each customer so these mean curves are estimated using samples. In this thesis, we extend the work of Lardin (2012) on mean curve estimation by sampling by focusing on specific aspects of this problem such as robustness to influential units, small area estimation and estimation in presence of partially or totally unobserved curves.In order to build robust estimators of mean curves we adapt the unified approach to robust estimation in finite population proposed by Beaumont et al (2013) to the context of functional data. To that purpose we propose three approaches : application of the usual method for real variables on discretised curves, projection on Functional Spherical Principal Components or on a Wavelets basis and thirdly functional truncation of conditional biases based on the notion of depth.These methods are tested and compared to each other on real datasets and Mean Squared Error estimators are also proposed.Secondly we address the problem of small area estimation for functional means or totals. We introduce three methods: unit level linear mixed model applied on the scores of functional principal components analysis or on wavelets coefficients, functional regression and aggregation of individual curves predictions by functional regression trees or functional random forests. Robust versions of these estimators are then proposed by following the approach to robust estimation based on conditional biais presented before.Finally, we suggest four estimators of mean curves by sampling in presence of partially or totally unobserved trajectories. The first estimator is a reweighting estimator where the weights are determined using a temporal non parametric kernel smoothing adapted to the context of finite population and missing data and the other ones rely on imputation of missing data. Missing parts of the curves are determined either by using the smoothing estimator presented before, or by nearest neighbours imputation adapted to functional data or by a variant of linear interpolation which takes into account the mean trajectory of the entire sample. Variance approximations are proposed for each method and all the estimators are compared to each other on real datasets for various missing data scenarios.

https://theses.hal.science/tel-01820617