0000000000265233
AUTHOR
Antoine Godichon-Baggioni
Stochastic algorithms for robust statistics in high dimension
This thesis focuses on stochastic algorithms in high dimension and on their application to robust statistics. In what follows, the expression "high dimension" is used when the size of the studied sample is large or when the variables under consideration take values in high dimensional spaces (not necessarily finite-dimensional). To analyze this kind of data, it is useful to consider algorithms that are fast, that do not need to store all the data, and that allow the estimates to be updated easily. In large samples of high dimensional data, outlier detection is often complicated. Nevertheless, these outliers, even if they are few, can strongly disturb simple indicators like the me…
Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis
The geometric median covariation matrix is a robust multivariate indicator of dispersion which can be extended without any difficulty to functional data. We define estimators, based on recursive algorithms, that can be simply updated at each new observation and that can deal rapidly with large samples of high dimensional data without having to store all the data in memory. Asymptotic convergence properties of the recursive algorithms are studied under weak conditions. The computation of the principal components can also be performed online, and this approach can be useful for online outlier detection. A simulation study clearly shows that this robust indicat…
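To make the idea of a recursive update concrete, here is a minimal sketch under stated assumptions, not the estimator defined in the paper. It assumes the median covariation matrix is the geometric median, in Frobenius norm, of the matrices (X - m)(X - m)^T, where m is the geometric median of X, and it assumes steps of the form c * n^(-alpha). The names `unit_step`, `running_mean` and `recursive_mcm`, and the values of `c` and `alpha`, are hypothetical choices for illustration.

```python
import math
import random


def unit_step(current, target, gamma):
    """Move `current` by `gamma` along the unit direction pointing to `target`."""
    diff = [t - c for t, c in zip(target, current)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        return list(current)
    return [c + gamma * d / norm for c, d in zip(current, diff)]


def running_mean(avg, new, n):
    """Fold one more value into an average that already holds n values."""
    return [a + (v - a) / (n + 1) for a, v in zip(avg, new)]


def recursive_mcm(samples, dim, c=1.0, alpha=0.66):
    """Single pass over `samples`; each observation is used once, then dropped.

    Matrices are stored flattened, so the Frobenius norm is the Euclidean
    norm of the flattened vector. Step sizes c * n**(-alpha) are an assumption.
    """
    it = iter(samples)
    m = list(next(it))            # median iterate, started at the first point
    m_bar = list(m)               # averaged median iterate
    v = [0.0] * (dim * dim)       # matrix iterate (flattened), started at zero
    v_bar = list(v)               # averaged matrix iterate
    for n, x in enumerate(it, start=1):
        gamma = c * n ** (-alpha)
        # Outer product centred at the current averaged median estimate.
        r = [xi - bi for xi, bi in zip(x, m_bar)]
        outer = [ri * rj for ri in r for rj in r]
        v = unit_step(v, outer, gamma)
        v_bar = running_mean(v_bar, v, n)
        m = unit_step(m, x, gamma)
        m_bar = running_mean(m_bar, m, n)
    matrix = [v_bar[i * dim:(i + 1) * dim] for i in range(dim)]
    return m_bar, matrix


# Synthetic data: the first coordinate is three times more dispersed.
rng = random.Random(0)
data = [(rng.gauss(0.0, 3.0), rng.gauss(0.0, 1.0)) for _ in range(20000)]
median_est, mcm_est = recursive_mcm(data, dim=2)
```

In this sketch the memory footprint does not grow with the sample size: only the current iterates and their averages are kept. The eigenvectors of the returned matrix could then serve as robust principal directions.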
Estimating the geometric median in Hilbert spaces with stochastic gradient algorithms: Lp and almost sure rates of convergence
The geometric median, also called the L1-median, is often used in robust statistics. Moreover, it is increasingly common to deal with large samples taking values in high dimensional spaces. In this context, a fast recursive estimator has been introduced by Cardot et al. (2013). This work aims at studying more precisely the asymptotic behavior of the estimators of the geometric median based on such nonlinear stochastic gradient algorithms. The Lp rates of convergence as well as almost sure rates of convergence of these estimators are derived in general separable Hilbert spaces. Moreover, the optimal rates of convergence in quadratic mean of the averaged algorithm are also given.
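A stochastic gradient estimator of the geometric median, together with its averaged version, can be sketched as follows. This is an illustration under stated assumptions and not the exact scheme analyzed in the paper: it works in finite dimension, uses steps of the form c * n^(-alpha) with alpha between 1/2 and 1, and starts at the first observation. The function name `averaged_sgd_median` and the default values of `c` and `alpha` are hypothetical.

```python
import math
import random


def averaged_sgd_median(samples, c=1.0, alpha=0.66):
    """One pass over `samples` (equal-length sequences of floats).

    Returns (m, m_bar): the last stochastic gradient iterate and the
    running average of all iterates. Step sizes are an assumption.
    """
    it = iter(samples)
    m = list(next(it))            # start at the first observation
    m_bar = list(m)
    for n, x in enumerate(it, start=1):
        diff = [xi - mi for xi, mi in zip(x, m)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:
            # Step of length c * n**(-alpha) along the unit vector towards x.
            step = c * n ** (-alpha) / norm
            m = [mi + step * d for mi, d in zip(m, diff)]
        # Averaging step: mean of the iterates m_0, ..., m_n.
        m_bar = [bi + (mi - bi) / (n + 1) for bi, mi in zip(m_bar, m)]
    return m, m_bar


# Synthetic data: Gaussian cloud around `center`, with 5% gross outliers.
rng = random.Random(0)
center = (1.0, -2.0, 0.5)
data = []
for i in range(20000):
    if i % 20 == 19:
        data.append((50.0, 50.0, 50.0))                    # outlier
    else:
        data.append(tuple(ci + rng.gauss(0.0, 1.0) for ci in center))

_, m_bar = averaged_sgd_median(data)
mean = [sum(col) / len(data) for col in zip(*data)]


def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


print("averaged median error:", dist(m_bar, center))
print("sample mean error:    ", dist(mean, center))
```

Each update moves the iterate a fixed distance towards the new observation regardless of how far away that observation is, which is why a few distant outliers shift this estimate far less than they shift the sample mean.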