Search results for "Dimension"
showing 10 items of 2766 documents
Sparse relative risk regression models
2020
Clinical studies in which patients are screened for many genomic features are becoming routine. In principle, this holds the promise of finding genomic signatures for a particular disease. In particular, cancer survival is thought to be closely linked to the genomic constitution of the tumor. Discovering such signatures will be useful in the diagnosis of the patient, may be used for treatment decisions and, perhaps, even the development of new treatments. However, genomic data are typically noisy and high-dimensional, with the number of features often outstripping the number of patients included in the study. Regularized survival models have been proposed to deal with such scenarios…
A fast and recursive algorithm for clustering large datasets with k-medians
2012
Clustering large samples of high-dimensional data with fast algorithms is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967), who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. Particular attention is paid to the averaged versions, which…
The asymptotic covariance matrix of the Oja median
2003
The Oja median, based on a sample of multivariate data, is an affine equivariant estimate of the centre of the distribution. It reduces to the sample median in one dimension and has several nice robustness and efficiency properties. We develop different representations of its asymptotic variance and discuss ways to estimate this quantity. We consider symmetric multivariate models and also the more narrow elliptical models. A small simulation study is included to compare finite sample results to the asymptotic formulas.
Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?
2017
Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable to perform the PCA of streaming data and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely, perturbation techniques, incremental methods and stochastic optimisation, and compares the most widely employed techniques in terms of statistical a…
Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis
2017
The geometric median covariation matrix is a robust multivariate indicator of dispersion which can be extended without any difficulty to functional data. We define estimators, based on recursive algorithms, that can be simply updated at each new observation and are able to deal rapidly with large samples of high dimensional data without being obliged to store all the data in memory. Asymptotic convergence properties of the recursive algorithms are studied under weak conditions. The computation of the principal components can also be performed online and this approach can be useful for online outlier detection. A simulation study clearly shows that this robust indicat…
A review of second‐order blind identification methods
2021
Second-order source separation (SOS) is a data analysis tool which can be used for revealing hidden structures in multivariate time series data or as a tool for dimension reduction. Such methods are nowadays increasingly important as more and more high-dimensional multivariate time series data are measured in numerous fields of applied science. Dimension reduction is crucial, as modeling such high-dimensional data with multivariate time series models is often impractical as the number of parameters describing dependencies between the component time series is usually too high. SOS methods have their roots in the signal processing literature, where they were first used to separate source sign…
Intensity estimation for inhomogeneous Gibbs point process with covariates-dependent chemical activity
2014
Recent development of intensity estimation for inhomogeneous spatial point processes with covariates suggests that kerneling in the covariate space is a competitive intensity estimation method for inhomogeneous Poisson processes. It is not known whether this advantageous performance is still valid when the points interact. In the simplest common case, this happens, for example, when the objects presented as points have a spatial dimension. In this paper, kerneling in the covariate space is extended to Gibbs processes with covariates-dependent chemical activity and inhibitive interactions, and the performance of the approach is studied through extensive simulation experiments. It is demonstr…
Lasota–Yorke maps with holes: conditionally invariant probability measure and invariant probability measure on the set of …
2003
Let $T : I \to I$ be a Lasota–Yorke map on the interval $I$, let $Y$ be a nontrivial sub-interval of $I$ and $g_0 : I \to \mathbb{R}_+$ be a strictly positive potential which belongs to BV and admits a conformal measure $m$. We give constructive conditions on $Y$ ensuring the existence of absolutely continuous (w.r.t. $m$) conditionally invariant probability measures to nonabsorption in $Y$. These conditions also imply the existence of an invariant probability measure on the set $X_\infty$ of points which never fall into $Y$. Our conditions allow rather “large” holes.
The conditional censored graphical lasso estimator
2020
In many applied fields, such as genomics, different types of data are collected on the same system, and it is not uncommon that some of these datasets are subject to censoring as a result of the measurement technologies used, such as data generated by polymerase chain reactions and flow cytometers. When the overall objective is that of network inference, at possibly different levels of a system, information coming from different sources and/or different steps of the analysis can be integrated into one model with the use of conditional graphical models. In this paper, we develop a doubly penalized inferential procedure for…
2021
We prove the existence of a smoothing for a toroidal crossing space under mild assumptions. By linking log structures with infinitesimal deformations, the result receives a very compact form for normal crossing spaces. The main approach is to study log structures that are incoherent on a subspace of codimension 2 and prove a Hodge–de Rham degeneration theorem for such log spaces that also settles a conjecture by Danilov. We show that the homotopy equivalence between Maurer–Cartan solutions and deformations combined with Batalin–Vilkovisky theory can be used to obtain smoothings. The construction of new Calabi–Yau and Fano manifolds as well as Frobenius manifold structures on moduli…