Search results for "Dimension"

showing 10 items of 2766 documents

Sparse relative risk regression models

2020

Summary: Clinical studies in which patients are routinely screened for many genomic features are becoming more common. In principle, this holds the promise of being able to find genomic signatures for a particular disease. In particular, cancer survival is thought to be closely linked to the genomic constitution of the tumor. Discovering such signatures will be useful in the diagnosis of the patient, may be used for treatment decisions and, perhaps, even the development of new treatments. However, genomic data are typically noisy and high-dimensional, with the number of features often outstripping the number of patients included in the study. Regularized survival models have been proposed to deal with such scenarios…

Statistics and Probability; Clustering high-dimensional data; Computer science; dgLARS; Inference; Scale (descriptive set theory); Biostatistics; Machine learning; computer.software_genre; Risk Assessment; 01 natural sciences; Regularization (mathematics); Relative risk regression model; 010104 statistics & probability; 03 medical and health sciences; Neoplasms; Covariate; Humans; Computer Simulation; 0101 mathematics; Online Only Articles; Survival analysis; 030304 developmental biology; 0303 health sciences; Models Statistical; business.industry; Least-angle regression; Regression analysis; General Medicine; Survival Analysis; High-dimensional data; Gene expression data; Regression Analysis; Artificial intelligence; Statistics Probability and Uncertainty; Settore SECS-S/01 - Statistica; business; Sparsity; computer; Biostatistics
researchProduct

A fast and recursive algorithm for clustering large datasets with k-medians

2012

Clustering large samples of high-dimensional data with fast algorithms is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967), who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. Particular attention is paid to the averaged versions, which…

Statistics and Probability; Clustering high-dimensional data; FOS: Computer and information sciences; Mathematical optimization; high dimensional data; Machine Learning (stat.ML); 02 engineering and technology; Stochastic approximation; 01 natural sciences; Statistics - Computation; 010104 statistics & probability; k-medoids; Statistics - Machine Learning; [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST]; stochastic approximation; 0202 electrical engineering electronic engineering information engineering; Computational statistics; recursive estimators; Almost surely; [ MATH.MATH-ST ] Mathematics [math]/Statistics [math.ST]; 0101 mathematics; Cluster analysis; Computation (stat.CO); Mathematics; averaging; k-medoids; Robbins Monro; Applied Mathematics; Estimator; [STAT.TH]Statistics [stat]/Statistics Theory [stat.TH]; stochastic gradient; [ STAT.TH ] Statistics [stat]/Statistics Theory [stat.TH]; Medoid; Computational Mathematics; Computational Theory and Mathematics; online clustering; 020201 artificial intelligence & image processing; partitioning around medoids; Algorithm
researchProduct

The asymptotic covariance matrix of the Oja median

2003

The Oja median, based on a sample of multivariate data, is an affine equivariant estimate of the centre of the distribution. It reduces to the sample median in one dimension and has several nice robustness and efficiency properties. We develop different representations of its asymptotic variance and discuss ways to estimate this quantity. We consider symmetric multivariate models and also the more narrow elliptical models. A small simulation study is included to compare finite sample results to the asymptotic formulas.

Statistics and Probability; Combinatorics; Delta method; Multivariate statistics; Matrix (mathematics); Multivariate analysis of variance; Dimension (vector space); Matrix t-distribution; Applied mathematics; Equivariant map; Affine transformation; Statistics Probability and Uncertainty; Mathematics; Statistics & Probability Letters
researchProduct

Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?

2017

Summary: Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable to perform the PCA of streaming data and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely, perturbation techniques, incremental methods and stochastic optimisation, and compares the most widely employed techniques in terms of statistical a…

Statistics and Probability; Computer science; Computation; Dimensionality reduction; Incremental methods; 02 engineering and technology; Missing data; 01 natural sciences; 010104 statistics & probability; Data explosion; Streaming data; Principal component analysis; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; 0101 mathematics; Statistics Probability and Uncertainty; Algorithm; Eigendecomposition of a matrix; International Statistical Review
researchProduct

Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis

2017

The geometric median covariation matrix is a robust multivariate indicator of dispersion which can be extended without any difficulty to functional data. We define estimators, based on recursive algorithms, that can be simply updated at each new observation and are able to deal rapidly with large samples of high dimensional data without being obliged to store all the data in memory. Asymptotic convergence properties of the recursive algorithms are studied under weak conditions. The computation of the principal components can also be performed online and this approach can be useful for online outlier detection. A simulation study clearly shows that this robust indicat…

Statistics and Probability; Computer science; Mathematics - Statistics Theory; Statistics Theory (math.ST); 01 natural sciences; 010104 statistics & probability; Matrix (mathematics); Dimension (vector space); Geometric median; Stochastic gradient; FOS: Mathematics; 0101 mathematics; L1-median; 010102 general mathematics; Estimator; [STAT.TH]Statistics [stat]/Statistics Theory [stat.TH]; Geometric median; Covariance; [ STAT.TH ] Statistics [stat]/Statistics Theory [stat.TH]; Functional data; MSC: 62G05 62L20; Principal component analysis; Projection pursuit; Anomaly detection; Recursive robust estimation; Statistics Probability and Uncertainty; Algorithm
researchProduct

A review of second‐order blind identification methods

2021

Second-order source separation (SOS) is a data analysis tool which can be used for revealing hidden structures in multivariate time series data or as a tool for dimension reduction. Such methods are nowadays increasingly important as more and more high-dimensional multivariate time series data are measured in numerous fields of applied science. Dimension reduction is crucial, as modeling such high-dimensional data with multivariate time series models is often impractical as the number of parameters describing dependencies between the component time series is usually too high. SOS methods have their roots in the signal processing literature, where they were first used to separate source sign…

Statistics and Probability; Computer science; business.industry; Dimensionality reduction; Second order blind identification; Pattern recognition; Artificial intelligence; business; Blind signal separation; WIREs Computational Statistics
researchProduct

Intensity estimation for inhomogeneous Gibbs point process with covariates-dependent chemical activity

2014

Recent development of intensity estimation for inhomogeneous spatial point processes with covariates suggests that kerneling in the covariate space is a competitive intensity estimation method for inhomogeneous Poisson processes. It is not known whether this advantageous performance is still valid when the points interact. In the simplest common case, this happens, for example, when the objects presented as points have a spatial dimension. In this paper, kerneling in the covariate space is extended to Gibbs processes with covariates-dependent chemical activity and inhibitive interactions, and the performance of the approach is studied through extensive simulation experiments. It is demonstr…

Statistics and Probability; Dimensionality reduction; Nonparametric statistics; Poisson distribution; Point process; symbols.namesake; Dimension (vector space); Covariate; symbols; Econometrics; Statistics::Methodology; Statistical physics; Statistics Probability and Uncertainty; Smoothing; Mathematics; Parametric statistics; Statistica Neerlandica
researchProduct

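Applications de type Lasota–Yorke à trou : mesure de probabilité conditionellement invariante et mesure de probabilité invariante sur l'ensemble des … [Lasota–Yorke maps with a hole: conditionally invariant probability measure and invariant probability measure on the set of …]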

2003

Abstract: Let $T : I \to I$ be a Lasota–Yorke map on the interval $I$, let $Y$ be a nontrivial sub-interval of $I$ and $g_0 : I \to \mathbb{R}^+$ be a strictly positive potential which belongs to BV and admits a conformal measure $m$. We give constructive conditions on $Y$ ensuring the existence of absolutely continuous (w.r.t. $m$) conditionally invariant probability measures to nonabsorption in $Y$. These conditions imply also existence of an invariant probability measure on the set $X_\infty$ of points which never fall into $Y$. Our conditions allow rather “large” holes.

Statistics and Probability; Discrete mathematics; Pure mathematics; Hausdorff dimension; Ergodic theory; Invariant measure; Interval (mathematics); Statistics Probability and Uncertainty; Invariant (mathematics); Absolute continuity; Measure (mathematics); Probability measure; Mathematics; Annales de l'Institut Henri Poincare (B) Probability and Statistics
researchProduct

The conditional censored graphical lasso estimator

2020

© 2020, Springer Science+Business Media, LLC, part of Springer Nature. In many applied fields, such as genomics, different types of data are collected on the same system, and it is not uncommon that some of these datasets are subject to censoring as a result of the measurement technologies used, such as data generated by polymerase chain reactions and flow cytometer. When the overall objective is that of network inference, at possibly different levels of a system, information coming from different sources and/or different steps of the analysis can be integrated into one model with the use of conditional graphical models. In this paper, we develop a doubly penalized inferential procedure for…

Statistics and Probability; FOS: Computer and information sciences; Computer science; Gaussian; Inference; Data type; Theoretical Computer Science; high-dimensional setting; Database normalization; Methodology (stat.ME); symbols.namesake; Lasso (statistics); Graphical model; Conditional Gaussian graphical model; censored graphical lasso; Statistics - Methodology; High-dimensional setting; conditional Gaussian graphical models; sparsity; Estimator; Censoring (statistics); Censored graphical lasso; Computational Theory and Mathematics; symbols; Censored data; Statistics Probability and Uncertainty; Settore SECS-S/01 - Statistica; Sparsity; Algorithm
researchProduct

2021

Abstract: We prove the existence of a smoothing for a toroidal crossing space under mild assumptions. By linking log structures with infinitesimal deformations, the result receives a very compact form for normal crossing spaces. The main approach is to study log structures that are incoherent on a subspace of codimension 2 and prove a Hodge–de Rham degeneration theorem for such log spaces that also settles a conjecture by Danilov. We show that the homotopy equivalence between Maurer–Cartan solutions and deformations combined with Batalin–Vilkovisky theory can be used to obtain smoothings. The construction of new Calabi–Yau and Fano manifolds as well as Frobenius manifold structures on moduli…

Statistics and Probability; Frobenius manifold; Pure mathematics; Algebra and Number Theory; Conjecture; Homotopy; Codimension; Fano plane; Space (mathematics); Moduli space; Mathematics::Algebraic Geometry; Discrete Mathematics and Combinatorics; Geometry and Topology; Mathematics::Symplectic Geometry; Mathematical Physics; Analysis; Smoothing; Mathematics; Forum of Mathematics, Pi
researchProduct