Search results for "artificial intelligence"
showing 10 items of 6122 documents
A fast and recursive algorithm for clustering large datasets with k-medians
2012
Clustering large samples of high-dimensional data with fast algorithms is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967), who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. Particular attention is paid to the averaged versions, which…
Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?
2017
Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable to perform the PCA of streaming data and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely, perturbation techniques, incremental methods and stochastic optimisation, and compares the most widely employed techniques in terms of statistical a…
Blind Source Separation Based on Joint Diagonalization in R: The Packages JADE and BSSasymp
2017
Blind source separation (BSS) is a well-known signal processing tool which is used to solve practical data analysis problems in various fields of science. In BSS, we assume that the observed data consists of linear mixtures of latent variables. The mixing system and the distributions of the latent variables are unknown. The aim is to find an estimate of an unmixing matrix which then transforms the observed data back to latent sources. In this paper we present the R packages JADE and BSSasymp. The package JADE offers several BSS methods which are based on joint diagonalization. Package BSSasymp contains functions for computing the asymptotic covariance matrices as well as their data-based es…
Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods
2014
Motivation: Protein–protein interaction (PPI) networks are powerful models to represent the pairwise protein interactions of organisms. Clustering PPI networks can be useful for isolating groups of interacting proteins that participate in the same biological processes or that together perform specific biological functions. Evolutionary orthologies can be inferred this way, as well as functions and properties of yet uncharacterized proteins. Results: We present an overview of the main state-of-the-art clustering methods that have been applied to PPI networks over the past decade. We distinguish five specific categories of approaches, describe and compare their main features and …
Anthropometry: An R Package for Analysis of Anthropometric Data
2017
The development of powerful new 3D scanning techniques has enabled the generation of large up-to-date anthropometric databases which provide highly valued data to improve the ergonomic design of products adapted to the user population. As a consequence, Ergonomics and Anthropometry are two increasingly quantitative fields, so advanced statistical methodologies and modern software tools are required to get the maximum benefit from anthropometric data. This paper presents a new R package, called Anthropometry, which is available on the Comprehensive R Archive Network. It brings together some statistical methodologies concerning clustering, statistical shape analysis, statistical archetypal an…
Overall Objective Priors
2015
In multi-parameter models, reference priors typically depend on the parameter or quantity of interest, and it is well known that this is necessary to produce objective posterior distributions with optimal properties. There are, however, many situations where one is simultaneously interested in all the parameters of the model or, more realistically, in functions of them that include aspects such as prediction, and it would then be useful to have a single objective prior that could safely be used to produce reasonable posterior inferences for all the quantities of interest. In this paper, we consider three methods for selecting a single objective prior and study, in a variety of problems incl…
Sequential Monte Carlo methods in Bayesian joint models for longitudinal and time-to-event data
2020
The statistical analysis of the information generated by medical follow-up is a very important challenge in the field of personalized medicine. As the evolutionary course of a patient's disease progresses, his/her medical follow-up generates more and more information that should be processed immediately in order to review and update his/her prognosis and treatment. Hence, we focus on this update process through sequential inference methods for joint models of longitudinal and time-to-event data from a Bayesian perspective. More specifically, we propose the use of sequential Monte Carlo (SMC) methods for static parameter joint models with the intention of reducing computational time in each…
A review of second‐order blind identification methods
2021
Second-order source separation (SOS) is a data analysis tool which can be used for revealing hidden structures in multivariate time series data or as a tool for dimension reduction. Such methods are nowadays increasingly important as more and more high-dimensional multivariate time series data are measured in numerous fields of applied science. Dimension reduction is crucial, as modeling such high-dimensional data with multivariate time series models is often impractical as the number of parameters describing dependencies between the component time series is usually too high. SOS methods have their roots in the signal processing literature, where they were first used to separate source sign…
Archetypoids: A new approach to define representative archetypal data
2015
The new concept of archetypoids is introduced. Archetypoid analysis represents each observation in a dataset as a mixture of actual observations in the dataset, which are pure types, or archetypoids. Unlike archetype analysis, archetypoids are real observations, not a mixture of observations. This is relevant when existing archetypal observations are needed, rather than fictitious ones. An algorithm is proposed to find them and some of their theoretical properties are introduced. It is also shown how they can be obtained when only dissimilarities between observations are known (features are unavailable). Archetypoid analysis is illustrated in two design problems and several examples, compar…
Testing abnormality in the spatial arrangement of cells in the corneal endothelium using spatial point processes
2001
The study of central corneal endothelium morphology is important in Ophthalmology. Some of the pathologies that could compromise endothelial cell morphology are trauma, cataract, surgery, use of contact lenses, corneal dystrophies or degenerations. The quantitative analysis of cell shape and cellular pattern is more sensitive in detecting subtle changes in endothelial morphology than cell density measurement or cell area analysis. In this paper, the morphology of the central cornea, the most important area from the point of view of vision, is studied through an associated bivariate spatial point pattern: the centroids of the cells and the triple points, that is, the points where three diffe…