Search results for "High-dimensional data"
showing 9 items of 29 documents
Inferring networks from high-dimensional data with mixed variables
2014
We present two methodologies to deal with high-dimensional data with mixed variables, the strongly decomposable graphical model and the regression-type graphical model. The first model is used to infer conditional independence graphs. The latter model is applied to compute the relative importance or contribution of each predictor to the response variables. Recently, penalized likelihood approaches have also been proposed to estimate graph structures. In a simulation study, we compare the performance of the strongly decomposable graphical model and the graphical lasso in terms of graph recovering. Five different graph structures are used to simulate the data: the banded graph, the cluster gr…
Quantum clustering in non-spherical data distributions: Finding a suitable number of clusters
2017
Quantum Clustering (QC) provides an alternative approach to clustering algorithms, several of which are based on geometric relationships between data points. Instead, QC makes use of quantum mechanics concepts to find structures (clusters) in data sets by finding the minima of a quantum potential. The starting point of QC is a Parzen estimator with a fixed length scale, which significantly affects the final cluster allocation. This dependence on an adjustable parameter is common to other methods. We propose a framework to find suitable values of the length parameter σ by optimising twin measures of cluster separation and consistency for a given cluster number. This is an extension of the Se…
Incrementally Assessing Cluster Tendencies with a~Maximum Variance Cluster Algorithm
2003
A straightforward and efficient way to discover clustering tendencies in data using a recently proposed Maximum Variance Clustering algorithm is proposed. The approach shares the benefits of the plain clustering algorithm with regard to other approaches for clustering. Experiments using both synthetic and real data have been performed in order to evaluate the differences between the proposed methodology and the plain use of the Maximum Variance algorithm. According to the results obtained, the proposal constitutes an efficient and accurate alternative.
The on-line curvilinear component analysis (onCCA) for real-time data reduction
2015
Real time pattern recognition applications often deal with high dimensional data, which require a data reduction step which is only performed offline. However, this loses the possibility of adaption to a changing environment. This is also true for other applications different from pattern recognition, like data visualization for input inspection. Only linear projections, like the principal component analysis, can work in real time by using iterative algorithms while all known nonlinear techniques cannot be implemented in such a way and actually always work on the whole database at each epoch. Among these nonlinear tools, the Curvilinear Component Analysis (CCA), which is a non-convex techni…
Data Analysis and Bioinformatics
2007
Data analysis methods and techniques are revisited in the case of biological data sets. Particular emphasis is given to clustering and mining issues. Clustering is still a subject of active research in several fields such as statistics, pattern recognition, and machine learning. Data mining adds to clustering the complications of very large data-sets with many attributes of different types. And this is a typical situation in biology. Some cases studies are also described.
A Feature Set Decomposition Method for the Construction of Multi-classifier Systems Trained with High-Dimensional Data
2013
Data mining for the discovery of novel, useful patterns, encounters obstacles when dealing with high-dimensional datasets, which have been documented as the "curse" of dimensionality. A strategy to deal with this issue is the decomposition of the input feature set to build a multi-classifier system. Standalone decomposition methods are rare and generally based on random selection. We propose a decomposition method which uses information theory tools to arrange input features into uncorrelated and relevant subsets. Experimental results show how this approach significantly outperforms three baseline decomposition methods, in terms of classification accuracy.
GenClust: A genetic algorithm for clustering gene expression data
2005
Abstract Background Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. Results GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, …
Meeting the Challenges of High-Dimensional Single-Cell Data Analysis in Immunology
2019
Recent advances in cytometry have radically altered the fate of single-cell proteomics by allowing a more accurate understanding of complex biological systems. Mass cytometry (CyTOF) provides simultaneous single-cell measurements that are crucial to understand cellular heterogeneity and identify novel cellular subsets. High-dimensional CyTOF data were traditionally analyzed by gating on bivariate dot plots, which are not only laborious given the quadratic increase of complexity with dimension but are also biased through manual gating. This review aims to discuss the impact of new analysis techniques for in-depths insights into the dynamics of immune regulation obtained from static snapshot …
Making nonlinear manifold learning models interpretable: The manifold grand tour
2015
Smooth nonlinear topographic maps of the data distribution to guide a Grand Tour visualisation.Prioritisation of data linear views that are most consistent with data structure in the maps.Useful visualisations that cannot be obtained by other more classical approaches. Dimensionality reduction is required to produce visualisations of high dimensional data. In this framework, one of the most straightforward approaches to visualising high dimensional data is based on reducing complexity and applying linear projections while tumbling the projection axes in a defined sequence which generates a Grand Tour of the data. We propose using smooth nonlinear topographic maps of the data distribution to…