Search results for " High-dimensional data"

showing 10 items of 24 documents

Making nonlinear manifold learning models interpretable: The manifold grand tour

2015

Smooth nonlinear topographic maps of the data distribution to guide a Grand Tour visualisation.Prioritisation of data linear views that are most consistent with data structure in the maps.Useful visualisations that cannot be obtained by other more classical approaches. Dimensionality reduction is required to produce visualisations of high dimensional data. In this framework, one of the most straightforward approaches to visualising high dimensional data is based on reducing complexity and applying linear projections while tumbling the projection axes in a defined sequence which generates a Grand Tour of the data. We propose using smooth nonlinear topographic maps of the data distribution to…

Clustering high-dimensional dataQA75Nonlinear dimensionality reductionDiscriminative clusteringComputer scienceVisualització de la informaciócomputer.software_genreData visualizationProjection (mathematics)Information visualizationArtificial IntelligenceQA:Informàtica::Infografia [Àrees temàtiques de la UPC]business.industryData visualizationDimensionality reductionGrand tourGeneral EngineeringNonlinear dimensionality reductionTopographic mapData structureComputer Science ApplicationsVisualizationManifold learningData miningbusinesscomputerGenerative topographic mappingLinear projections
researchProduct

Dimensionality reduction via regression on hyperspectral infrared sounding data

2014

This paper introduces a new method for dimensionality reduction via regression (DRR). The method generalizes Principal Component Analysis (PCA) in such a way that reduces the variance of the PCA scores. In order to do so, DRR relies on a deflationary process in which a non-linear regression reduces the redundancy between the PC scores. Unlike other nonlinear dimensionality reduction methods, DRR is easy to apply, it has out-of-sample extension, it is invertible, and the learned transformation is volume-preserving. These properties make the method useful for a wide range of applications, especially in very high dimensional data in general, and for hyperspectral image processing in particular…

Clustering high-dimensional dataRedundancy (information theory)business.industryDimensionality reductionPrincipal component analysisFeature extractionNonlinear dimensionality reductionHyperspectral imagingPattern recognitionArtificial intelligencebusinessMathematicsCurse of dimensionality2014 6th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS)
researchProduct

Scaling Up a Metric Learning Algorithm for Image Recognition and Representation

2008

Maximally Collapsing Metric Learning is a recently proposed algorithm to estimate a metric matrix from labelled data. The purpose of this work is to extend this approach by considering a set of landmark points which can in principle reduce the cost per iteration in one order of magnitude. The proposal is in fact a generalized version of the original algorithm that can be applied to larger amounts of higher dimensional data. Exhaustive experimentation shows that very similar behavior at a lower cost is obtained for a wide range of the number of landmark points used.

Clustering high-dimensional dataSet (abstract data type)Range (mathematics)LandmarkMetric (mathematics)Landmark pointRepresentation (mathematics)AlgorithmFacial recognition systemMathematics
researchProduct

The Three Steps of Clustering In The Post-Genomic Era

2013

This chapter descibes the basic algorithmic components that are involved in clustering, with particular attention to classification of microarray data.

Clustering high-dimensional dataSettore INF/01 - Informaticabusiness.industryCorrelation clusteringPattern recognitioncomputer.software_genreBiclusteringCURE data clustering algorithmClustering Classification Biological Data MiningConsensus clusteringArtificial intelligenceData miningbusinessCluster analysiscomputerMathematics
researchProduct

A Feature Set Decomposition Method for the Construction of Multi-classifier Systems Trained with High-Dimensional Data

2013

Data mining for the discovery of novel, useful patterns, encounters obstacles when dealing with high-dimensional datasets, which have been documented as the "curse" of dimensionality. A strategy to deal with this issue is the decomposition of the input feature set to build a multi-classifier system. Standalone decomposition methods are rare and generally based on random selection. We propose a decomposition method which uses information theory tools to arrange input features into uncorrelated and relevant subsets. Experimental results show how this approach significantly outperforms three baseline decomposition methods, in terms of classification accuracy.

Clustering high-dimensional databusiness.industryComputer sciencePattern recognitionInformation theorycomputer.software_genreUncorrelatedDecomposition method (queueing theory)Data miningArtificial intelligencebusinessFeature setcomputerClassifier (UML)Curse of dimensionality
researchProduct

Regularized Regression Incorporating Network Information: Simultaneous Estimation of Covariate Coefficients and Connection Signs

2014

We develop an algorithm that incorporates network information into regression settings. It simultaneously estimates the covariate coefficients and the signs of the network connections (i.e. whether the connections are of an activating or of a repressing type). For the coefficient estimation steps an additional penalty is set on top of the lasso penalty, similarly to Li and Li (2008). We develop a fast implementation for the new method based on coordinate descent. Furthermore, we show how the new methods can be applied to time-to-event data. The new method yields good results in simulation studies concerning sensitivity and specificity of non-zero covariate coefficients, estimation of networ…

Clustering high-dimensional databusiness.industryjel:C41jel:C13Machine learningcomputer.software_genreRegressionhigh-dimensional data gene expression data pathway information penalized regressionConnection (mathematics)Set (abstract data type)Lasso (statistics)CovariateArtificial intelligenceSensitivity (control systems)businessCoordinate descentAlgorithmcomputerMathematics
researchProduct

Incrementally Assessing Cluster Tendencies with a~Maximum Variance Cluster Algorithm

2003

A straightforward and efficient way to discover clustering tendencies in data using a recently proposed Maximum Variance Clustering algorithm is proposed. The approach shares the benefits of the plain clustering algorithm with regard to other approaches for clustering. Experiments using both synthetic and real data have been performed in order to evaluate the differences between the proposed methodology and the plain use of the Maximum Variance algorithm. According to the results obtained, the proposal constitutes an efficient and accurate alternative.

Clustering high-dimensional datak-medoidsComputer scienceCURE data clustering algorithmSingle-linkage clusteringCanopy clustering algorithmVariance (accounting)Data miningCluster analysiscomputer.software_genrecomputerk-medians clustering
researchProduct

Penalized regression and clustering in high-dimensional data

The main goal of this Thesis is to describe numerous statistical techniques that deal with high-dimensional genomic data. The Thesis begins with a review of the literature on penalized regression models, with particular attention to least absolute shrinkage and selection operator (LASSO) or L1-penalty methods. L1 logistic/multinomial regression models are used for variable selection and discriminant analysis with a binary/categorical response variable. The Thesis discusses and compares several methods that are commonly utilized in genetics, and introduces new strategies to select markers according to their informative content and to discriminate clusters by offering reduced panels for popul…

High-dimensional dataQuantile regression coefficients modelingTuning parameter selectionGenomic dataLasso regressionLasso regression; High-dimensional data; Genomic data; Tuning parameter selection; Quantile regression coefficients modeling; Curves clustering;Settore SECS-S/01 - StatisticaCurves clustering
researchProduct

Inferring networks from high-dimensional data with mixed variables

2014

We present two methodologies to deal with high-dimensional data with mixed variables, the strongly decomposable graphical model and the regression-type graphical model. The first model is used to infer conditional independence graphs. The latter model is applied to compute the relative importance or contribution of each predictor to the response variables. Recently, penalized likelihood approaches have also been proposed to estimate graph structures. In a simulation study, we compare the performance of the strongly decomposable graphical model and the graphical lasso in terms of graph recovering. Five different graph structures are used to simulate the data: the banded graph, the cluster gr…

Random graphClustering high-dimensional dataPenalized likelihoodTheoretical computer scienceConditional independenceDecomposable Graphical Models.Computer scienceCluster graphMixed variablesGraphical modelMutual informationPenalized Gaussian Graphical ModelSettore SECS-S/01 - Statistica
researchProduct

Sample size planning for survival prediction with focus on high-dimensional data

2011

Sample size planning should reflect the primary objective of a trial. If the primary objective is prediction, the sample size determination should focus on prediction accuracy instead of power. We present formulas for the determination of training set sample size for survival prediction. Sample size is chosen to control the difference between optimal and expected prediction error. Prediction is carried out by Cox proportional hazards models. The general approach considers censoring as well as low-dimensional and high-dimensional explanatory variables. For dimension reduction in the high-dimensional setting, a variable selection step is inserted. If not all informative variables are included…

Statistics and ProbabilityClustering high-dimensional dataClinical Trials as TopicLung NeoplasmsModels StatisticalKaplan-Meier EstimateEpidemiologyProportional hazards modelDimensionality reductionGene ExpressionFeature selectionKaplan-Meier EstimateBiostatisticsPrognosisBrier scoreSample size determinationCarcinoma Non-Small-Cell LungSample SizeCensoring (clinical trials)StatisticsHumansProportional Hazards ModelsMathematicsStatistics in Medicine
researchProduct