Search results for "High-dimensional data"

Showing 10 of 24 documents

Sample size planning for survival prediction with focus on high-dimensional data

2011

Sample size planning should reflect the primary objective of a trial. If the primary objective is prediction, the sample size determination should focus on prediction accuracy instead of power. We present formulas for the determination of training set sample size for survival prediction. Sample size is chosen to control the difference between optimal and expected prediction error. Prediction is carried out by Cox proportional hazards models. The general approach considers censoring as well as low-dimensional and high-dimensional explanatory variables. For dimension reduction in the high-dimensional setting, a variable selection step is inserted. If not all informative variables are included…
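For readers outside the survival setting this abstract assumes, a minimal Kaplan-Meier estimator (one of the index terms attached to this item) can be sketched in a few lines. The data below are invented, and this is not the paper's sample-size method:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve from (time, event) pairs.

    events[i] is 1 if subject i had the event, 0 if censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    # Sort subjects by observation time.
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = 0
        n = 0
        # Group tied observation times.
        while i < len(order) and times[order[i]] == t:
            deaths += events[order[i]]
            n += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= n
    return curve

# Toy data: 5 subjects; a 0 marks a censored observation.
curve = kaplan_meier([2, 3, 3, 5, 7], [1, 1, 0, 1, 0])
```

Each factor (1 - deaths / at_risk) is the conditional probability of surviving past that event time, which is why censored subjects reduce the risk set without dropping the curve.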

Statistics and Probability; Clustering high-dimensional data; Clinical Trials as Topic; Lung Neoplasms; Models, Statistical; Kaplan-Meier Estimate; Epidemiology; Proportional hazards model; Dimensionality reduction; Gene Expression; Feature selection; Biostatistics; Prognosis; Brier score; Sample size determination; Carcinoma, Non-Small-Cell Lung; Sample Size; Censoring (clinical trials); Statistics; Humans; Proportional Hazards Models; Mathematics; Statistics in Medicine

gllvm: Fast analysis of multivariate abundance data with generalized linear latent variable models in R

2019

The work of J.N. was supported by the Wihuri Foundation. The work of S.T. was supported by the CRoNoS COST Action IC1408. F.K.C.H. was also supported by an ANU cross-disciplinary grant.

0106 biological sciences; Clustering high-dimensional data; Multivariate statistics; Multivariate analysis; Cross disciplinary; 010604 marine biology & hydrobiology; Ecological Modeling; Maximum likelihood; Latent variable; 010603 evolutionary biology; 01 natural sciences; Abundance (ecology); Statistics; Cost action; Ecology, Evolution, Behavior and Systematics; Mathematics; Methods in Ecology and Evolution

Scaling Up a Metric Learning Algorithm for Image Recognition and Representation

2008

Maximally Collapsing Metric Learning is a recently proposed algorithm to estimate a metric matrix from labelled data. The purpose of this work is to extend this approach by considering a set of landmark points, which can in principle reduce the cost per iteration by one order of magnitude. The proposal is in fact a generalized version of the original algorithm that can be applied to larger amounts of higher-dimensional data. Exhaustive experimentation shows that very similar behavior is obtained at a lower cost for a wide range of numbers of landmark points.
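Two ingredients of this abstract can be sketched directly: a distance under a learned metric matrix M, and the pair-count saving that landmarks buy. The numbers and names below are illustrative, not from the paper:

```python
def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y) under a metric matrix M."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))

# The identity metric reduces to squared Euclidean distance.
M = [[1.0, 0.0], [0.0, 1.0]]
dist = mahalanobis_sq([0.0, 0.0], [3.0, 4.0], M)

# Landmark idea: instead of all n*(n-1)/2 point pairs per iteration,
# compare each of the n points against only m << n landmarks.
n, m = 1000, 20
full_pairs = n * (n - 1) // 2
landmark_pairs = n * m
```

With n = 1000 and m = 20 the pair count drops from 499,500 to 20,000, which is the rough source of the order-of-magnitude saving the abstract mentions.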

Clustering high-dimensional data; Set (abstract data type); Range (mathematics); Landmark; Metric (mathematics); Landmark point; Representation (mathematics); Algorithm; Facial recognition system; Mathematics

The Three Steps of Clustering In The Post-Genomic Era

2013

This chapter describes the basic algorithmic components that are involved in clustering, with particular attention to classification of microarray data.

Clustering high-dimensional data; Settore INF/01 - Informatica; Correlation clustering; Pattern recognition; Biclustering; CURE data clustering algorithm; Clustering; Classification; Biological Data Mining; Consensus clustering; Artificial intelligence; Data mining; Cluster analysis; Mathematics

Dimensionality reduction via regression on hyperspectral infrared sounding data

2014

This paper introduces a new method for dimensionality reduction via regression (DRR). The method generalizes Principal Component Analysis (PCA) so as to reduce the variance of the PCA scores. To do so, DRR relies on a deflationary process in which a non-linear regression reduces the redundancy between the PC scores. Unlike other nonlinear dimensionality reduction methods, DRR is easy to apply, has an out-of-sample extension, is invertible, and the learned transformation is volume-preserving. These properties make the method useful for a wide range of applications, especially for very high-dimensional data in general, and for hyperspectral image processing in particular…
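DRR's deflationary idea, predict later component scores from earlier ones and keep only the residual, can be sketched in one step; a plain least-squares line stands in here for the paper's nonlinear regressor, and the data are invented:

```python
def deflate(s1, s2):
    """One deflation step: remove from the second score s2 whatever
    the first score s1 can predict of it (linear stand-in for the
    nonlinear regression used by DRR); the residual is the new score.
    """
    n = len(s1)
    mean1 = sum(s1) / n
    mean2 = sum(s2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(s1, s2))
    var = sum((a - mean1) ** 2 for a in s1)
    slope = cov / var
    intercept = mean2 - slope * mean1
    return [b - (intercept + slope * a) for a, b in zip(s1, s2)]

# s2 is fully redundant with s1, so deflation leaves (near-)zero residuals.
residual = deflate([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

The residual has lower variance than the original score whenever the earlier scores carry predictive information, which is the redundancy reduction the abstract describes.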

Clustering high-dimensional data; Redundancy (information theory); Dimensionality reduction; Principal component analysis; Feature extraction; Nonlinear dimensionality reduction; Hyperspectral imaging; Pattern recognition; Artificial intelligence; Mathematics; Curse of dimensionality; 2014 6th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS)

Computational Cluster Validation in the Big Data Era

2017

Data-driven class discovery, i.e., the inference of cluster structure in a dataset, is a fundamental task in Data Analysis, in particular for the Life Sciences. We provide a tutorial on the most common approaches used for that task, focusing on methodologies for the prediction of the number of clusters in a dataset. Although the methods that we present are general in terms of the data for which they can be used, we offer a case study relevant for Microarray Data Analysis.
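A common way to predict the number of clusters, in the spirit of the methods surveyed here, is to scan a quality measure such as the within-cluster sum of squares (WSS) over candidate k and look for an elbow. A deterministic toy sketch, not taken from the chapter:

```python
def kmeans_1d(xs, k, iters=20):
    """Tiny 1-D k-means; returns the within-cluster sum of squares.

    Centres start at evenly spaced sorted data points, so the run is
    deterministic (real k-means uses random restarts instead).
    """
    xs = sorted(xs)
    step = max((len(xs) - 1) // max(k - 1, 1), 1)
    centers = [xs[min(i * step, len(xs) - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[nearest].append(x)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sum(min(abs(x - c) for c in centers) ** 2 for x in xs)

# Two tight groups: WSS collapses going from k=1 to k=2, then barely
# improves at k=3 -- the "elbow" predicts two clusters.
data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
wss = [kmeans_1d(data, k) for k in (1, 2, 3)]
```

Gap-statistic-style methods mentioned in the index terms formalise this comparison against a null reference distribution rather than eyeballing the elbow.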

Clustering high-dimensional data; Class (computer programming); Clustering validation measure; Settore INF/01 - Informatica; Computer science; Big data; Inference; Microarrays data analysis; Gap statistic; Task (project management); ComputingMethodologies_PATTERNRECOGNITION; CURE data clustering algorithm; Consensus clustering; Hypothesis testing in statistics; Clustering; Class Discovery in Data; Algorithms; Clustering algorithm; Figure of merit; Data mining; Cluster analysis

Inferring networks from high-dimensional data with mixed variables

2014

We present two methodologies to deal with high-dimensional data with mixed variables, the strongly decomposable graphical model and the regression-type graphical model. The first model is used to infer conditional independence graphs. The latter model is applied to compute the relative importance or contribution of each predictor to the response variables. Recently, penalized likelihood approaches have also been proposed to estimate graph structures. In a simulation study, we compare the performance of the strongly decomposable graphical model and the graphical lasso in terms of graph recovery. Five different graph structures are used to simulate the data: the banded graph, the cluster gr…
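One way to see what a conditional-independence edge means: for three variables, the partial correlation of x and y given z is near zero when z fully explains their association, so the edge x-y is absent from the graph. A self-contained sketch with invented data, not the authors' decomposable-model machinery:

```python
import math

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def partial_corr(x, y, z):
    """Correlation of x and y with z held fixed (three-variable case)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy data: x and y each depend on z plus mutually orthogonal "noise",
# so they are marginally correlated but conditionally independent.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
nx = [1, -1, -1, 1, 1, -1, -1, 1]
ny = [1, -1, 1, -1, -1, 1, -1, 1]
x = [zi + n for zi, n in zip(z, nx)]
y = [2 * zi + n for zi, n in zip(z, ny)]
```

Here pearson(x, y) is large while partial_corr(x, y, z) vanishes: the marginal association is entirely mediated by z.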

Random graph; Clustering high-dimensional data; Penalized likelihood; Theoretical computer science; Conditional independence; Decomposable Graphical Models; Computer science; Cluster graph; Mixed variables; Graphical model; Mutual information; Penalized Gaussian Graphical Model; Settore SECS-S/01 - Statistica

Quantum clustering in non-spherical data distributions: Finding a suitable number of clusters

2017

Quantum Clustering (QC) provides an alternative approach to clustering algorithms, several of which are based on geometric relationships between data points. Instead, QC makes use of quantum mechanics concepts to find structures (clusters) in data sets by finding the minima of a quantum potential. The starting point of QC is a Parzen estimator with a fixed length scale, which significantly affects the final cluster allocation. This dependence on an adjustable parameter is common to other methods. We propose a framework to find suitable values of the length parameter σ by optimising twin measures of cluster separation and consistency for a given cluster number. This is an extension of the Se…
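The role of the length scale σ can be illustrated with the Parzen estimator itself: counting the modes of the density estimate over a grid shows how the apparent number of clusters depends on σ. A toy 1-D sketch, not the paper's quantum-potential computation:

```python
import math

def parzen(x, data, sigma):
    """Fixed-length-scale Parzen estimate (sum of Gaussian kernels)."""
    return sum(math.exp(-(x - d) ** 2 / (2 * sigma ** 2)) for d in data)

def count_modes(data, sigma):
    """Number of local maxima of the Parzen estimate on a fine grid.

    QC reads cluster structure off the minima of a related quantum
    potential; mode-counting plays the analogous role here.
    """
    lo, hi = min(data) - 1.0, max(data) + 1.0
    grid = [lo + i * (hi - lo) / 400 for i in range(401)]
    vals = [parzen(g, data, sigma) for g in grid]
    return sum(1 for i in range(1, len(vals) - 1)
               if vals[i] > vals[i - 1] and vals[i] > vals[i + 1])

# Two well-separated groups: a small sigma resolves both modes,
# a large sigma smears them into one.
data = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]
```

This is exactly the sensitivity the abstract targets: the cluster allocation changes with σ, motivating a principled way to choose it.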

0301 basic medicine; Clustering high-dimensional data; Mathematical optimization; Cognitive Neuroscience; Single-linkage clustering; Correlation clustering; 02 engineering and technology; Computer Science Applications; Hierarchical clustering; Determining the number of clusters in a data set; 03 medical and health sciences; 030104 developmental biology; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; Cluster (physics); 020201 artificial intelligence & image processing; Cluster analysis; Algorithm; k-medians clustering; Mathematics; Neurocomputing

Incrementally Assessing Cluster Tendencies with a Maximum Variance Cluster Algorithm

2003

A straightforward and efficient way to discover clustering tendencies in data using the recently proposed Maximum Variance Clustering algorithm is presented. The approach shares the benefits of the plain clustering algorithm with regard to other approaches to clustering. Experiments using both synthetic and real data have been performed to evaluate the differences between the proposed methodology and the plain use of the Maximum Variance algorithm. According to the results obtained, the proposal constitutes an efficient and accurate alternative.

Clustering high-dimensional data; k-medoids; Computer science; CURE data clustering algorithm; Single-linkage clustering; Canopy clustering algorithm; Variance (accounting); Data mining; Cluster analysis; k-medians clustering

The on-line curvilinear component analysis (onCCA) for real-time data reduction

2015

Real-time pattern recognition applications often deal with high-dimensional data, which require a data reduction step that is usually performed offline. However, this loses the possibility of adaptation to a changing environment. This is also true for applications other than pattern recognition, such as data visualization for input inspection. Only linear projections, like principal component analysis, can work in real time by using iterative algorithms, while all known nonlinear techniques cannot be implemented in such a way and always work on the whole database at each epoch. Among these nonlinear tools, the Curvilinear Component Analysis (CCA), which is a non-convex techni…
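The abstract's point that linear projections can be updated sample by sample is often illustrated with Oja's rule for online PCA; a sketch with made-up streaming data (this is not the paper's onCCA algorithm):

```python
import random

def oja_update(w, x, lr=0.01):
    """One Oja's-rule step: nudges the weight vector w toward the
    leading principal direction of the incoming stream, one sample
    at a time, without ever storing the whole database."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]

# Synthetic stream of 2-D points varying mostly along the first axis.
random.seed(0)
w = [0.7, 0.7]
for _ in range(5000):
    sample = [random.gauss(0.0, 2.0), random.gauss(0.0, 0.2)]
    w = oja_update(w, sample)
# w drifts toward (+-1, 0): the first principal direction, unit norm.
```

The update also self-normalises w, which is why no explicit renormalisation step appears; nonlinear methods like CCA lack such a simple per-sample rule, which is the gap onCCA addresses.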

Clustering high-dimensional data; Bregman divergence; Computer science; neural network; projection; Novelty detection; Synthetic data; Data visualization; Artificial Intelligence; branch and bound; Computer vision; unfolding; curvilinear component analysis; Curvilinear coordinates; Artificial neural network; Vector quantization; Pattern recognition; online algorithm; bearing fault; Pattern recognition (psychology); Principal component analysis; data reduction; Software