Search results for "Mining"

showing 10 items of 1730 documents

Distance Functions, Clustering Algorithms and Microarray Data Analysis

2010

Distance functions are a fundamental ingredient of classification and clustering procedures, and this holds true also in the particular case of microarray data. In the general data mining and classification literature, functions such as Euclidean distance or Pearson correlation have gained their status of de facto standards thanks to a considerable amount of experimental validation. For microarray data, the issue of which distance function works best has been investigated, but no final conclusion has been reached. The aim of this extended abstract is to shed further light on that issue. Indeed, we present an experimental study, involving several distances, assessing (a) their intrinsic sepa…

Clustering high-dimensional data; Fuzzy clustering; Settore INF/01 - Informatica; business.industry; Correlation clustering; Machine learning; computer.software_genre; Pearson product-moment correlation coefficient; Ranking (information retrieval); Euclidean distance; symbols.namesake; Clustering distance measures; symbols; Artificial intelligence; Data mining; business; Cluster analysis; computer; Mathematics; De facto standard

researchProduct

Structural clustering of millions of molecular graphs

2014

We propose an algorithm for clustering very large molecular graph databases according to scaffolds (i.e., large structural overlaps) that are common between cluster members. Our approach first partitions the original dataset into several smaller datasets using a greedy clustering approach named APreClus based on dynamic seed clustering. APreClus is an online and instance incremental clustering algorithm delaying the final cluster assignment of an instance until one of the so-called pending clusters the instance belongs to has reached significant size and is converted to a fixed cluster. Once a cluster is fixed, APreClus recalculates the cluster centers, which are used as representatives for…

Clustering high-dimensional data; Fuzzy clustering; Theoretical computer science; k-medoids; Computer science; Single-linkage clustering; Correlation clustering; Constrained clustering; computer.software_genre; Complete-linkage clustering; Graph; Hierarchical clustering; ComputingMethodologies_PATTERNRECOGNITION; Data stream clustering; CURE data clustering algorithm; Canopy clustering algorithm; FLAME clustering; Affinity propagation; Data mining; Cluster analysis; computer; k-medians clustering; Clustering coefficient; Proceedings of the 29th Annual ACM Symposium on Applied Computing

researchProduct

Making nonlinear manifold learning models interpretable: The manifold grand tour

2015

Smooth nonlinear topographic maps of the data distribution to guide a Grand Tour visualisation. Prioritisation of data linear views that are most consistent with data structure in the maps. Useful visualisations that cannot be obtained by other more classical approaches. Dimensionality reduction is required to produce visualisations of high dimensional data. In this framework, one of the most straightforward approaches to visualising high dimensional data is based on reducing complexity and applying linear projections while tumbling the projection axes in a defined sequence which generates a Grand Tour of the data. We propose using smooth nonlinear topographic maps of the data distribution to…

Clustering high-dimensional data; QA75; Nonlinear dimensionality reduction; Discriminative clustering; Computer science; Visualització de la informació; computer.software_genre; Data visualization; Projection (mathematics); Information visualization; Artificial Intelligence; QA:Informàtica::Infografia [Àrees temàtiques de la UPC]; business.industry; Data visualization; Dimensionality reduction; Grand tour; General Engineering; Nonlinear dimensionality reduction; Topographic map; Data structure; Computer Science Applications; Visualization; Manifold learning; Data mining; business; computer; Generative topographic mapping; Linear projections

researchProduct

The Three Steps of Clustering In The Post-Genomic Era

2013

This chapter describes the basic algorithmic components that are involved in clustering, with particular attention to classification of microarray data.

Clustering high-dimensional data; Settore INF/01 - Informatica; business.industry; Correlation clustering; Pattern recognition; computer.software_genre; Biclustering; CURE data clustering algorithm; Clustering Classification Biological Data Mining; Consensus clustering; Artificial intelligence; Data mining; business; Cluster analysis; computer; Mathematics

researchProduct

A Feature Set Decomposition Method for the Construction of Multi-classifier Systems Trained with High-Dimensional Data

2013

Data mining for the discovery of novel, useful patterns encounters obstacles when dealing with high-dimensional datasets, which have been documented as the "curse" of dimensionality. A strategy to deal with this issue is the decomposition of the input feature set to build a multi-classifier system. Standalone decomposition methods are rare and generally based on random selection. We propose a decomposition method which uses information theory tools to arrange input features into uncorrelated and relevant subsets. Experimental results show how this approach significantly outperforms three baseline decomposition methods in terms of classification accuracy.

Clustering high-dimensional data; business.industry; Computer science; Pattern recognition; Information theory; computer.software_genre; Uncorrelated; Decomposition method (queueing theory); Data mining; Artificial intelligence; business; Feature set; computer; Classifier (UML); Curse of dimensionality

researchProduct

Incrementally Assessing Cluster Tendencies with a Maximum Variance Cluster Algorithm

2003

A straightforward and efficient way to discover clustering tendencies in data using a recently proposed Maximum Variance Clustering algorithm is proposed. The approach shares the benefits of the plain clustering algorithm with regard to other approaches for clustering. Experiments using both synthetic and real data have been performed in order to evaluate the differences between the proposed methodology and the plain use of the Maximum Variance algorithm. According to the results obtained, the proposal constitutes an efficient and accurate alternative.

Clustering high-dimensional data; k-medoids; Computer science; CURE data clustering algorithm; Single-linkage clustering; Canopy clustering algorithm; Variance (accounting); Data mining; Cluster analysis; computer.software_genre; computer; k-medians clustering

researchProduct

Bayesian versus data driven model selection for microarray data

2014

Clustering is one of the most well known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. In this beautiful area, one of the most difficult challenges is a particular instance of the model selection problem, i.e., the identification of the correct number of clusters in a dataset. In what follows, for ease of reference, we refer to that instance still as model selection. It is an important part of any statistical analysis. The techniques used for solving it are mainly either Bayesian or data-driven, and are both based on internal knowledge. That is, they use information obtained by processing the input data. A…

Clustering Model selection Bayesian information criterion Akaike information criterion Minimum message length Bioinformatics; Settore INF/01 - Informatica; Computer science; business.industry; Model selection; Bayesian probability; computer.software_genre; Machine learning; Computer Science Applications; Data-driven; Determining the number of clusters in a data set; Identification (information); Bayesian information criterion; Data mining; Artificial intelligence; Akaike information criterion; Cluster analysis; business; computer

researchProduct

Neural networks with non-uniform embedding and explicit validation phase to assess Granger causality

2015

A challenging problem when studying a dynamical system is to find the interdependencies among its individual components. Several algorithms have been proposed to detect directed dynamical influences between time series. Two of the most used approaches are a model-free one (transfer entropy) and a model-based one (Granger causality). Several pitfalls are related to the presence or absence of assumptions in modeling the relevant features of the data. We tried to overcome those pitfalls using a neural network approach in which a model is built without any a priori assumptions. In this sense this method can be seen as a bridge between model-free and model-based approaches. The experiments perfo…

Cognitive Neuroscience; Entropy; FOS: Physical sciences; Overfitting; computer.software_genre; Machine learning; Granger causality; Artificial Intelligence; Medicine and Health Sciences; Entropy (information theory); Non-uniform embedding; Computer Simulation; Mathematics; Artificial neural network; business.industry; Probability and statistics; Models Theoretical; Neural Networks (Computer); Classification; Neural network; Algorithm; Causality; Physics - Data Analysis Statistics and Probability; Settore ING-INF/06 - Bioingegneria Elettronica E Informatica; Granger causality; Embedding; A priori and a posteriori; Transfer entropy; Neural Networks Computer; Artificial intelligence; Data mining; business; computer; Algorithms; Neural networks; Data Analysis Statistics and Probability (physics.data-an)

researchProduct

Panel Summary: Knowledge Model Representations

1997

Following the usual classifications of cognitive psychologists, we can say that the problem of representation spans three domains: the environment, the brain, and cognitive processes, which are usually studied by different scientists: the physicists, the neurobiologists and the psychologists. With the development of computer science and artificial intelligence new approaches have been introduced, which make possible simulation and implementation of cognitive processes through neural networks and symbolic systems. But the contribution of new methods is not limited to simulation, because they try to provide new models which consider cognitive process as information processing, not as reaction…

Cognitive science; Artificial neural network; Artificial vision; Computer science; Information processing; Representation (systemics); Conceptual space; Cognition; Data mining; computer.software_genre; computer; Symbolic Systems

researchProduct

A framework to identify primitives that represent usability within Model-Driven Development methods

2014

Context: Nowadays, there are sound methods and tools which implement the Model-Driven Development approach (MDD) satisfactorily. However, MDD approaches focus on representing and generating code that represents functionality, behaviour and persistence, putting the interaction, and more specifically the usability, in a second place. If we aim to include usability features in a system developed with a MDD tool, we need to extend manually the generated code. Objective: This paper tackles how to include functional usability features (usability recommendations strongly related to system functionality) in MDD through conceptual primitives. Method: The approach consists of studying usability guide…

Cognitive walkthrough; Pluralistic walkthrough; Computer science; Usability; Usability inspection; BIBLIOTECONOMIA Y DOCUMENTACION; 02 engineering and technology; computer.software_genre; Human–computer interaction; Software_SOFTWAREENGINEERING; 020204 information systems; Heuristic evaluation; Usability engineering; 0202 electrical engineering electronic engineering information engineering; Web usability; Informática; Model-Driven Development; business.industry; 020207 software engineering; Usability; Computer Science Applications; Usability goals; Conceptual model; Data mining; business; computer; LENGUAJES Y SISTEMAS INFORMATICOS; Software; Information Systems

researchProduct