Search results for "Correlation clustering"

Showing 8 items of 28 documents

Solution Using Clustering Methods

1987

The main aim of this analysis is to identify typical morphologies in a multivariate, longitudinal data set on growing children and to describe the morphological evolution of the resulting groups of girls. In our opinion, identifying typical morphologies is closely linked to searching for structure among both the individuals and the variables.

Keywords: Set (abstract data type); Biclustering; Multivariate statistics; Computer science; CURE data clustering algorithm; Longitudinal data; Consensus clustering; Correlation clustering; Pattern recognition; Artificial intelligence; Cluster analysis

Robust refinement of initial prototypes for partitioning-based clustering algorithms

2007

Non-uniqueness of solutions and sensitivity to erroneous data are common problems in large-scale data clustering. To avoid poor-quality solutions with partitioning-based clustering methods, robust estimates (which are highly insensitive to erroneous data values) are needed, and the initial cluster prototypes must be chosen carefully. This paper presents a robust density-estimation initialization method that uses the spatial median estimate in the prototype update. Besides being insensitive to noise and outliers, the new method is computationally comparable to traditional methods. The methods are compared through numerical experiments on a set of syntheti…

Keywords: Set (abstract data type); Computer science; Correlation clustering; Outlier; Initialization; Sensitivity (control systems); Density estimation; Noise (video); Data mining; Cluster analysis
Published in: Recent Advances in Stochastic Modeling and Data Analysis
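
The spatial median used in the prototype update has no closed-form solution; a standard way to compute it is Weiszfeld's fixed-point iteration. The Python sketch below illustrates only that estimator and its robustness to a single outlier. It is not the paper's density-estimation initialization, and the function name `spatial_median` and its tolerances are illustrative assumptions.

```python
import math

def spatial_median(points, tol=1e-9, max_iter=1000):
    """Approximate the spatial (geometric) median with Weiszfeld's iteration."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    m = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    for _ in range(max_iter):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            d = math.dist(p, m)
            if d < tol:
                # Simplification: skip a point that coincides with the estimate
                # instead of applying the exact Vardi-Zhang correction.
                continue
            w = 1.0 / d  # each point is weighted by its inverse distance
            den += w
            for j in range(dim):
                num[j] += w * p[j]
        if den == 0.0:
            return m  # every point coincides with the current estimate
        new_m = [v / den for v in num]
        if math.dist(new_m, m) < tol:
            return new_m
        m = new_m
    return m
```

For the points (0, 0), (1, 0), (0, 1), (1, 1) plus the outlier (100, 100), the coordinate-wise mean is (20.4, 20.4), whereas the spatial median stays next to the four clustered points at roughly (0.79, 0.79).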

Robust Synchronization-Based Graph Clustering

2013

Complex graph data now arise in fields such as social networks, protein-protein interaction networks, and ecosystems. To reveal the underlying patterns in such graphs, an important task is to partition them into meaningful clusters. The question is: how can we find the natural partitions of a complex graph that truly reflect its intrinsic patterns? In this paper, we propose RSGC, a novel approach to graph clustering. The key idea of RSGC is to treat graph clustering as a dynamic process towards synchronization: each vertex is viewed as an oscillator that interacts with other vertices according to the graph's connectivity. During the process towards synchro…

Keywords: Theoretical computer science; Computer science; CURE data clustering algorithm; Kuramoto model; Correlation clustering; Cluster analysis; Partition (database); Synchronization; Clustering coefficient; Vertex (geometry)

Part-of-Speech Induction by Singular Value Decomposition and Hierarchical Clustering

2006

Part-of-speech induction involves automatically discovering word classes and assigning each word of a vocabulary to one or more of these classes. The approach proposed here is based on an analysis of word distributions in a large collection of German newspaper texts. Its main advantage over other approaches is that it combines hierarchical clustering of context vectors with a preceding dimensionality-reduction step that minimizes the effects of sampling errors.

Keywords: Vocabulary; K-SVD; Computer science; Dimensionality reduction; Correlation clustering; Pattern recognition; Context (language use); Hierarchical clustering; Singular value decomposition; Artificial intelligence; Word (computer architecture)

CLUSTERING INCOMPLETE SPECTRAL DATA WITH ROBUST METHODS

2018

Missing value imputation is a common approach for preprocessing incomplete data sets. In data clustering, however, imputation may introduce unexpected bias because it can change the underlying structure of the data. To avoid imputing missing values beforehand, the computations must be restricted to the available data values. In this paper, we apply a robust nan-K-spatmed algorithm to the clustering of hyperspectral image data. Robust statistics, such as multivariate medians, are less sensitive to outliers than classical statistics that rely on Gaussian assumptions. They are, however, computationally less tractable due to the lack of closed-for…

Keywords: Applied optics. Photonics; Multivariate statistics; Computer science; Gaussian; Correlation clustering; Robust statistics; spectral data; Technology; CURE data clustering algorithm; Imputation (statistics); interpolation; Cluster analysis; K-means clustering; nan-K-spatmed; robust statistical methods; Missing data; Data set; Outlier; Data mining; Engineering (General). Civil engineering (General); clustering
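
The idea of restricting the computations to the available values instead of imputing them can be sketched in Python as a K-spatial-medians loop whose distances and median updates skip missing entries. This is a minimal sketch assuming missing values are stored as `NaN` and the initial prototypes are complete; the function names are hypothetical, and it is not the authors' nan-K-spatmed implementation.

```python
import math

def observed_dist(x, m):
    """Euclidean distance computed only over coordinates observed (non-NaN) in x."""
    return math.sqrt(sum((xj - mj) ** 2 for xj, mj in zip(x, m) if not math.isnan(xj)))

def nan_spatial_median(points, init, iters=200, eps=1e-12):
    """Weiszfeld-style spatial median that uses only the observed values."""
    m = list(init)
    dim = len(m)
    for _ in range(iters):
        num = [0.0] * dim
        den = [0.0] * dim
        for x in points:
            d = observed_dist(x, m)
            if d < eps:
                continue  # simplification: skip a point sitting on the estimate
            w = 1.0 / d
            for j, xj in enumerate(x):
                if not math.isnan(xj):  # missing values never enter the update
                    num[j] += w * xj
                    den[j] += w
        # Coordinate j is a weighted average of the values observed in coordinate j.
        m = [num[j] / den[j] if den[j] > 0 else m[j] for j in range(dim)]
    return m

def nan_k_spatmed(points, init_prototypes, rounds=10):
    """K-spatial-medians clustering without imputation (hypothetical helper)."""
    protos = [list(p) for p in init_prototypes]
    labels = []
    for _ in range(rounds):
        # Assignment: nearest prototype, measured on observed coordinates only.
        labels = [
            min(range(len(protos)), key=lambda k: observed_dist(x, protos[k]))
            for x in points
        ]
        # Update: each prototype becomes the spatial median of its members.
        for k in range(len(protos)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:
                protos[k] = nan_spatial_median(members, protos[k])
    return labels, protos
```

Because each coordinate of a prototype update is a weighted average of only the values observed in that coordinate, missing entries never enter the computation and no imputation step is needed.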

A novel heuristic memetic clustering algorithm

2013

In this paper we introduce a novel clustering algorithm based on the Memetic Algorithm meta-heuristic, in which clusters are iteratively evolved by a single novel operator that combines several heuristics. Several heuristics are described and employed for the three types of selection used in the operator. The algorithm was exhaustively tested on three benchmark problems and compared with a classical clustering algorithm (k-Medoids) using the same performance metrics. The results show that our algorithm consistently provides better clustering solutions with less computational effort.

Keywords: Determining the number of clusters in a data set; Biclustering; Clustering high-dimensional data; DBSCAN; Theoretical computer science; CURE data clustering algorithm; Correlation clustering; Canopy clustering algorithm; Cluster analysis; Algorithm; Mathematics
Published in: 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)

Scalable Hierarchical Clustering: Twister Tries with a Posteriori Trie Elimination

2015

Exact methods for Agglomerative Hierarchical Clustering (AHC) with average linkage do not scale well when the number of items to be clustered is large. The best-known algorithms have quadratic complexity; this is generally accepted and cannot be improved without exploiting the specifics of certain metric spaces. Twister tries is an algorithm that produces a dendrogram (i.e., the outcome of a hierarchical clustering) resembling the one produced by AHC, while needing only linear space and time. However, twister tries are sensitive to rare, but still possible, hash evaluations, which may have a disastrous effect on the final outcome. We propose the use of a metaheuristic algor…

Keywords: Theoretical computer science; Brown clustering; Computer science; Correlation clustering; Single-linkage clustering; Hierarchical clustering; CURE data clustering algorithm; Canopy clustering algorithm; Hierarchical clustering of networks; Cluster analysis; clustering
Published in: 2015 IEEE Symposium Series on Computational Intelligence
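
For reference, the exact average-linkage AHC that twister tries approximate can be written naively in Python as below. This sketch recomputes every inter-cluster average at each merge, so it runs in roughly cubic time; it illustrates only the linkage rule and the dendrogram (merge sequence) it produces, not twister tries, the a posteriori trie elimination, or the best known quadratic algorithms. The function name `average_linkage` is illustrative.

```python
import itertools
import math

def average_linkage(points):
    """Naive exact average-linkage AHC; returns the merge sequence (dendrogram)."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            # Average linkage: mean pairwise distance between members of a and b.
            total = sum(
                math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b]
            )
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), d))
        # combinations() yields a < b, so delete b first to keep index a valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return merges
```

For the one-dimensional points 0, 1, 5, 7, and 20, the merges occur at heights 1, 2, 5.5, and 16.75, and the last merge joins all five items.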

Scalable implementation of dependence clustering in Apache Spark

2017

This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. In addition, a fast approximate diffusion procedure is introduced that enables spectral-clustering-type algorithms in the Spark environment. The proposed algorithm is benchmarked against spectral clustering. Results on real-life data indicate that the implementation scales well and performs well on densely connected graphs.
Peer reviewed.

Keywords: Apache Spark; Computer science; datasets; Correlation clustering; Data mining; algorithms; Spectral clustering; Computational science; dependence clustering; Data stream clustering; CURE data clustering algorithm; Scalability; Spark (mathematics); Canopy clustering algorithm; Cluster analysis; clustering algorithms; data processing
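
Dependence Clustering is described as a spectral method and is benchmarked against spectral clustering. As a point of reference, plain spectral bipartitioning can be sketched in Python with repeated diffusion-like matrix-vector products (power iteration) on the normalized adjacency matrix, which yields the Fiedler vector of the symmetric normalized Laplacian. This is not Dependence Clustering, not the paper's approximate diffusion procedure, and not Spark or GraphX code; it assumes a small connected graph with no isolated vertices, and the function name is hypothetical.

```python
import math

def spectral_bipartition(n, edges, iters=500):
    """Split a connected graph in two by the sign of its second eigenvector.

    Power iteration on (I + D^-1/2 A D^-1/2) / 2, with the trivial top
    eigenvector (proportional to sqrt(degree)) projected out at every step.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(nbrs) for nbrs in adj]
    # Trivial top eigenvector of the normalized adjacency (eigenvalue 1).
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]

    x = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        # Deflate: remove the component along the trivial eigenvector.
        dot = sum(a * b for a, b in zip(x, top))
        x = [a - dot * b for a, b in zip(x, top)]
        # One diffusion-like step: y = (x + D^-1/2 A D^-1/2 x) / 2.
        y = [
            0.5 * (x[i] + sum(x[j] / math.sqrt(deg[i] * deg[j]) for j in adj[i]))
            for i in range(n)
        ]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # The sign of each entry assigns the vertex to one of the two parts.
    return [0 if v >= 0 else 1 for v in x]
```

On two triangles joined by a single edge, the signs of the resulting vector separate the two triangles; which triangle receives label 0 depends on the start vector.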