Search results for "Canopy clustering algorithm"

showing 2 items of 12 documents

Scalable Hierarchical Clustering: Twister Tries with a Posteriori Trie Elimination

2015

Exact methods for Agglomerative Hierarchical Clustering (AHC) with average linkage do not scale well when the number of items to be clustered is large. The best known algorithms have quadratic complexity. This is a generally accepted fact and cannot be improved without exploiting specifics of certain metric spaces. Twister tries is an algorithm that produces a dendrogram (i.e., the outcome of a hierarchical clustering) resembling the one produced by AHC, while needing only linear space and time. However, twister tries are sensitive to rare, but still possible, hash evaluations. These might have a disastrous effect on the final outcome. We propose the use of a metaheuristic algor…

ta113, Theoretical computer science, Brown clustering, Computer science, Correlation clustering, Single-linkage clustering, Hierarchical clustering, CURE data clustering algorithm, hierrchial clustering, Canopy clustering algorithm, Hierarchical clustering of networks, Cluster analysis, clustering

2015 IEEE Symposium Series on Computational Intelligence
researchProduct

Scalable implementation of dependence clustering in Apache Spark

2017

This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. Moreover, a fast approximate diffusion procedure that enables spectral-clustering-type algorithms in the Spark environment is introduced. In addition, the proposed algorithm is benchmarked against Spectral clustering. Results of applying the method to real-life data indicate that the implementation scales well while also demonstrating good performance for densely connected graphs.

peerReviewed

ta113, ta213, Apache Spark, Computer science, datasets, Correlation clustering, data mining, computer.software_genre, algorithms, Spectral clustering, Computational science, dependence clustering, Data stream clustering, CURE data clustering algorithm, Scalability, Spark (mathematics), algoritmit, Canopy clustering algorithm, Data mining, tiedonlouhinta, Cluster analysis, clustering algorithms, computer, data processing, tietojenkäsittely
researchProduct