Search results for "Single-linkage clustering"

showing 7 items of 17 documents

Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering

2017

Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations and clusters, their number needs to be determined. Internal clustering validation indices estimate this number without any external information. The purpose of this article is to evaluate, empirically, characteristics of a representative set of internal clustering validation indices with many datasets. The prototype-based clustering framework includes multiple, classical and robust, statistical estimates of cluster location so that the overall setting of the paper is novel. General observations on the quality of validation indices and on t…

Fuzzy clustering; lcsh:T55.4-60.8; Computer science; Single-linkage clustering; Correlation clustering; 02 engineering and technology; computer.software_genre; lcsh:QA75.5-76.95; Theoretical Computer Science; prototype-based clustering; CURE data clustering algorithm; 020204 information systems; Consensus clustering; algoritmit; 0202 electrical engineering electronic engineering information engineering; lcsh:Industrial engineering. Management engineering; Cluster analysis; k-medians clustering; ta113; Numerical Analysis; business.industry; Pattern recognition; Determining the number of clusters in a data set; Computational Mathematics; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; robust statistics; 020201 artificial intelligence & image processing; lcsh:Electronic computers. Computer science; Artificial intelligence; Data mining; tiedonlouhinta; business; clustering validation index; computer; Algorithms

researchProduct

Growing Hierarchical Self-organizing Maps and Statistical Distribution Models for Online Detection of Web Attacks

2013

In modern networks, HTTP clients communicate with web servers using request messages. By manipulating these messages attackers can collect confidential information from servers or even corrupt them. In this study, the approach based on anomaly detection is considered to find such attacks. For HTTP queries, feature matrices are obtained by applying an n-gram model, and, by learning on the basis of these matrices, growing hierarchical self-organizing maps are constructed. For HTTP headers, we employ statistical distribution models based on the lengths of header values and relative frequency of symbols. New requests received by the web-server are classified by using the maps and models obtaine…

Self-organizing map; Web server; Computer science; Server; Header; Single-linkage clustering; Anomaly detection; Intrusion detection system; Data mining; Web service; computer.software_genre; computer

researchProduct
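The n-gram step mentioned in the abstract above (turning HTTP queries into feature matrices) might look roughly like the following. This is a minimal sketch, not the authors' implementation: the function names `char_ngram_frequencies` and `feature_matrix`, the choice of character-level n-grams, and the use of relative frequencies are all assumptions made for illustration.

```python
from collections import Counter


def char_ngram_frequencies(query, n=2):
    """Return the relative frequency of each character n-gram in `query`.

    Hypothetical helper: the abstract only says an n-gram model is applied.
    """
    grams = [query[i:i + n] for i in range(len(query) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {gram: count / total for gram, count in counts.items()}


def feature_matrix(queries, n=2):
    """Build one row per query over the sorted vocabulary of observed n-grams."""
    per_query = [char_ngram_frequencies(q, n) for q in queries]
    vocab = sorted({gram for freqs in per_query for gram in freqs})
    rows = [[freqs.get(gram, 0.0) for gram in vocab] for freqs in per_query]
    return vocab, rows
```

Rows built this way could then be fed to any vector-based learner, such as a self-organizing map; that training step is outside the scope of this sketch.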

Image Segmentation through a Hierarchy of Minimum Spanning Trees

2012

Many approaches have been adopted to solve the problem of image segmentation. Among them, a noticeable part is based on graph theory, casting the pixels as nodes in a graph. This paper proposes an algorithm to select clusters in the images (corresponding to relevant segments in the image) corresponding to the areas induced in the images through the search of the Minimum Spanning Tree (MST). In particular, it is based on a clustering algorithm that extracts clusters by computing a hierarchy of Minimum Spanning Trees. The main drawback of this previous algorithm is that the dimension of the cluster is not predictable and a relevant portion of found clusters can be composed by micro-clusters that ar…

Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Spanning tree; business.industry; Single-linkage clustering; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Pattern recognition; Image segmentation; Minimum spanning tree; Image Segmentation; Minimum Spanning Trees; Clustering; Distributed minimum spanning tree; Minimum spanning tree-based segmentation; Kruskal's algorithm; Artificial Intelligence; Computer Science::Computer Vision and Pattern Recognition; Reverse-delete algorithm; Artificial intelligence; business; Mathematics

researchProduct
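The link between minimum spanning trees and single-linkage clustering that the entry above relies on can be illustrated with a short sketch. It is not the paper's hierarchy-of-MSTs algorithm; it is the classical construction in which Kruskal's algorithm is stopped once `k` components remain, which is equivalent to removing the `k - 1` longest MST edges. The function name `single_linkage_mst`, the Euclidean metric, and the point-tuple input format are assumptions.

```python
from itertools import combinations
from math import dist


def single_linkage_mst(points, k):
    """Label `points` with k single-linkage clusters via truncated Kruskal."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (quadratic in n; fine for a sketch).
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    clusters = n
    for _, i, j in edges:
        if clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it is an MST edge
            parent[rj] = ri
            clusters -= 1

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

For image segmentation, the points would be pixels and the edge weights a colour or intensity difference between neighbouring pixels rather than a Euclidean distance between all pairs.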

An efficient prototype merging strategy for the condensed 1-NN rule through class-conditional hierarchical clustering

2002

A generalized prototype-based classification scheme founded on hierarchical clustering is proposed. The basic idea is to obtain a condensed 1-NN classification rule by merging the two same-class nearest clusters, provided that the set of cluster representatives correctly classifies all the original points. Apart from the quality of the obtained sets and its flexibility which comes from the fact that different intercluster measures and criteria can be used, the proposed scheme includes a very efficient four-stage procedure which conveniently exploits geometric cluster properties to decide about each possible merge. Empirical results demonstrate the merits of the proposed algorithm t…

Single-linkage clustering; computer.software_genre; Complete-linkage clustering; Hierarchical clustering; k-nearest neighbors algorithm; Artificial Intelligence; Nearest-neighbor chain algorithm; Classification rule; Signal Processing; Cluster (physics); Computer Vision and Pattern Recognition; Data mining; Merge (version control); computer; Software; Mathematics; Pattern Recognition

researchProduct

SparseHC: A Memory-efficient Online Hierarchical Clustering Algorithm

2014

Computing a hierarchical clustering of objects from a pairwise distance matrix is an important algorithmic kernel in computational science. Since the storage of this matrix requires quadratic space with respect to the number of objects, the design of memory-efficient approaches is of high importance to this research area. In this paper, we address this problem by presenting a memory-efficient online hierarchical clustering algorithm called SparseHC. SparseHC scans a sorted and possibly sparse distance matrix chunk-by-chunk. Meanwhile, a dendrogram is built by merging cluster pairs as and when the distance between them is determined to be the smallest among all remaining cluster pairs. The k…

sparse matrix; Clustering high-dimensional data; Theoretical computer science; online algorithms; Computer science; Single-linkage clustering; Complete-linkage clustering; Nearest-neighbor chain algorithm; Consensus clustering; memory-efficient clustering; Cluster analysis; k-medians clustering; General Environmental Science; Sparse matrix; :Engineering::Computer science and engineering [DRNTU]; k-medoids; Dendrogram; Constrained clustering; Hierarchical clustering; Distance matrix; Canopy clustering algorithm; General Earth and Planetary Sciences; FLAME clustering; Hierarchical clustering of networks; hierarchical clustering; Algorithm; Procedia Computer Science

researchProduct

Twister Tries

2015

Many commonly used data-mining techniques utilized across research fields perform poorly when used for large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one technique for which the algorithms’ scaling properties prohibit clustering of a large amount of items. Besides the unfavorable time complexity of O(n^2), these algorithms have a space complexity of O(n^2), which can be reduced to O(n) if the time complexity is allowed to rise to O(n^2 log_2 n). In this paper, we propose the use of locality-sensitive hashing combined with a novel data structure called twister tries to provide an approximate clustering for average linkage. Our approach requires only lin…

ta113; Hierarchical agglomerative clustering; ta112; Fuzzy clustering; Brown clustering; Computer science; Single-linkage clustering; computer.software_genre; Hierarchical clustering; Locality-sensitive hashing; Data set; CURE data clustering algorithm; locality-sensitive hashing; average linkage; Data mining; Hierarchical clustering of networks; linear complexity; Cluster analysis; hierarchical clustering; Algorithm; computer; Time complexity; Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data

researchProduct

Scalable Hierarchical Clustering: Twister Tries with a Posteriori Trie Elimination

2015

Exact methods for Agglomerative Hierarchical Clustering (AHC) with average linkage do not scale well when the number of items to be clustered is large. The best known algorithms are characterized by quadratic complexity. This is a generally accepted fact and cannot be improved without using specifics of certain metric spaces. Twister tries is an algorithm that produces a dendrogram (i.e., the outcome of a hierarchical clustering) which resembles the one produced by AHC, while only needing linear space and time. However, twister tries are sensitive to rare, but still possible, hash evaluations. These might have a disastrous effect on the final outcome. We propose the use of a metaheuristic algor…

ta113; Theoretical computer science; Brown clustering; Computer science; Correlation clustering; Single-linkage clustering; Hierarchical clustering; CURE data clustering algorithm; hierarchical clustering; Canopy clustering algorithm; Hierarchical clustering of networks; Cluster analysis; clustering; 2015 IEEE Symposium Series on Computational Intelligence

researchProduct