RESEARCH PRODUCT

Twister Tries

Hao Mou, Michael Cochez

subject

ta113; Hierarchical agglomerative clustering; ta112; Fuzzy clustering; Brown clustering; Computer science; Single-linkage clustering; computer.software_genre; Hierarchical clustering; Locality-sensitive hashing; Data set; CURE data clustering algorithm; locality-sensitive hashing; average linkage; Data mining; Hierarchical clustering of networks; linear complexity; Cluster analysis; hierarchical clustering; Algorithm; computer; Time complexity

description

Many data-mining techniques commonly used across research fields perform poorly on large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one such technique: the scaling properties of its algorithms prohibit clustering a large number of items. Besides the unfavorable time complexity of O(n²), these algorithms have a space complexity of O(n²), which can be reduced to O(n) if the time complexity is allowed to rise to O(n² log₂ n). In this paper, we propose the use of locality-sensitive hashing combined with a novel data structure called twister tries to provide an approximate clustering for average linkage. Our approach requires only linear space. Furthermore, its time complexity is linear in the number of items to be clustered, making it feasible to apply on a larger scale. We evaluate the approach both analytically and by applying it to several data sets.

peerReviewed

https://doi.org/10.1145/2723372.2751521