RESEARCH PRODUCT

Linear density-based clustering with a discrete density model

Roberto Pirrone, Vincenzo Cannella, Sergio Monteleone, Gabriella Giordano

subject

FOS: Computer and information sciences, Computer Science - Machine Learning, H.3.3, Statistics - Machine Learning, I.5.3, 62H30 68T10, I.5.3; H.3.3, Machine Learning (stat.ML), Machine Learning (cs.LG)

description

Density-based clustering techniques are used in a wide range of data mining applications. One of their most attractive features is that they require no prior knowledge of the number of clusters in a dataset or of their shape. In this paper we propose a new algorithm named Linear DBSCAN (Lin-DBSCAN), a simple approach to clustering inspired by the density model introduced with the well-known DBSCAN algorithm. Designed to minimize the computational cost of density-based clustering on geospatial data, Lin-DBSCAN has linear time complexity, which makes it suitable for real-time applications on low-resource devices. Lin-DBSCAN uses a discrete version of the DBSCAN density model that takes advantage of a grid-based scan-and-merge approach. The name of the algorithm stems from these main features. The algorithm was tested on well-known data sets. Experimental results demonstrate the efficiency and validity of this approach compared with DBSCAN in the context of spatial data clustering, enabling the use of a density-based clustering technique on large datasets at low computational cost.
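
The grid-based scan-and-merge idea mentioned in the description can be illustrated with a minimal sketch. This is not the authors' Lin-DBSCAN implementation: it is a generic grid-density clustering written under stated assumptions. The function name `grid_density_cluster` and the parameters `cell_size` and `min_pts` are hypothetical, a cell is assumed to be "dense" when it holds at least `min_pts` points, and adjacent dense cells are assumed to belong to the same cluster. For a fixed number of dimensions, each point and each cell is visited a constant number of times, which is what makes this kind of approach linear in the number of points.

```python
from collections import defaultdict, deque
from itertools import product


def grid_density_cluster(points, cell_size, min_pts):
    """Illustrative sketch only (hypothetical API, not the published algorithm).

    Returns one label per point: a cluster id >= 0, or -1 for noise.
    """
    # Scan: bin every point into its grid cell in a single pass.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(idx)

    # Discrete density (assumption): a cell is dense if it has >= min_pts points.
    dense = {k for k, members in cells.items() if len(members) >= min_pts}

    # Offsets to all neighbouring cells (every non-zero combination of -1, 0, 1).
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    # Merge: flood-fill over adjacent dense cells, one cluster id per component.
    cell_label = {}
    next_id = 0
    for start in dense:
        if start in cell_label:
            continue
        cell_label[start] = next_id
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for off in offsets:
                nb = tuple(c + d for c, d in zip(cur, off))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_id
                    queue.append(nb)
        next_id += 1

    # Points in dense cells inherit the cell's cluster id; the rest stay noise.
    labels = [-1] * len(points)
    for key, cid in cell_label.items():
        for idx in cells[key]:
            labels[idx] = cid
    return labels
```

In this sketch the choice of `cell_size` plays a role loosely analogous to the neighbourhood radius in DBSCAN, and points in sparse cells are simply treated as noise; a real implementation would likely handle border points more carefully.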

https://dx.doi.org/10.48550/arxiv.1807.08158