6533b7d2fe1ef96bd125df03
RESEARCH PRODUCT
Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions
Andreas KarwathMichael GeilkeStefan Kramersubject
Data streamMahalanobis distanceComputer scienceData stream miningbusiness.industry02 engineering and technologyDensity estimationcomputer.software_genreSet (abstract data type)Software020204 information systems0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processingData miningbusinesscomputerCurse of dimensionalityVector spacedescription
The joint density of a data stream is suitable for performing data mining tasks without having access to the original data. However, the methods proposed so far only target a small to medium number of variables, since their estimates rely on representing all the interdependencies between the variables of the data. High-dimensional data streams, which are becoming more and more frequent due to increasing numbers of interconnected devices, are, therefore, pushing these methods to their limits. To mitigate these limitations, we present an approach that projects the original data stream into a vector space and uses a set of representatives to provide an estimate. Due to the structure of the estimates, it enables the density estimation of higher-dimensional data and approaches the true density with increasing dimensionality of the vector space. Moreover, it is not only designed to estimate homogeneous data, i.e., where all variables are nominal or all variables are numeric, but it can also estimate heterogeneous data. The evaluation is conducted on synthetic and real-world data. The software related to this paper is available at https://github.com/geilke/mideo.
year | journal | country | edition | language |
---|---|---|---|---|
2016-01-01 |