RESEARCH PRODUCT

Forest of Normalized Trees: Fast and Accurate Density Estimation of Streaming Data

Stefan Kramer, Zahra Ahmadi, Patrick Rehn

subject

Data stream; Computer science; Data stream mining; Feature vector; Estimator; 02 engineering and technology; Density estimation; 01 natural sciences; Data modeling; 010104 statistics & probability; Kernel (statistics); 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; 0101 mathematics; Random variable; Algorithm

description

Density estimation of streaming data is a relevant task in numerous domains. This paper introduces FRONT (forest of normalized trees), a novel non-parametric density estimator. FRONT uses a structure of multiple normalized trees, segments the feature space of the data stream through a periodically updated linear transformation, and is able to adapt to ever-evolving data streams. It provides accurate density estimates and performs favorably compared to existing online density estimators in terms of the average log score on multiple standard data sets. Its low complexity, with linear runtime and constant memory usage, makes FRONT suitable by design for large data streams. Finally, the paper presents N-FRONT, a variation of FRONT suited to statistically independent data streams, as well as correction methods for badly initialized trees that further improve performance.
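The abstract describes FRONT only at a high level. To make the named ingredients concrete (several normalized trees, a periodically updated linear transformation of the feature space, adaptation to an evolving stream, constant memory, and evaluation by average log score), here is a minimal Python sketch under stated assumptions. It is not the authors' algorithm. The midpoint-splitting random trees, Laplace smoothing, the standard-normal-CDF squashing applied after standardization, the count decay on each refresh, the test-then-train (prequential) scoring, and all names (`ForestDensitySketch`, `average_log_score`, `depth`, `period`, `decay`, `alpha`) are hypothetical choices made for illustration.

```python
import math
import random


class ForestDensitySketch:
    """Hypothetical streaming density estimator: an average of normalized
    random-partition trees over standardized, CDF-squashed features.
    An illustration only, not the authors' FRONT implementation."""

    def __init__(self, n_features, n_trees=10, depth=8, period=100,
                 alpha=1.0, decay=0.5, seed=0):
        rng = random.Random(seed)
        self.d, self.depth, self.period = n_features, depth, period
        self.alpha, self.decay = alpha, decay
        # Tree t splits feature f at midpoints k_f times (sum of k_f == depth),
        # so each of its 2**depth leaves has volume 2**-depth in the unit cube.
        self.splits = []
        for _ in range(n_trees):
            k = [0] * n_features
            for _ in range(depth):
                k[rng.randrange(n_features)] += 1
            self.splits.append(k)
        self.counts = [{} for _ in range(n_trees)]  # <= 2**depth keys per tree
        self.n = 0.0   # (decayed) total count held by each tree
        self.seen = 0  # number of stream points processed
        # Welford running statistics and the active (periodically refreshed)
        # linear transform x -> (x - mu) / sigma.
        self.mean, self.m2 = [0.0] * n_features, [0.0] * n_features
        self.mu, self.sigma = [0.0] * n_features, [1.0] * n_features

    def _squash(self, x):
        """Standardize x, then map each feature into [0, 1] with the standard
        normal CDF. Returns (u, log |det du/dx|)."""
        u, log_jac = [], 0.0
        for xj, m, s in zip(x, self.mu, self.sigma):
            z = (xj - m) / s
            u.append(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
            log_jac += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(s)
        return u, log_jac

    @staticmethod
    def _leaf(u, k):
        # Bin index of each feature at resolution 2**k_f (clamped at u == 1.0).
        return tuple(min(int(uj * (1 << kf)), (1 << kf) - 1)
                     for uj, kf in zip(u, k))

    def log_density(self, x):
        u, log_jac = self._squash(x)
        leaves = 1 << self.depth
        total = 0.0
        for k, counts in zip(self.splits, self.counts):
            c = counts.get(self._leaf(u, k), 0.0)
            # Laplace-smoothed leaf probability divided by the leaf volume.
            total += (c + self.alpha) / (self.n + self.alpha * leaves) * leaves
        return math.log(total / len(self.splits)) + log_jac

    def update(self, x):
        self.seen += 1
        for j, xj in enumerate(x):
            delta = xj - self.mean[j]
            self.mean[j] += delta / self.seen
            self.m2[j] += delta * (xj - self.mean[j])
        u, _ = self._squash(x)
        for k, counts in zip(self.splits, self.counts):
            key = self._leaf(u, k)
            counts[key] = counts.get(key, 0.0) + 1.0
        self.n += 1.0
        if self.seen % self.period == 0:
            self._refresh()

    def _refresh(self):
        # Refresh the linear transform from the running statistics and scale
        # all counts (and n) by `decay`, so older data gradually loses weight.
        for j in range(self.d):
            self.mu[j] = self.mean[j]
            self.sigma[j] = max(math.sqrt(self.m2[j] / self.seen), 1e-6)
        for counts in self.counts:
            for key in counts:
                counts[key] *= self.decay
        self.n *= self.decay


def average_log_score(model, stream):
    """Prequential average log score: score each point, then learn from it."""
    total, n = 0.0, 0
    for x in stream:
        total += model.log_density(x)
        model.update(x)
        n += 1
    return total / n


if __name__ == "__main__":
    rng = random.Random(1)
    stream = [[rng.gauss(0.0, 1.0), rng.gauss(5.0, 2.0)] for _ in range(2000)]
    print(average_log_score(ForestDensitySketch(n_features=2), stream))
```

Each update touches one leaf per tree, so processing a stream takes time linear in its length, and memory stays bounded by `n_trees * 2**depth` leaf counts plus per-feature statistics, independent of how many points arrive. Because each tree's smoothed leaf probabilities sum to one and the squashing map is a bijection whose Jacobian is tracked, every `log_density` value comes from a properly normalized density, which is what makes an average log score comparison meaningful.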

https://doi.org/10.1109/dsaa.2018.00030