RESEARCH PRODUCT
Forest of Normalized Trees: Fast and Accurate Density Estimation of Streaming Data
Stefan Kramer, Zahra Ahmadi, Patrick Rehn
subject
Data stream; Computer science; Data stream mining; Feature vector; Estimator; 02 engineering and technology; Density estimation; 01 natural sciences; Data modeling; 010104 statistics & probability; Kernel (statistics); 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; 0101 mathematics; Random variable; Algorithm
description
Density estimation of streaming data is a relevant task in numerous domains. This paper introduces FRONT (forest of normalized trees), a novel non-parametric density estimator. It uses a structure of multiple normalized trees, segments the feature space of the data stream through a periodically updated linear transformation, and adapts to ever-evolving data streams. FRONT provides accurate density estimates and performs favorably compared to existing online density estimators in terms of the average log score on multiple standard data sets. Its low complexity (linear runtime and constant memory usage) makes FRONT suitable by design for large data streams. Finally, the paper presents N-FRONT, a variation of FRONT suited to statistically independent data streams, along with correction methods for badly initialized trees that further improve performance.
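The average log score mentioned in the description can be illustrated with a minimal sketch. It assumes the score is the mean of log p(x_t) over the evaluated instances and that each instance is scored before the estimator is updated with it (a test-then-train protocol, which is an assumption here, not something the record states). The `RunningGaussian` class is a hypothetical one-dimensional stand-in used only to make the example runnable; it is not FRONT, and the names `average_log_score`, `prequential_log_score`, and `warmup` are illustrative.

```python
import math


def average_log_score(densities):
    """Mean of log p(x_t) over a list of positive density values."""
    return sum(math.log(p) for p in densities) / len(densities)


class RunningGaussian:
    """Hypothetical stand-in estimator (NOT FRONT): a 1-D Gaussian
    whose mean and variance are tracked with Welford's online updates."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def density(self, x):
        # Sample variance, floored to avoid division by zero (assumption).
        var = max(self.m2 / (self.n - 1), 1e-12)
        return math.exp(-((x - self.mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def prequential_log_score(stream, estimator, warmup=2):
    """Score each instance before training on it (assumed protocol).

    The first `warmup` instances are only used for training, so the
    stand-in estimator has a defined variance before it is queried.
    """
    densities = []
    for i, x in enumerate(stream):
        if i >= warmup:
            densities.append(estimator.density(x))
        estimator.update(x)
    return average_log_score(densities)
```

A higher average log score means the estimator assigned higher density to the instances that actually arrived, so two online estimators can be compared by running them over the same stream with the same protocol.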
year | journal | country | edition | language |
---|---|---|---|---|
2018-10-01 | 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) | | | |