RESEARCH PRODUCT

Scalable multiscale density estimation

Ye Wang, Antonio Canale, David Dunson

subject

Methodology (stat.ME); FOS: Computer and information sciences; ComputingMethodologies_PATTERNRECOGNITION; Statistics - Methodology

description

Bayesian density estimation using discrete mixtures performs well in modest dimensions, but it lacks statistical and computational scalability in high-dimensional multivariate cases. To combat the curse of dimensionality, it is necessary to assume the data are concentrated near a lower-dimensional subspace. However, Bayesian methods that learn this subspace jointly with the density of the data scale poorly computationally. To solve this problem, we propose an empirical Bayes approach, which in a first stage estimates a multiscale dictionary using geometric multiresolution analysis. We then use this dictionary within a multiscale mixture model, which allows uncertainty in component allocation, mixture weights and scaling factors over a binary tree. We propose a computational algorithm that scales efficiently to very high-dimensional problems. We provide some theoretical support for this geometric density estimation (GEODE) method, and illustrate its performance through simulated and real data examples.
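
The two-stage idea in the description can be illustrated with a rough sketch. This is not the GEODE algorithm itself: the first stage below replaces geometric multiresolution analysis with a simple recursive median split plus local PCA, and the second stage uses plug-in estimates instead of the paper's Bayesian treatment of allocations, weights and scaling factors. The function names (build_leaves, log_density) and parameters (depth, d) are hypothetical, and the sketch assumes the number of points exceeds d and that d is smaller than the ambient dimension.

```python
import numpy as np

def build_leaves(X, depth, d):
    """Stage 1 (simplified stand-in for a multiscale dictionary):
    split the data over a binary tree and fit a local d-dimensional
    PCA basis at each leaf."""
    n, D = X.shape
    leaves = []

    def recurse(pts, level):
        mu = pts.mean(axis=0)
        # Rows of Vt are the principal directions of the centered points.
        _, s, Vt = np.linalg.svd(pts - mu, full_matrices=False)
        var = s ** 2 / max(len(pts) - 1, 1)
        # Candidate split: median of the projection on the leading direction.
        proj = (pts - mu) @ Vt[0]
        left = proj <= np.median(proj)
        too_small = min(left.sum(), (~left).sum()) < d + 2
        if level == depth or too_small:
            lam = np.maximum(var[:d], 1e-8)            # in-subspace variances
            sigma2 = max(var[d:].sum() / max(D - d, 1), 1e-8)  # residual noise
            leaves.append((mu, Vt[:d].T, lam, sigma2, len(pts) / n))
            return
        recurse(pts[left], level + 1)
        recurse(pts[~left], level + 1)

    recurse(X, 0)
    return leaves

def log_density(x, leaves):
    """Stage 2 (plug-in, not Bayesian): mixture of Gaussians, each with
    variances lam inside its local subspace and sigma2 outside it.
    Cost per leaf is O(D * d); no D x D matrix is ever formed."""
    D = x.shape[0]
    terms = []
    for mu, Phi, lam, sigma2, w in leaves:
        z = x - mu
        c = Phi.T @ z                       # coordinates in the local basis
        r2 = max(z @ z - c @ c, 0.0)        # squared distance to the subspace
        quad = np.sum(c ** 2 / lam) + r2 / sigma2
        logdet = np.sum(np.log(lam)) + (D - len(lam)) * np.log(sigma2)
        terms.append(np.log(w) - 0.5 * (D * np.log(2 * np.pi) + logdet + quad))
    terms = np.array(terms)
    m = terms.max()
    return m + np.log(np.exp(terms - m).sum())  # stable log-sum-exp

# Example: 30-dimensional data concentrated near a 1-dimensional subspace.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 1)) @ rng.normal(size=(1, 30))
X += 0.01 * rng.normal(size=X.shape)
leaves = build_leaves(X, depth=3, d=1)
value = log_density(X[0], leaves)
```

Because each component stores only a mean, a D-by-d basis, d variances and one noise level, evaluation cost grows linearly in the ambient dimension, which is the kind of scaling the description refers to.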

http://arxiv.org/abs/1410.7692