On the classification of dynamical data streams using novel “Anti-Bayesian” techniques

B. John Oommen, Hugo Lewi Hammer, Anis Yazidi

subject

Dynamical systems theory; Data stream mining; Computer science; Bayesian probability; Estimator; 02 engineering and technology; computer.software_genre; Synthetic data; Artificial Intelligence; Robustness (computer science); 020204 information systems; Signal Processing; Outlier; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Data mining; Bayesian paradigm; Algorithm; computer; Software; Quantile

description

Abstract: The classification of dynamical data streams is among the most complex problems encountered in classification. This is, firstly, because the distribution of the data streams is non-stationary, and it changes without any prior “warning”. Secondly, the manner in which it changes is also unknown. Thirdly, and more interestingly, the model operates with the assumption that the correct classes of previously-classified patterns become available at a juncture after their appearance. This paper pioneers the use of novel, previously unreported schemes that classify such dynamical data streams by invoking the recently-introduced “Anti-Bayesian” (AB) techniques. Unlike the Bayesian paradigm, which compares the testing sample with the distribution’s central points, AB techniques are based on the information in the distant-from-the-mean samples. Most Bayesian approaches can be naturally extended to dynamical systems by dynamically tracking the mean of each class using, for example, an exponential moving average based estimator or a sliding window estimator. The AB schemes introduced by Oommen et al., on the other hand, take a radically different approach and work with the non-central quantiles of the distributions. Surprisingly and counter-intuitively, the reported AB methods perform as well as, or nearly as well as, an optimal supervised Bayesian scheme on a host of accepted Pattern Recognition problems. This invites their natural extension to the unexplored arena of classification for dynamical data streams. Naturally, for such an AB classification approach, we need to track the non-stationary quantiles of the classes. To achieve this, we develop an AB approach for the online classification of data streams by applying the efficient and robust quantile estimators developed by Yazidi and Hammer [12,37]. Apart from the methodology itself, we compare the Bayesian and AB approaches using both real-life and synthetic data.
The results demonstrate, intriguingly and counter-intuitively, that the AB approach sometimes outperforms the Bayesian approach for this application, both with respect to the peak performance obtained and the robustness to the choice of the respective tuning parameters. Furthermore, the AB approach is much more robust against outliers. This robustness is an inherent property of quantile estimators [12,37], and one that the Bayesian approach cannot match, since it tracks the mean instead.
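
As a rough illustration of the two tracking strategies contrasted in the abstract, the following Python sketch classifies a drifting one-dimensional, two-class stream in two ways: by the nearest exponentially averaged class mean, and by the nearest tracked non-central quantile. It is only a sketch under stated assumptions and not the authors' method. The quantile update is a generic additive stochastic-approximation rule standing in for the Yazidi and Hammer estimators, and the quantile level `q = 2/3`, the choice of which quantile each class tracks, the drift model, and all function and parameter names are hypothetical.

```python
import random


def ema_update(mean, x, alpha):
    """Exponential moving average step: move the tracked mean towards x."""
    return mean + alpha * (x - mean)


def quantile_update(est, x, q, step):
    """Generic additive stochastic-approximation quantile step (an assumed
    stand-in, not the estimator of [12,37]): move up by step*q when x lies
    above the estimate, and down by step*(1-q) otherwise."""
    return est + step * (q - (1.0 if x <= est else 0.0))


def nearest_class(x, points):
    """Return the class label whose tracked point is closest to x."""
    return min(points, key=lambda c: abs(x - points[c]))


def run(n=4000, drift=0.001, alpha=0.05, step=0.05, q=2 / 3, seed=0):
    rng = random.Random(seed)
    true_means = {0: 0.0, 1: 3.0}   # class centres, drifting over time
    ema = {0: 0.0, 1: 3.0}          # mean-based (Bayesian-style) tracking
    quant = {0: 0.0, 1: 3.0}        # quantile-based (AB-style) tracking
    # Assumption: the lower class tracks its upper quantile and the upper
    # class tracks its lower quantile, i.e. the non-central points that
    # face the other class.
    levels = {0: q, 1: 1.0 - q}
    hits = {"mean": 0, "quantile": 0}
    for _ in range(n):
        label = rng.randrange(2)
        x = rng.gauss(true_means[label], 1.0)
        # Classify first; the true label is only used afterwards.
        hits["mean"] += nearest_class(x, ema) == label
        hits["quantile"] += nearest_class(x, quant) == label
        # The correct class becomes available after classification.
        ema[label] = ema_update(ema[label], x, alpha)
        quant[label] = quantile_update(quant[label], x, levels[label], step)
        for c in true_means:
            true_means[c] += drift  # slow, unannounced drift
    return {name: count / n for name, count in hits.items()}


if __name__ == "__main__":
    print(run())
```

In this toy setting both classes are symmetric with equal variance, so the two rules place their decision boundaries in nearly the same location; the sketch is meant to show the mechanics of label-delayed online tracking, not to reproduce the paper's comparison.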

https://doi.org/10.1016/j.patcog.2017.10.031