
Ensemble feature selection with the simple Bayesian classification

David W. Patterson, Alexey Tsymbal, Seppo Puuronen

subject

Bayesian probability; Feature selection; Pattern recognition; Machine learning; Linear subspace; Random subspace method; Naive Bayes classifier; Bayes' theorem; Hardware and Architecture; Signal Processing; Artificial intelligence; Classifier (UML); Software; Cascading classifiers; Information Systems; Mathematics

description

Abstract A popular method for creating an accurate classifier from a set of training data is to build several classifiers and then to combine their predictions. Ensembles of simple Bayesian classifiers have traditionally not been a focus of research. One way to generate an ensemble of accurate and diverse simple Bayesian classifiers is to use different feature subsets generated with the random subspace method. In this case, the ensemble consists of multiple classifiers constructed by randomly selecting feature subsets, that is, classifiers constructed in randomly chosen subspaces. In this paper, we present an algorithm for building ensembles of simple Bayesian classifiers in random subspaces. The EFS_SBC algorithm includes a hill-climbing-based refinement cycle, which tries to improve the accuracy and diversity of the base classifiers built on random feature subsets. We conduct a number of experiments on a collection of 21 real-world and synthetic data sets, comparing the EFS_SBC ensembles with the single simple Bayesian classifier and with boosted simple Bayes. In many cases the EFS_SBC ensembles are more accurate than both the single simple Bayesian classifier and the boosted Bayesian ensemble. We find that ensembles built with a focus on diversity have lower generalization error, and that the importance of diversity in building the ensembles differs from data set to data set. We propose several methods for the integration of simple Bayesian classifiers in the ensembles. In a number of cases the techniques for dynamic integration of classifiers achieve significantly better classification accuracy than their simple static analogues. We suggest that this is because dynamic integration utilizes the ensemble coverage better than static integration does.
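
The sketch below illustrates the core ideas named in the abstract: a random-subspace ensemble of naive (simple) Bayesian classifiers, a hill-climbing refinement of each feature subset, and one simple form of dynamic integration by local accuracy. It is a minimal illustration under stated assumptions, not the paper's EFS_SBC: the refinement here climbs on held-out accuracy only (the paper's fitness also rewards diversity), the dynamic-integration variant shown is one common choice, and every name here (build_ensemble, refine_subset, predict_majority, predict_dynamic) is ours, not the authors'.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors

def _score(subset, X_tr, y_tr, X_val, y_val):
    """Held-out accuracy of a naive Bayes classifier on the given feature subset."""
    idx = np.flatnonzero(subset)
    if idx.size == 0:
        return -np.inf
    return GaussianNB().fit(X_tr[:, idx], y_tr).score(X_val[:, idx], y_val)

def refine_subset(subset, X_tr, y_tr, X_val, y_val, rng, steps=10):
    """Hill climbing: toggle one random feature at a time and keep the change
    if held-out accuracy does not degrade (a simplification; EFS_SBC's fitness
    combines accuracy with ensemble diversity)."""
    best = _score(subset, X_tr, y_tr, X_val, y_val)
    for _ in range(steps):
        j = rng.integers(subset.size)
        subset[j] = not subset[j]          # flip feature j in/out of the subset
        new = _score(subset, X_tr, y_tr, X_val, y_val)
        if new >= best:
            best = new
        else:
            subset[j] = not subset[j]      # revert the flip
    return subset

def build_ensemble(X_tr, y_tr, X_val, y_val, n_members=25, seed=0):
    """Random subspace method: each member is trained on a random feature subset."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        subset = rng.random(X_tr.shape[1]) < 0.5   # include each feature w.p. 0.5
        subset = refine_subset(subset, X_tr, y_tr, X_val, y_val, rng)
        idx = np.flatnonzero(subset)
        members.append((idx, GaussianNB().fit(X_tr[:, idx], y_tr)))
    return members

def predict_majority(members, X):
    """Static integration: plain majority vote (assumes integer class labels)."""
    votes = np.stack([clf.predict(X[:, idx]) for idx, clf in members]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

def predict_dynamic(members, X_val, y_val, X, k=7):
    """Dynamic selection: for each test instance, use the member that is most
    accurate on the instance's k nearest validation neighbours (one simple
    form of dynamic integration; the paper studies several)."""
    neigh = NearestNeighbors(n_neighbors=k).fit(X_val).kneighbors(
        X, return_distance=False)
    # errs[m, i] is True where member m misclassifies validation instance i.
    errs = np.stack([clf.predict(X_val[:, idx]) != y_val for idx, clf in members])
    preds = np.empty(X.shape[0], dtype=y_val.dtype)
    for i, nb in enumerate(neigh):
        m = errs[:, nb].mean(axis=1).argmin()      # most locally accurate member
        idx, clf = members[m]
        preds[i] = clf.predict(X[i:i + 1, idx])[0]
    return preds

In practice one would reserve part of the training data as the validation set used both for the refinement and for the local-accuracy estimates; the number of members, the 0.5 inclusion probability, and k are illustrative defaults, not values from the paper.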

https://doi.org/10.1016/s1566-2535(03)00004-6