AUTHOR
Padraig Cunningham
Sequential Genetic Search for Ensemble Feature Selection
Ensemble learning constitutes one of the main directions in machine learning and data mining. Ensembles allow us to achieve higher accuracy, which is often not achievable with single models. One technique that has proved effective for constructing an ensemble of diverse classifiers is the use of feature subsets. Among different approaches to ensemble feature selection, genetic search was shown to perform best in many domains. In this paper, a new strategy, GAS-SEFS (Genetic Algorithm-based Sequential Search for Ensemble Feature Selection), is introduced. Instead of one genetic process, it employs a series of processes, each of which aims to build one base classifier. Experiment…
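The sequential strategy described in this abstract can be illustrated with a minimal sketch: a small genetic algorithm evolves feature-subset bitmasks, and one GA run is launched per base classifier. The overlap penalty in `gas_sefs` and all parameter values are illustrative assumptions, not the paper's actual fitness function; `base_fitness` stands in for a real accuracy estimate.

```python
import random

def run_ga(n_features, fitness, pop_size=20, generations=30, rng=None):
    """One genetic process: evolve a population of feature-subset bitmasks."""
    rng = rng or random.Random(0)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_features)          # point mutation
            child[i] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

def gas_sefs(n_features, n_classifiers, base_fitness):
    """Sequential strategy: one GA run per base classifier. Here each run's
    fitness penalises overlap with subsets already chosen, nudging later
    classifiers toward different features (an assumed stand-in penalty)."""
    chosen = []
    for _ in range(n_classifiers):
        def fitness(mask, prev=tuple(map(tuple, chosen))):
            overlap = sum(sum(a & b for a, b in zip(mask, m)) for m in prev)
            return base_fitness(mask) - 0.5 * overlap
        chosen.append(run_ga(n_features, fitness))
    return chosen
```

For example, `gas_sefs(10, 3, lambda m: sum(m[:5]) - 0.1 * sum(m))` runs three genetic processes over ten features and returns one bitmask per base classifier.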
Dynamic Integration with Random Forests
Random Forests are a successful ensemble prediction technique that combines two sources of randomness to generate base decision trees: bootstrapping instances for each tree, and considering a random subset of features at each node. Breiman, in his introductory paper on Random Forests, claims that they are more robust than boosting with respect to overfitting noise, and are able to compete with boosting in terms of predictive performance. Multiple recently published empirical studies conducted in various application domains confirm these claims. Random Forests use simple majority voting to combine the predictions of the trees. However, it is clear that each decision tree in a random forest may …
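The two sources of randomness and the majority vote mentioned above can be sketched in a few lines. To stay short, a one-node stump stands in for full tree induction; the `sqrt(d)` feature-subset size is the common default, and the tree count and seed are arbitrary choices for illustration.

```python
import math
import random
from collections import Counter

def fit_stump(sample, feats):
    """Best axis-aligned split among the candidate features (a one-node
    'tree' stands in for full recursive tree growing)."""
    best = None
    for f in feats:
        for x, _ in sample:
            t = x[f]
            left = [y for xi, y in sample if xi[f] <= t]
            right = [y for xi, y in sample if xi[f] > t]
            if not left or not right:
                continue
            lv, lc = Counter(left).most_common(1)[0]
            rv, rc = Counter(right).most_common(1)[0]
            acc = (lc + rc) / len(sample)
            if best is None or acc > best[0]:
                best = (acc, f, t, lv, rv)
    _, f, t, lv, rv = best
    return lambda x: lv if x[f] <= t else rv

def random_forest(data, n_trees=25, seed=0):
    """data: list of (feature_tuple, label) pairs."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    k = max(1, int(math.sqrt(n_features)))
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]        # randomness 1: bootstrap
        feats = rng.sample(range(n_features), k)         # randomness 2: feature subset
        trees.append(fit_stump(sample, feats))
    def predict(x):
        # simple majority voting over the trees' predictions
        return Counter(t(x) for t in trees).most_common(1)[0][0]
    return predict
```

Because each tree sees a different bootstrap sample and a different feature subset, the trees disagree in their errors, which is what the majority vote exploits.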
Dynamic Integration of Classifiers for Tracking Concept Drift in Antibiotic Resistance Data
In the real world, concepts are often not stable but change with time. A typical example of this in the medical context is antibiotic resistance, where pathogen sensitivity may change over time as new pathogen strains develop resistance to antibiotics that were previously effective. This problem, known as concept drift, complicates the task of learning a model from medical data and requires special approaches, different from commonly used techniques, which treat all arriving instances as equally important contributors to the final concept. The underlying data distribution may change as well, making previously built models useless, which is known as virtual concept drift. These changes make regu…
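One simple way to avoid treating all arriving instances as equally important, in the spirit of dynamic integration, is to score each base classifier on the most recent instances with older ones discounted, and favour the classifier that tracks the current concept. This is a minimal illustrative sketch, not the paper's method; the `decay` value and the `model(x) -> label` interface are assumptions.

```python
def dynamic_select(models, recent, decay=0.9):
    """Pick the base model that performs best on recent instances,
    with exponentially decaying weight on older ones.

    models: callables mapping a feature vector to a label.
    recent: list of (x, y) pairs, oldest first.
    """
    def score(model):
        n = len(recent)
        return sum(
            decay ** (n - 1 - i) * (model(x) == y)   # newer instances weigh more
            for i, (x, y) in enumerate(recent)
        )
    return max(models, key=score)
```

Under drift, a model fitted to the old concept loses recency-weighted score quickly, so the selection switches to a model matching the new concept sooner than an unweighted evaluation would.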
Diversity in Ensemble Feature Selection
Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. Ensembles allow us to achieve higher accuracy, which is often not achievable with single models. It was shown theoretically and experimentally that, in order for an ensemble to be effective, it should consist of high-accuracy base classifiers that have high diversity in their predictions. One technique that has proved effective for constructing an ensemble of accurate and diverse base classifiers is to use different feature subsets, or so-called ensemble feature selection. Many ensemble feature selection strategies incorporate diversity as a component of the fitness funct…
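A common way to incorporate diversity into a fitness function, sketched here for illustration, is to blend a candidate classifier's accuracy with its average pairwise disagreement against the ensemble built so far. The plain disagreement measure and the `alpha` trade-off are assumptions; the specific diversity measures and weighting studied in the paper may differ.

```python
def disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers disagree."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def fitness(preds, labels, ensemble_preds, alpha=0.5):
    """Blend a candidate's accuracy with its average disagreement
    against existing ensemble members (alpha sets the trade-off).

    preds: candidate's predictions on a validation set.
    labels: true labels for that set.
    ensemble_preds: prediction lists of members already in the ensemble.
    """
    acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    if not ensemble_preds:
        return acc            # first member: accuracy alone
    div = sum(disagreement(preds, other) for other in ensemble_preds) / len(ensemble_preds)
    return alpha * acc + (1 - alpha) * div
```

With `alpha` near 1 the search favours accurate but possibly redundant members; lowering it rewards candidates whose errors fall on different instances than the ensemble's, which is the property the abstract identifies as essential.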