RESEARCH PRODUCT

Dynamic Integration with Random Forests

Alexey Tsymbal, Padraig Cunningham

subject

Computer Science

description

Random Forests are a successful ensemble prediction technique that combines two sources of randomness to generate base decision trees: bootstrapping the instances for each tree and considering a random subset of features at each node. In his introductory paper on Random Forests, Breiman claims that they are more robust than boosting with respect to overfitting noise, and that they compete with boosting in predictive performance. Multiple recently published empirical studies, conducted in various application domains, confirm these claims. Random Forests use simple majority voting to combine the predictions of the trees. However, each decision tree in a random forest may make a different contribution to classifying a given instance. In this paper, we demonstrate that the prediction performance of Random Forests can still be improved in some domains by replacing the combination function. Dynamic integration, which is based on local performance estimates of the base predictors, can be used instead of majority voting. We conduct experiments on a selection of classification datasets, analysing the resulting accuracy, the margin, and the bias and variance components of error. The experiments demonstrate that dynamic integration increases accuracy on some datasets. Even when accuracy remains the same, dynamic integration always increases the margin. A bias/variance decomposition shows that dynamic integration decreases the error by significantly decreasing the bias component while leaving the variance unchanged or increasing it insignificantly. The experiments also demonstrate that the intrinsic similarity measure of Random Forests is better than the commonly used Heterogeneous Euclidean/Overlap Metric at finding a neighbourhood for local estimates in this context.
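The abstract describes dynamic integration as replacing majority voting with weights derived from each tree's accuracy on a local neighbourhood of the test instance, where the neighbourhood is found via the forest's intrinsic similarity (the fraction of trees in which two instances land in the same leaf). The following is a minimal sketch of that idea, assuming scikit-learn; the helper `dynamic_vote`, the neighbourhood size `k`, and the dataset are illustrative choices, not the paper's exact experimental setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy data and a standard random forest (majority voting is the default).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Intrinsic RF similarity: rf.apply gives the leaf index of each instance in
# each tree; similarity is the fraction of trees where two instances share a leaf.
leaves_tr = rf.apply(X_tr)   # shape (n_train, n_trees)
leaves_te = rf.apply(X_te)

def dynamic_vote(i, k=15):
    """Classify test instance i, weighting each tree's vote by its
    accuracy on the k training instances most similar to i (dynamic voting)."""
    prox = (leaves_tr == leaves_te[i]).mean(axis=1)   # similarity to each training point
    nbrs = np.argsort(prox)[-k:]                      # k nearest neighbours by RF proximity
    votes = np.zeros(rf.n_classes_)
    for tree in rf.estimators_:
        local_acc = (tree.predict(X_tr[nbrs]) == y_tr[nbrs]).mean()
        votes[int(tree.predict(X_te[i:i + 1])[0])] += local_acc
    return int(np.argmax(votes))

pred = np.array([dynamic_vote(i) for i in range(len(X_te))])
print("dynamic-vote accuracy:", (pred == y_te).mean())
```

Replacing the neighbourhood computation with distances under the Heterogeneous Euclidean/Overlap Metric would give the alternative the paper compares against; the abstract's finding is that the leaf-sharing proximity above works better for locating the neighbourhood.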

http://hdl.handle.net/2262/13500