A Comparison Between Two Feature Selection Algorithms
This article compares two feature selection algorithms, Information Gain Thresholding and Koller and Sahami's algorithm, in the context of text document classification on the Reuters Corpus Volume 1 dataset. The algorithms were evaluated by measuring the performance of classifiers trained on the features each of them selects from a given dataset. The results show that Koller and Sahami's algorithm consistently outperforms Information Gain Thresholding by capturing interactions between features and avoiding redundancy among the selected features, although it achieves these gains at the cost of greater complexity and a longer running time.
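To illustrate the simpler of the two methods, the sketch below (the function names and the threshold value are illustrative assumptions, not taken from the article) scores each discrete feature by its information gain with respect to the class labels and keeps only the features whose score exceeds a threshold:

```python
import numpy as np

def entropy(labels):
    """Empirical Shannon entropy (in bits) of a discrete label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """IG(C; f) = H(C) - H(C | f) for one discrete feature column."""
    h_c = entropy(labels)
    h_c_given_f = 0.0
    for value in np.unique(feature):
        mask = feature == value
        h_c_given_f += mask.mean() * entropy(labels[mask])
    return h_c - h_c_given_f

def ig_threshold_select(X, y, threshold=0.01):
    """Return the indices (and scores) of features whose information gain exceeds the threshold."""
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    return np.where(gains > threshold)[0], gains
```

Koller and Sahami's algorithm, in contrast, discards features whose information about the class is already covered by an approximate Markov blanket formed from other features, which is what allows it to account for interactions and redundancy that a per-feature score cannot detect.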
Accelerating Causal Inference and Feature Selection Methods through G-Test Computation Reuse
This article presents a novel, highly efficient method of computing the statistical G-test, made possible by exploiting its connection with the fundamental quantities of information theory: by writing the G statistic as a sum of joint entropy terms, its computation is decomposed into easily reusable partial results without any change in the resulting value. Because this decomposition allows intensive reuse of those partial results, the method greatly improves the efficiency of applications that perform a series of G-tests on permutations of the same features, such as feature selection and causal inference applications. The efficiency of this method is demonstrated by implementing it as…
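As a minimal sketch of the underlying idea, and not the article's actual implementation, note that for discrete variables the (conditional) G statistic can be written as G(X; Y | Z) = 2N * [H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)], with the entropies taken in nats. Each joint entropy term depends only on a subset of columns, so it can be cached and shared across tests; the class and method names below are illustrative assumptions:

```python
import numpy as np

class GTestEngine:
    """Performs conditional G-tests on a discrete dataset, reusing joint-entropy terms."""

    def __init__(self, data):
        self.data = np.asarray(data)   # shape (N, num_variables), discrete values
        self.n = self.data.shape[0]
        self._entropy_cache = {}       # maps a sorted tuple of column indices to its joint entropy

    def joint_entropy(self, columns):
        """Joint entropy (in nats) over the given columns, computed once per distinct subset."""
        key = tuple(sorted(columns))
        if key not in self._entropy_cache:
            if not key:
                self._entropy_cache[key] = 0.0   # entropy of the empty set of variables
            else:
                _, counts = np.unique(self.data[:, list(key)], axis=0, return_counts=True)
                p = counts / self.n
                self._entropy_cache[key] = float(-np.sum(p * np.log(p)))
        return self._entropy_cache[key]

    def g_statistic(self, x, y, z=()):
        """G(X; Y | Z) = 2N [H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)]."""
        z = tuple(z)
        return 2.0 * self.n * (
            self.joint_entropy((x,) + z)
            + self.joint_entropy((y,) + z)
            - self.joint_entropy((x, y) + z)
            - self.joint_entropy(z)
        )
```

A constraint-based causal discovery procedure, for example, conditions on many overlapping sets Z while testing different pairs (X, Y), so terms such as H(Z) and H(X,Z) computed for one test are reused verbatim in later ones.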