AUTHOR

Iryna Skrypnyk

Irrelevant Features, Class Separability, and Complexity of Classification Problems

In this paper, class separability measures are analyzed in an attempt to relate their descriptive abilities to the geometrical properties of classification problems in the presence of irrelevant features. The study is performed on synthetic and benchmark data with known irrelevant features and other characteristics of interest, such as class boundaries, shapes, margins between classes, and density. The results show that some measures are individually informative, while others are less reliable and can only provide complementary information. Classification problem complexity is measured on selected data sets to gain additional insight into the results obtained.

research product

Correlation-Based and Contextual Merit-Based Ensemble Feature Selection

Recent research has proved the benefits of using an ensemble of diverse and accurate base classifiers for classification problems. In this paper the focus is on producing diverse ensembles with the aid of three feature selection heuristics based on two approaches: correlation-based and contextual merit-based. We have developed an algorithm and experimented with it to evaluate and compare the three feature selection heuristics on ten data sets from the UCI Repository. On average, the simple correlation-based ensemble achieves the highest accuracy. The contextual merit-based heuristics seem to include too many features in the initial ensembles, and iterations were most successful with them.

research product

Ensemble Feature Selection Based on Contextual Merit and Correlation Heuristics

Recent research has proven the benefits of using ensembles of classifiers for classification problems. Ensembles of diverse and accurate base classifiers are constructed by machine learning methods that manipulate the training sets. One way to manipulate the training set is to use feature selection heuristics to generate the base classifiers. In this paper we examine two of them: correlation-based and contextual merit-based heuristics. Both rely on quite similar assumptions concerning heterogeneous classification problems. Experiments are conducted on several data sets from the UCI Repository. We construct a fixed number of base classifiers over selected feature subsets and refine the ensemble iter…

research product

Unstable feature relevance in classification tasks

research product

Ensemble Feature Selection Based on the Contextual Merit

Recent research has proved the benefits of using ensembles of classifiers for classification problems. Ensembles constructed by machine learning methods that manipulate the training set are used to create diverse sets of accurate classifiers. Feature selection techniques that apply different heuristics for generating base classifiers can be adjusted to specific domain characteristics. In this paper we consider and experiment with the contextual feature merit measure as a feature selection heuristic. We use the diversity of an ensemble as the evaluation function in our new algorithm with a refinement cycle. We have evaluated our algorithm on seven data sets from UCI. The experiment…

research product