
RESEARCH PRODUCT

<title>Dynamic integration of multiple data mining techniques in a knowledge discovery management system</title>

Alexey Tsymbal, Artyom Katasonov, Vagan Terziyan, Seppo Puuronen

subject

Computer science; business.industry; Weighted voting; computer.software_genre; Machine learning; Expert system; Multiple data; Matrix (mathematics); Information extraction; ComputingMethodologies_PATTERNRECOGNITION; Knowledge extraction; Management system; Data mining; Artificial intelligence; business; computer; Classifier (UML)

description

One of the most important directions for improving data mining and knowledge discovery is the integration of multiple classification techniques in an ensemble of classifiers. An integration technique should be able to estimate and select the most appropriate component classifiers from the ensemble. We present two variations of an advanced dynamic integration technique, with two distance metrics. The technique is a variation of the stacked generalization method, based on the assumption that each component classifier is the best one inside a certain subarea of the entire domain. The technique has two phases: the learning phase and the application phase. During the learning phase, a performance matrix is derived for each component classifier using the instances of the training set. Each matrix thus contains information about the 'competence area' of the corresponding component classifier. These matrices are used during the application phase to predict the performance of each component classifier on each new instance. The technique is evaluated on three data sets from the UCI machine learning repository on which well-known classification methods have not proved successful. The comparison results show that our dynamic integration technique outperforms the weighted voting and cross-validation majority techniques on some data sets.
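
The two phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It makes several assumptions not stated in the abstract: the per-classifier performance matrices are stored as columns of a single 0/1 error table, errors are measured directly on the training instances, the distance metric is Euclidean, the local error is an unweighted mean over the `k` nearest training instances, and the final prediction comes from the single classifier with the lowest predicted local error. All function and parameter names are hypothetical.

```python
import math


def performance_matrix(classifiers, X_train, y_train):
    """Learning phase (sketch): entry [i][j] is 1 if classifier j
    misclassifies training instance i, otherwise 0."""
    return [[int(clf(x) != y) for clf in classifiers]
            for x, y in zip(X_train, y_train)]


def euclidean(a, b):
    """Assumed distance metric; the abstract does not name the two it uses."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def predict_errors(x_new, X_train, errors, k=3, distance=euclidean):
    """Application phase (sketch): estimate each classifier's local error
    as its mean error over the k training instances nearest to x_new."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: distance(x_new, X_train[i]))[:k]
    n_classifiers = len(errors[0])
    return [sum(errors[i][j] for i in nearest) / len(nearest)
            for j in range(n_classifiers)]


def dynamic_select(x_new, classifiers, X_train, errors, k=3,
                   distance=euclidean):
    """Classify x_new with the classifier whose predicted local error
    is lowest (ties go to the earlier classifier in the list)."""
    local = predict_errors(x_new, X_train, errors, k, distance)
    best = min(range(len(classifiers)), key=local.__getitem__)
    return classifiers[best](x_new)
```

Here each classifier is any callable that maps an instance (a sequence of numbers) to a label. A voting variant could weight every classifier's prediction by one minus its predicted local error instead of picking a single winner; that is likewise an assumption about how such a variation might look, not a description of the paper's method.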

https://doi.org/10.1117/12.339975