
RESEARCH PRODUCT

Restricted Decontamination for the Imbalanced Training Sample Problem

E. Rangel, Francesc J. Ferri, José Salvador Sánchez, Ricardo Barandela

subject

Weight function; Training set; Point (typography); business.industry; Computer science; Supervised learning; Sample (statistics); Function (mathematics); Machine learning; computer.software_genre; Speech processing; Class (biology); Pattern recognition (psychology); Artificial intelligence; business; computer

description

The problem of imbalanced training data in supervised methods is currently receiving growing attention. Imbalanced data means that one class is much more heavily represented than the others in the training sample. This situation, which arises in several practical domains, has been observed to cause a significant deterioration in classification accuracy, particularly for patterns belonging to the less represented classes. In the present paper, we report experimental results that indicate the benefit of appropriately downsizing the majority class while simultaneously increasing the size of the minority class in order to balance the two. This is achieved by applying a modification of the previously proposed Decontamination methodology. The combination of this proposal with a weighted distance function is also explored.
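The abstract does not spell out the editing rule, so the following is only a minimal sketch of one way the idea could be read: a nearest-neighbour edit applied to majority-class samples alone, which relabels those surrounded by minority neighbours and discards those with no clear neighbourhood. The function name `restricted_decontamination`, the parameters `k` and `k_min`, and the relabel/discard rule itself are assumptions made for illustration, not the authors' published procedure.

```python
import numpy as np

def restricted_decontamination(X, y, majority, minority, k=3, k_min=2):
    """Edit only majority-class samples using their k nearest neighbours.

    Hypothetical rule (an assumption, not taken from the paper):
      - at least k_min of the k neighbours are minority -> relabel as minority
      - at least k_min of the k neighbours are majority -> keep unchanged
      - otherwise -> discard the sample
    Minority-class samples are never touched.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances; a sample is never its own neighbour.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)

    new_y = y.copy()
    keep = np.ones(len(y), dtype=bool)
    for i in np.flatnonzero(y == majority):
        nn = np.argsort(d[i])[:k]          # indices of the k nearest samples
        n_min = np.sum(y[nn] == minority)  # votes use the original labels
        n_maj = np.sum(y[nn] == majority)
        if n_min >= k_min:
            new_y[i] = minority            # grows the minority class
        elif n_maj < k_min:
            keep[i] = False                # shrinks the majority class
    return X[keep], new_y[keep]
```

Under this reading, every relabelled or discarded sample reduces the majority class, and every relabelled sample also enlarges the minority class, which matches the balancing effect the abstract describes. A weighted distance could be substituted for the plain Euclidean norm in the distance computation; its form is not given here, so it is left out of the sketch.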

https://doi.org/10.1007/978-3-540-24586-5_52