RESEARCH PRODUCT

Smart prototype selection for machine learning based on ignorance zones analysis

Anton Nikulin

subject

prototypes, machine learning, Data reduction, Ignorance zones, Prototype selection, Classification, Nearest neighbor

description

Databases have grown considerably over recent decades, and Machine Learning algorithms are not ready to process such large volumes of information. Although it is one of the most useful algorithms in Data Mining, the Nearest neighbor classifier suffers from high storage requirements and slow response times when working with large data sets. Prototype Selection methods help to alleviate this problem by choosing a smaller subset of the data. This thesis provides an overview of existing instance selection methods and introduces a new approach. The majority of current methods select a subset experimentally, by checking whether a given point affects classification accuracy. The new approach presented in this thesis analyzes the data set instances and chooses prototypes based on the ignorance zones it discovers. The results of the analysis show that the proposed method can effectively decrease the size of the data set while maintaining the same classification accuracy with the Nearest neighbor classifier. In addition, it allows noisy data to be removed, which makes the decision boundaries smoother.
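
To make the idea of prototype selection concrete, here is a minimal sketch of one classical accuracy-driven method, Hart's condensed nearest neighbor rule. It is not the ignorance-zone approach proposed in the thesis, which the abstract does not specify in detail. The function names, the Euclidean distance, and the toy data are assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_label(prototypes, point):
    """Return the label of the prototype closest to `point` (1-NN)."""
    _, label = min(prototypes, key=lambda p: dist(p[0], point))
    return label


def condense(data):
    """Hart's condensed nearest neighbor rule (illustrative sketch).

    `data` is a list of (features, label) pairs. A point is kept as a
    prototype only if the prototypes chosen so far misclassify it.
    """
    if not data:
        return []
    prototypes = [data[0]]
    changed = True
    while changed:  # repeat until a full pass adds nothing
        changed = False
        for features, label in data:
            if (features, label) in prototypes:
                continue  # already stored; also guards against duplicates
            if nearest_label(prototypes, features) != label:
                prototypes.append((features, label))
                changed = True
    return prototypes


# Hypothetical toy data: two well-separated clusters.
data = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
    ((1.0, 1.0), "b"), ((1.1, 1.0), "b"), ((1.0, 1.1), "b"),
]
prototypes = condense(data)
print(len(data), "->", len(prototypes))
```

On this toy data the selected subset still classifies every original point correctly with the Nearest neighbor rule, which is the property the abstract describes: a smaller data set with unchanged accuracy.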

http://urn.fi/URN:NBN:fi:jyu-201803281873