AUTHOR

Anton Nikulin

Ignorance-Aware Approaches and Algorithms for Prototype Selection in Machine Learning

Operating with ignorance is an important concern of Machine Learning research, especially when the objective is to discover knowledge from imperfect data. Data mining (driven by appropriate knowledge discovery tools) processes available (observed, known and understood) samples of data in order to build a model (e.g., a classifier) that can handle data samples which are not yet observed, known or understood. These tools traditionally take samples of the available data (known facts) as input for learning. We want to challenge the indispensability of this approach, and we suggest considering things the other way around. What if the task were as follows: how to learn a mode…

research product

Smart prototype selection for machine learning based on ignorance zones analysis

The size of databases has grown considerably over recent decades, and Machine Learning algorithms are not ready to process such large volumes of information. Although it is one of the most useful algorithms in Data Mining, the Nearest Neighbor classifier suffers from high storage requirements and slow response when working with large data sets. Prototype Selection methods help to alleviate this problem by choosing a smaller subset of the data. In this thesis, an overview of existing instance selection methods is provided together with the introduction of a new approach. The majority of current methods select a subset experimentally by checking whether a certain point affects classificatio…

research product

Semantics of Voids within Data: Ignorance-Aware Machine Learning

Operating with ignorance is an important concern of geographical information science when the objective is to discover knowledge from imperfect spatial data. Data mining (driven by knowledge discovery tools) processes available (observed, known, and understood) samples of data in order to build a model (e.g., a classifier) that can handle data samples that are not yet observed, known, or understood. These tools traditionally take semantically labeled samples of the available data (known facts) as input for learning. We want to challenge the indispensability of this approach, and we suggest considering things the other way around. What if the task were as follows: how to buil…

research product