0000000001277022

AUTHOR

M. Pechenizkiy

Dynamic integration with random forests

Random Forests (RF) are a successful ensemble prediction technique that uses majority voting or averaging as a combination function. However, each tree in a random forest may contribute differently to the processing of a given instance. In this paper, we demonstrate that the prediction performance of RF may still be improved in some domains by replacing the combination function with dynamic integration, which is based on local performance estimates. Our experiments also demonstrate that the RF Intrinsic Similarity is better than the commonly used Heterogeneous Euclidean/Overlap Metric in finding a neighbourhood for local estimates in the context of dynamic integration of …
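The idea of replacing a fixed vote with local performance estimates can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes plain Euclidean distance for the neighbourhood (rather than the RF Intrinsic Similarity or the Heterogeneous Euclidean/Overlap Metric), a weighted-vote form of dynamic integration, and hypothetical helper names (`k_nearest`, `local_accuracies`, `dynamic_vote`).

```python
import math
from collections import defaultdict

def k_nearest(train_X, x, k):
    """Indices of the k training points closest to x.

    Assumption: plain Euclidean distance stands in for the similarity metric.
    """
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return order[:k]

def local_accuracies(correct, neighbours):
    """Per-model accuracy restricted to the neighbourhood.

    correct[m][i] is True if base model m classified training point i correctly.
    """
    return [sum(row[i] for i in neighbours) / len(neighbours) for row in correct]

def dynamic_vote(votes, weights):
    """Weighted vote: each model's vote counts in proportion to its local accuracy."""
    totals = defaultdict(float)
    for label, weight in zip(votes, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Toy data (hypothetical): three base models, four training points.
train_X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
correct = [
    [True, True, False, False],   # model 0 is reliable near the origin
    [False, False, True, True],   # model 1 is reliable far from the origin
    [False, True, True, False],   # model 2 is mixed
]
votes = ["a", "b", "b"]           # predictions of the three models for the new instance

neighbours = k_nearest(train_X, (0.2, 0.1), k=2)
weights = local_accuracies(correct, neighbours)
prediction = dynamic_vote(votes, weights)
```

In this toy case a plain majority vote would return "b", while the locally weighted vote follows the one model that was accurate in the neighbourhood of the new instance.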

research product

Tailoring of Feedback in Web-Based Learning: The Role of Response Certitude in the Assessment

This paper analyzes the challenges of tailoring feedback to the student’s response certitude during assessment in Web-based Learning Systems (WBLSs). We summarize the results of a series of experiments on the online assessment of students through multiple-choice quizzes, in which students had to select a confidence level and were able to request different kinds of feedback for each of the answered questions.

research product

Diversity in random subspacing ensembles

Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. It has been shown experimentally and theoretically that, for an ensemble to be effective, it should consist of classifiers with diverse predictions. A number of ways are known to quantify diversity in ensembles, but little research has been done on their appropriateness. In this paper, we compare eight measures of ensemble diversity with regard to their correlation with the accuracy improvement due to ensembles. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the correlations for random subspacing ensembles with diffe…
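As a concrete illustration of what quantifying diversity can mean, the sketch below computes the pairwise disagreement measure, averaged over all pairs of ensemble members. This is an assumption made for illustration only: the abstract does not name the eight measures it compares, and the function names here are hypothetical.

```python
from itertools import combinations

def disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers predict different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def ensemble_disagreement(all_preds):
    """Mean pairwise disagreement over all pairs of ensemble members."""
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

# Toy predictions (hypothetical): three classifiers, four instances.
preds = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
diversity = ensemble_disagreement(preds)
```

A value of 0 means all members always agree; larger values indicate more diverse predictions.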

research product

The impact of feature extraction on the performance of a classifier: kNN, Naïve Bayes and C4.5

"The curse of dimensionality" is pertinent to many learning algorithms, and it denotes the drastic rise in computational complexity and classification error in high dimensions. In this paper, different feature extraction techniques, as means of (1) dimensionality reduction and (2) constructive induction, are analyzed with respect to the performance of a classifier. Three commonly used classifiers are taken for the analysis: kNN, Naïve Bayes and the C4.5 decision tree. One of the main goals of this paper is to show the importance of using class information in feature extraction for classification and the (in)appropriateness of random projection or conventional PCA to feature extraction for …

research product

Online mass flow prediction in CFB boilers

Fuel feeding and inhomogeneity of fuel typically cause process fluctuations in the circulating fluidized bed (CFB) process. If control systems fail to compensate for the fluctuations, the whole plant will suffer from fluctuations that are reinforced by the closed-loop controls. This phenomenon reduces the efficiency and lifetime of process components. Therefore, domain experts are interested in developing tools and techniques for gaining a better understanding of the underlying processes and their mutual dependencies in CFB boilers. In this paper we consider an application of data mining technology to the analysis of time series data from a pilot CFB reactor. Namely, we present a rather…

research product

Does relevance matter to data mining research?

Data mining (DM) and knowledge discovery are intelligent tools that help to accumulate and process data and make use of it. We review several existing frameworks for DM research that originate from different paradigms. These DM frameworks mainly address various DM algorithms for the different steps of the DM process. Recent research has shown that many real-world problems require the integration of several DM algorithms from different paradigms in order to produce a better solution, which elevates the importance of practice-oriented aspects in DM research as well. In this chapter we strongly emphasize that DM research should also take into account the relevance of research, not only its rigor. Und…

research product

Knowledge Discovery from Microbiology Data: Many-Sided Analysis of Antibiotic Resistance in Nosocomial Infections

Nosocomial infections and antimicrobial resistance (AR) are highly important problems that impact the morbidity and mortality of hospitalized patients as well as their cost of care. The goal of this paper is to demonstrate our analysis of AR by applying a number of data mining (DM) techniques to real hospital data. The data for the analysis include instances of sensitivity of nosocomial infections to antibiotics collected in a hospital over three years (2002-2004). The results of our study show that DM makes it easy for experts to inspect patterns that might otherwise be missed by usual (manual) infection control. However, the clinical relevance and utility of these findings await th…

research product

Feature extraction for classification in knowledge discovery systems

Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems arising from "the curse of dimensionality". We consider three different eigenvector-based feature extraction approaches for classification. A summary of the results obtained concerning the accuracy of classification schemes is presented, and the issue of searching for the most appropriate feature extraction method for a given data set is considered. A decision support system to aid in the integration of the feature extraction and classification processes is proposed. The goals and requirements set for the d…

research product

Dynamic integration of classifiers in the space of principal components

Recent research has shown the integration of multiple classifiers to be one of the most important directions in machine learning and data mining. It was shown that, for an ensemble to be successful, it should consist of accurate and diverse base classifiers. However, it is also important that the integration procedure in the ensemble should properly utilize the ensemble diversity. In this paper, we present an algorithm for the dynamic integration of classifiers in the space of extracted features (FEDIC). It is based on the technique of dynamic integration, in which local accuracy estimates are calculated for each base classifier of an ensemble, in the neighborhood of a new instance to be pr…

research product

Tailoring feedback in online assessment: Influence of learning styles on the feedback preferences and elaborated feedback effectiveness

Design of feedback is a critical issue of online assessment development within Web-based Learning Systems (WBLSs). This paper examines the possibilities of tailoring the feedback that is presented to a student as a result of his/her preferences and responses to questions of an online test with respect to the individual learning styles (LS). The paper briefly reviews the main types of feedback that can be presented during online assessment and discusses the challenges in authoring and tailoring feedback in WBLSs. We report the results of some recent experiments organized as online assessment of students through multiple-choice quizzes in which students were able to request diffe…

research product

Immediate elaborated feedback personalization in online assessment

Providing a student with feedback that is timely, most suitable and useful for her personality and the performed task is a challenging problem of online assessment within Web-based Learning Systems (WBLSs). In our recent work we suggested a general approach to feedback adaptation in WBLSs, and through a series of experiments we demonstrated the possibilities of tailoring the feedback that is presented to a student as a result of her response to questions of an online test, taking into account the individual learning styles (LS), certitude in a response and correctness of this response. In this paper we present the results of the most recent experimental field study where we tested two feedback…

research product

Adaptation of elaborated feedback in e-learning

Design of feedback is a critical issue of online assessment development within Web-based Learning Systems (WBLSs). In our work we demonstrate the possibilities of tailoring the feedback to the students’ learning style (LS), certitude in response and its correctness. We observe in the experimental studies that these factors have a significant influence on the feedback preferences of students and the effectiveness of elaborated feedback (EF), i.e. students’ performance improvement during the test. These observations helped us to develop a simple EF recommendation approach. Our experimental study shows that (1) many students are eager to follow the recommendations on the necessity to read certain …

research product

On the use of information systems research methods in data mining

Information systems are powerful instruments for organizational problem solving through formal information processing (Lyytinen, 1987). Data mining (DM) and knowledge discovery are intelligent tools that help to accumulate and process data and make use of it (Fayyad, 1996). Data mining bridges many technical areas, including databases, statistics, machine learning, and human-computer interaction. The set of data mining processes used to extract and verify patterns in data is the core of the knowledge discovery process. Numerous data mining techniques have recently been developed to extract knowledge from large databases. The area of data mining is historically more related to AI (Artificial…

research product