Author: M. Pechenizkiy

0000000001277022

Showing 13 related works from this author

Dynamic integration with random forests

2006

Random Forests (RF) are a successful ensemble prediction technique that uses majority voting or averaging as a combination function. However, each tree in a random forest may contribute differently to the processing of a given instance. In this paper, we demonstrate that the prediction performance of RF may still be improved in some domains by replacing the combination function with dynamic integration, which is based on local performance estimates. Our experiments also demonstrate that the RF Intrinsic Similarity is better than the commonly used Heterogeneous Euclidean/Overlap Metric in finding a neighbourhood for local estimates in the context of dynamic integration of …
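The idea of replacing a fixed vote with locally weighted voting can be illustrated with a minimal sketch. It is not the paper's implementation: the base classifiers are toy functions rather than trees, plain Euclidean distance stands in for the similarity measures named in the abstract, and the names `local_accuracies` and `dynamic_vote` are hypothetical.

```python
import math
from collections import defaultdict


def local_accuracies(classifiers, neighbours):
    """Fraction of the neighbouring labelled instances each classifier gets right."""
    return [
        sum(clf(x) == y for x, y in neighbours) / len(neighbours)
        for clf in classifiers
    ]


def dynamic_vote(classifiers, labelled, query, k=3):
    """Weight each classifier's vote by its accuracy on the k labelled
    instances nearest to `query` (Euclidean distance is an assumption)."""
    neighbours = sorted(labelled, key=lambda xy: math.dist(xy[0], query))[:k]
    weights = local_accuracies(classifiers, neighbours)
    scores = defaultdict(float)
    for clf, weight in zip(classifiers, weights):
        scores[clf(query)] += weight
    return max(scores, key=scores.get)


# Toy data: class 0 clusters near (0, 0), class 1 near (1, 1).
labelled = [
    ((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 0),
    ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.1, 0.9), 1),
]

# One sensible classifier and two that always predict class 1.
classifiers = [
    lambda x: int(x[0] > 0.5),
    lambda x: 1,
    lambda x: 1,
]

near_origin = dynamic_vote(classifiers, labelled, (0.1, 0.1))
near_one = dynamic_vote(classifiers, labelled, (1.0, 1.0))
```

In this toy setting an unweighted majority vote would label the point near the origin as class 1, because two of the three classifiers always answer 1. The local accuracy estimates give those two classifiers zero weight in that region, so the weighted vote follows the one classifier that is locally reliable.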

researchProduct

Tailoring of Feedback in Web-Based Learning: The Role of Response Certitude in the Assessment

2008

This paper analyzes the challenges of tailoring feedback to the student’s response certitude during assessment in Web-based Learning Systems (WBLSs). We summarize the results of a series of experiments on the online assessment of students through multiple-choice quizzes, in which students had to select a confidence level and could request different kinds of feedback for each answered question.

Multimedia; Computer science; Web based learning; ComputingMilieux_COMPUTERSANDEDUCATION; Online assessment


Diversity in random subspacing ensembles

2004

Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. It has been shown experimentally and theoretically that, for an ensemble to be effective, it should consist of classifiers with diverse predictions. A number of ways are known to quantify diversity in ensembles, but little research has been done on their appropriateness. In this paper, we compare eight measures of ensemble diversity with regard to their correlation with the accuracy improvement due to ensembles. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the correlations for random subspacing ensembles with diffe…
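To make the notion of quantifying ensemble diversity concrete, here is a minimal sketch of one simple measure, the mean pairwise disagreement between classifier outputs. The truncated abstract does not name the eight measures compared, so this choice is an illustrative assumption, and the function names `disagreement` and `ensemble_diversity` are hypothetical.

```python
from itertools import combinations


def disagreement(preds_a, preds_b):
    """Share of instances on which two classifiers output different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def ensemble_diversity(predictions):
    """Mean pairwise disagreement over all pairs of classifiers."""
    pairs = list(combinations(predictions, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Each row holds one classifier's predicted labels for the same four instances.
predictions = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]

diversity = ensemble_diversity(predictions)
```

A value of 0 means all classifiers always agree, and larger values mean their predictions differ more often. Correlating such a score with the accuracy gain of the ensemble over its members is the kind of comparison the abstract describes.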

Computer science; Ambiguity; Ensemble diversity; Ensemble learning; Data warehouse; Correlation; Information extraction; Knowledge extraction; Statistics; Entropy (information theory); Data mining


The impact of feature extraction on the performance of a classifier : kNN, Naïve Bayes and C4.5

2005

"The curse of dimensionality" is pertinent to many learning algorithms, and it denotes the drastic rise in computational complexity and classification error in high dimensions. In this paper, different feature extraction techniques, as a means of (1) dimensionality reduction and (2) constructive induction, are analyzed with respect to the performance of a classifier. Three commonly used classifiers are taken for the analysis: kNN, Naïve Bayes and the C4.5 decision tree. One of the main goals of this paper is to show the importance of using class information in feature extraction for classification and the (in)appropriateness of random projection or conventional PCA to feature extraction for …

Covariance matrix; Computer science; Random projection; Dimensionality reduction; Feature extraction; Linear classifier; Pattern recognition; Machine learning; Naive Bayes classifier; ComputingMethodologies_PATTERNRECOGNITION; Principal component analysis; Artificial intelligence; Curse of dimensionality

Advances in Artificial Intelligence: 18th Conference of the Canadian Society for Computational Studies of Intelligence, Canadian AI 2005, Victoria, Canada, May 9-11, 2005: Proceedings


Online mass flow prediction in CFB boilers

2009

Fuel feeding and inhomogeneity of fuel typically cause process fluctuations in the circulating fluidized bed (CFB) process. If control systems fail to compensate for the fluctuations, the whole plant will suffer from fluctuations that are reinforced by the closed-loop controls. This phenomenon reduces the efficiency and lifetime of process components. Therefore, domain experts are interested in developing tools and techniques for gaining a better understanding of the underlying processes and their mutual dependencies in CFB boilers. In this paper we consider an application of data mining technology to the analysis of time series data from a pilot CFB reactor. Namely, we present a rather…

Computer science; Control system; Mass flow; Boiler (power generation); Fluidized bed combustion; Time series; Process engineering; Simulation; Active noise control


Does relevance matter to data mining research?

2008

Data mining (DM) and knowledge discovery are intelligent tools that help to accumulate and process data and make use of it. We review several existing frameworks for DM research that originate from different paradigms. These DM frameworks mainly address various DM algorithms for the different steps of the DM process. Recent research has shown that many real-world problems require the integration of several DM algorithms from different paradigms in order to produce a better solution, which elevates the importance of practice-oriented aspects in DM research as well. In this chapter we strongly emphasize that DM research should take into account the relevance of research, not only its rigor. Und…

Knowledge extraction; Association rule learning; Computer science; Process (engineering); Granular computing; Information system; Software mining; Relevance (information retrieval); Data mining; Data science; Sketch


Knowledge Discovery from Microbiology Data: Many-Sided Analysis of Antibiotic Resistance in Nosocomial Infections

2005

Nosocomial infections and antimicrobial resistance (AR) are highly important problems that impact the morbidity and mortality of hospitalized patients as well as their cost of care. The goal of this paper is to demonstrate our analysis of AR by applying a variety of data mining (DM) techniques to real hospital data. The data for the analysis include instances of sensitivity of nosocomial infections to antibiotics collected in a hospital over three years (2002-2004). The results of our study show that DM makes it easy for experts to inspect patterns that might otherwise be missed by usual (manual) infection control. However, the clinical relevance and utility of these findings await th…

Operations research; Hospitalized patients; Computer science; Knowledge engineering; Antibiotics; Antibiotic resistance; Knowledge extraction; Infection control; Relevance (information retrieval); Intensive care medicine; Cost of care; Prospective cohort study


Feature extraction for classification in knowledge discovery systems

2003

Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems caused by "the curse of dimensionality". We consider three different eigenvector-based feature extraction approaches for classification. A summary of the results obtained on the accuracy of classification schemes is presented, and the issue of searching for the most appropriate feature extraction method for a given data set is considered. A decision support system to aid in the integration of the feature extraction and classification processes is proposed. The goals and requirements set for the d…

Decision support system; Computer science; Dimensionality reduction; Feature extraction; Machine learning; Knowledge acquisition; k-nearest neighbors algorithm; Knowledge extraction; Feature (computer vision); Artificial intelligence; Data mining; Curse of dimensionality

Knowledge-Based Intelligent Information and Engineering Systems (Proceedings 7th International Conference, KES 2003, Oxford, UK, September 3-5, 2003), Part I


Dynamic integration of classifiers in the space of principal components

2003

Recent research has shown the integration of multiple classifiers to be one of the most important directions in machine learning and data mining. It was shown that, for an ensemble to be successful, it should consist of accurate and diverse base classifiers. However, it is also important that the integration procedure in the ensemble should properly utilize the ensemble diversity. In this paper, we present an algorithm for the dynamic integration of classifiers in the space of extracted features (FEDIC). It is based on the technique of dynamic integration, in which local accuracy estimates are calculated for each base classifier of an ensemble, in the neighborhood of a new instance to be pr…

Random subspace method; Information extraction; ComputingMethodologies_PATTERNRECOGNITION; Computer science; Principal component analysis; Feature extraction; Data mining; Classifier (UML); Numerical integration; Information integration; Curse of dimensionality


Tailoring feedback in online assessment: Influence of learning styles on the feedback preferences and elaborated feedback effectiveness

2008

Design of feedback is a critical issue of online assessment development within Web-based Learning Systems (WBLSs). This paper examines the possibilities of tailoring the feedback that is presented to a student as a result of his/her preferences and responses to questions of an online test with respect to individual learning styles (LS). The paper briefly reviews the main types of feedback that can be presented during online assessment and discusses the challenges in authoring and tailoring feedback in WBLSs. We report the results of some recent experiments organized as online assessment of students through multiple-choice quizzes in which students were able to request diffe…

Correctness; Multimedia; Peer feedback; Computer science; Test (assessment); Online assessment; Learning styles; Human–computer interaction; Engineering education; The Internet


Immediate elaborated feedback personalization in online assessment

2008

Providing a student with feedback that is timely and well suited to her personality and the performed task is a challenging problem of online assessment within Web-based Learning Systems (WBLSs). In our recent work we suggested a general approach to feedback adaptation in WBLSs, and through a series of experiments we demonstrated the possibilities of tailoring the feedback that is presented to a student as a result of her response to questions of an online test, taking into account individual learning styles (LS), certitude in a response and the correctness of this response. In this paper we present the results of the most recent experimental field study where we tested two feedback…

Correctness; Multimedia; Computer science; Field (computer science); Task (project management); Personalization; Learning styles; Formative assessment; Human–computer interaction; Personality; Adaptation (computer science)


Adaptation of elaborated feedback in e-learning

2008

Design of feedback is a critical issue of online assessment development within Web-based Learning Systems (WBLSs). In our work we demonstrate the possibilities of tailoring the feedback to the students’ learning style (LS), certitude in a response and its correctness. We observe in the experimental studies that these factors have a significant influence on the feedback preferences of students and the effectiveness of elaborated feedback (EF), i.e. students’ performance improvement during the test. These observations helped us to develop a simple EF recommendation approach. Our experimental study shows that (1) many students are eager to follow the recommendations on the necessity to read certain …

Knowledge management; Correctness; Peer feedback; Computer science; E-learning (theory); Mathematics education; Performance improvement; Adaptation (computer science); Test (assessment); Online assessment


On the use of information systems research methods in data mining

2006

Information systems are powerful instruments for organizational problem solving through formal information processing (Lyytinen, 1987). Data mining (DM) and knowledge discovery are intelligent tools that help to accumulate and process data and make use of it (Fayyad, 1996). Data mining bridges many technical areas, including databases, statistics, machine learning, and human-computer interaction. The set of data mining processes used to extract and verify patterns in data is the core of the knowledge discovery process. Numerous data mining techniques have recently been developed to extract knowledge from large databases. The area of data mining is historically more related to AI (Artificial…

Knowledge extraction; Computer science; Process (engineering); Pattern recognition (psychology); Information system; Probabilistic logic; Technical report; Information processing; Data mining; Field (computer science)
