Author: S. Puuronen

0000000001277023

showing 10 related works from this author

Does relevance matter to data mining research?

2008

Data mining (DM) and knowledge discovery are intelligent tools that help to accumulate and process data and make use of it. We review several existing frameworks for DM research that originate from different paradigms. These DM frameworks mainly address various DM algorithms for the different steps of the DM process. Recent research has shown that many real-world problems require integration of several DM algorithms from different paradigms in order to produce a better solution, which elevates the importance of practice-oriented aspects in DM research as well. In this chapter we strongly emphasize that DM research should take into account the relevance of research, not only its rigor. Und…

Knowledge extraction; Association rule learning; Computer science; Process (engineering); Granular computing; Information system; Software mining; Relevance (information retrieval); Data mining; computer.software_genre; Data science; computer; Sketch

Knowledge Discovery from Microbiology Data: Many-Sided Analysis of Antibiotic Resistance in Nosocomial Infections

2005

Nosocomial infections and antimicrobial resistance (AR) are highly important problems that impact the morbidity and mortality of hospitalized patients as well as their cost of care. The goal of this paper is to demonstrate our analysis of AR by applying various data mining (DM) techniques to real hospital data. The data for the analysis include instances of sensitivity of nosocomial infections to antibiotics collected in a hospital over three years (2002-2004). The results of our study show that DM makes it easy for experts to inspect patterns that might otherwise be missed by usual (manual) infection control. However, the clinical relevance and utility of these findings await th…

medicine.medical_specialty; Operations research; medicine.drug_class; Hospitalized patients; Computer science; Knowledge engineering; Antibiotics; Antibiotic resistance; Knowledge extraction; medicine; Infection control; Relevance (information retrieval); Intensive care medicine; Cost of care; Prospective cohort study

Feature extraction for classification in knowledge discovery systems

2003

Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems occurring because of "the curse of dimensionality". We consider three different eigenvector-based feature extraction approaches for classification. The summary of obtained results concerning the accuracy of classification schemes is presented and the issue of search for the most appropriate feature extraction method for a given data set is considered. A decision support system to aid in the integration of the feature extraction and classification processes is proposed. The goals and requirements set for the d…

Decision support system; business.industry; Computer science; Dimensionality reduction; Feature extraction; Machine learning; computer.software_genre; Knowledge acquisition; k-nearest neighbors algorithm; Knowledge extraction; Feature (computer vision); Artificial intelligence; Data mining; business; computer; Curse of dimensionality

Knowledge-Based Intelligent Information and Engineering Systems (Proceedings 7th International Conference, KES 2003, Oxford, UK, September 3-5, 2003), Part I
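As an illustration of what eigenvector-based feature extraction means in general, the sketch below computes the leading eigenvector of a sample covariance matrix by power iteration and projects the data onto it. This is plain principal component analysis, chosen here only as a familiar example; the abstract does not say which three approaches the paper compares. The function name `first_principal_component` and the toy data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (direction, scores) for the leading eigenvector of the
    sample covariance matrix, found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Assumption: the start vector is not orthogonal to the leading
    # eigenvector; otherwise power iteration will not find it.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # One extracted feature per instance: its coordinate along v.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Toy data lying close to the line y = x (made up for illustration).
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
direction, scores = first_principal_component(data)
```

The resulting `scores` replace the two original features with a single extracted one, which a classifier such as k-nearest neighbors could then use in place of the raw inputs.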

Local dimensionality reduction within natural clusters for medical data analysis

2005

Inductive learning systems have been successfully applied in a number of medical domains. Nevertheless, the effective use of these systems requires data preprocessing before applying a learning algorithm. This is especially important for multidimensional heterogeneous data, presented by a large number of features of different types. Dimensionality reduction is one commonly applied approach. The goal of this paper is to study the impact of natural clustering on dimensionality reduction for classification. We compare several data mining strategies that apply dimensionality reduction by means of feature extraction or feature selection for subsequent classification. We show experimentally on micr…

business.industry; Computer science; Feature vector; Dimensionality reduction; Feature extraction; Pattern recognition; Feature selection; computer.software_genre; Artificial intelligence; Data pre-processing; Data mining; Multidimensional systems; business; Cluster analysis; computer; Curse of dimensionality

Effectiveness of local feature selection in ensemble learning for prediction of antimicrobial resistance

2008

In the real world concepts are often not stable but change over time. A typical example of this in the biomedical context is antibiotic resistance, where pathogen sensitivity may change over time as pathogen strains develop resistance to antibiotics that were previously effective. This problem, known as concept drift (CD), complicates the task of learning a robust model. Different ensemble learning (EL) approaches (that instead of learning a single classifier try to learn and maintain a set of classifiers over time) have been shown to perform reasonably well in the presence of concept drift. In this paper we study how much local feature selection (FS) can improve ensemble performance for da…

Change over time; Concept drift; business.industry; Computer science; media_common.quotation_subject; System testing; Feature selection; Machine learning; computer.software_genre; Ensemble learning; Statistical classification; Voting; Artificial intelligence; Data mining; business; computer; Classifier (UML); media_common

Keynote Paper: Data Mining Researcher, Who is Your Customer? Some Issues Inspired by the Information Systems Field

2006

Data mining as an applied research field still raises great expectations among organizations that want to increase the utility they get from their huge databases and data warehouses. There are too few success stories about organizations having managed to satisfy even some of those expectations. This situation is very similar to the one inside the information systems (IS) field, especially earlier but even currently. The recent lively debate about the identity of the IS discipline also included an analysis of the customers of IS research. Inspired by IS researchers' insights related to the topic, we ask the question "who is our customer?" as data mining researchers. With…

Work (electrical); Computer science; Information system; Identity (social science); Applied research; Data mining; computer.software_genre; Data science; computer; Data warehouse; Field (computer science)

17th International Conference on Database and Expert Systems Applications (DEXA'06)

Local dimensionality reduction and supervised learning within natural clusters for biomedical data analysis

2006

Inductive learning systems were successfully applied in a number of medical domains. Nevertheless, the effective use of these systems often requires data preprocessing before applying a learning algorithm. This is especially important for multidimensional heterogeneous data presented by a large number of features of different types. Dimensionality reduction (DR) is one commonly applied approach. The goal of this paper is to study the impact of natural clustering--clustering according to expert domain knowledge--on DR for supervised learning (SL) in the area of antibiotic resistance. We compare several data-mining strategies that apply DR by means of feature extraction or feature selection w…

Databases Factual; Computer science; Feature extraction; Information Storage and Retrieval; Feature selection; Machine learning; computer.software_genre; Models Biological; Pattern Recognition Automated; Immune system; Artificial Intelligence; Drug Resistance Bacterial; Cluster Analysis; Humans; Computer Simulation; Electrical and Electronic Engineering; Representation (mathematics); Cluster analysis; Cross Infection; business.industry; Dimensionality reduction; Supervised learning; General Medicine; Anti-Bacterial Agents; Computer Science Applications; Data pre-processing; Data mining; Artificial intelligence; Multidimensional systems; business; computer; Algorithms; Biotechnology

Dynamic integration of classifiers in the space of principal components

2003

Recent research has shown the integration of multiple classifiers to be one of the most important directions in machine learning and data mining. It was shown that, for an ensemble to be successful, it should consist of accurate and diverse base classifiers. However, it is also important that the integration procedure in the ensemble should properly utilize the ensemble diversity. In this paper, we present an algorithm for the dynamic integration of classifiers in the space of extracted features (FEDIC). It is based on the technique of dynamic integration, in which local accuracy estimates are calculated for each base classifier of an ensemble, in the neighborhood of a new instance to be pr…

Random subspace method; Information extraction; ComputingMethodologies_PATTERNRECOGNITION; Computer science; Principal component analysis; Feature extraction; Data mining; computer.software_genre; computer; Classifier (UML); Numerical integration; Information integration; Curse of dimensionality
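The idea of estimating each base classifier's local accuracy in the neighborhood of a new instance can be sketched as a simple dynamic selection rule. The code below is a minimal illustration under stated assumptions, not the FEDIC algorithm itself: it measures neighborhoods in the original feature space instead of the space of extracted features, and the names `dynamic_select`, `clf_a`, `clf_b` and the validation data are hypothetical.

```python
import math

def dynamic_select(classifiers, validation, x, k=3):
    """Predict x with the base classifier that is most accurate on the
    k validation instances nearest to x (Euclidean distance)."""
    neighbours = sorted(validation, key=lambda item: math.dist(item[0], x))[:k]

    def local_accuracy(clf):
        # Share of neighbouring validation instances classified correctly.
        return sum(clf(f) == y for f, y in neighbours) / len(neighbours)

    # Ties go to the classifier listed first.
    best = max(classifiers, key=local_accuracy)
    return best(x)

# Two hypothetical base classifiers, each thresholding one feature.
def clf_a(f):
    return 1 if f[0] > 0.5 else 0

def clf_b(f):
    return 1 if f[1] > 0.5 else 0

# Made-up validation set of (features, label) pairs.
validation = [
    ((0.1, 0.9), 1), ((0.2, 0.8), 1),   # region where clf_b is right
    ((0.9, 0.1), 1), ((0.8, 0.2), 1),   # region where clf_a is right
]

upper_left = dynamic_select([clf_a, clf_b], validation, (0.12, 0.85), k=2)
lower_right = dynamic_select([clf_a, clf_b], validation, (0.88, 0.12), k=2)
```

In both queries the two base classifiers disagree, and the rule picks whichever one was correct on the nearby validation instances. Weighting the votes by local accuracy instead of selecting a single classifier would be a natural variation of the same idea.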

Tailoring feedback in online assessment: Influence of learning styles on the feedback preferences and elaborated feedback effectiveness

2008

Design of feedback is a critical issue of online assessment development within Web-based Learning Systems (WBLSs). This paper examines the potential possibilities of tailoring the feedback that is presented to a student as a result of his/her preferences and responses to questions of an online test with respect to the individual learning styles (LS). The paper briefly reviews the main types of feedback that can be presented during online assessment and discusses the challenges in authoring and tailoring of feedback in WBLSs. We report the results of some recent experiments organized as online assessment of students through multiple-choice quizzes in which students were able to request diffe…

Correctness; Multimedia; Peer feedback; business.industry; Computer science; computer.software_genre; Test (assessment); Online assessment; Learning styles; Human–computer interaction; Engineering education; The Internet; business; computer

On the use of information systems research methods in data mining

2006

Information systems are powerful instruments for organizational problem solving through formal information processing (Lyytinen, 1987). Data mining (DM) and knowledge discovery are intelligent tools that help to accumulate and process data and make use of it (Fayyad, 1996). Data mining bridges many technical areas, including databases, statistics, machine learning, and human-computer interaction. The set of data mining processes used to extract and verify patterns in data is the core of the knowledge discovery process. Numerous data mining techniques have recently been developed to extract knowledge from large databases. The area of data mining is historically more related to AI (Artificial…

Knowledge extraction; Computer science; Process (engineering); Pattern recognition (psychology); Information system; Probabilistic logic; Technical report; Information processing; Data mining; computer.software_genre; computer; Field (computer science)
