
AUTHOR

Andreas Karwath

showing 12 related works from this author

Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions

2016

The joint density of a data stream is suitable for performing data mining tasks without having access to the original data. However, the methods proposed so far only target a small to medium number of variables, since their estimates rely on representing all the interdependencies between the variables of the data. High-dimensional data streams, which are becoming more and more frequent due to increasing numbers of interconnected devices, are, therefore, pushing these methods to their limits. To mitigate these limitations, we present an approach that projects the original data stream into a vector space and uses a set of representatives to provide an estimate. Due to the structure of the est…

Data stream; Mahalanobis distance; Computer science; Data stream mining; business.industry; 02 engineering and technology; Density estimation; computer.software_genre; Set (abstract data type); Software; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Data mining; business; computer; Curse of dimensionality; Vector space

A structural cluster kernel for learning on graphs

2012

In recent years, graph kernels have received considerable interest within the machine learning and data mining community. Here, we introduce a novel approach enabling kernel methods to utilize additional information hidden in the structural neighborhood of the graphs under consideration. Our structural cluster kernel (SCK) incorporates similarities induced by a structural clustering algorithm to improve state-of-the-art graph kernels. The approach is based on the idea that graph similarity can be described not only by the similarity between the graphs themselves, but also by the similarity they possess with respect to their structural neighborhood. We applied our novel kernel in…

Graph kernel; business.industry; Pattern recognition; ComputingMethodologies_PATTERNRECOGNITION; Kernel method; String kernel; Polynomial kernel; Kernel embedding of distributions; Radial basis function kernel; Artificial intelligence; Tree kernel; Cluster analysis; business; Mathematics

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Online Estimation of Discrete Densities

2013

We address the problem of estimating a discrete joint density online, that is, the algorithm is only provided the current example and its current estimate. The proposed online estimator of discrete densities, EDDO (Estimation of Discrete Densities Online), uses classifier chains to model dependencies among features. Each classifier in the chain estimates the probability of one particular feature. Because a single chain may not provide a reliable estimate, we also consider ensembles of classifier chains and ensembles of weighted classifier chains. For all density estimators, we provide consistency proofs and propose algorithms to perform certain inference tasks. The empirical evaluation of t…

Concept drift; Stochastic process; Estimation theory; Bayesian probability; Estimator; Inference; Data mining; Classifier chains; computer.software_genre; Classifier (UML); computer; Mathematics

2013 IEEE 13th International Conference on Data Mining
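
The chain idea in the abstract above can be illustrated with the chain rule for discrete variables, p(x1, ..., xn) = p(x1) · p(x2 | x1) · ... · p(xn | x1, ..., xn-1), where each factor is updated one example at a time. The following is a minimal sketch, not the EDDO implementation: it assumes Laplace-smoothed count tables in place of the paper's classifiers, and the class name `ChainDensityEstimator` and its parameters are hypothetical.

```python
from collections import defaultdict


class ChainDensityEstimator:
    """Toy online estimator of a discrete joint density via the chain rule.

    Assumption: each link of the chain is a Laplace-smoothed count table
    conditioned on all preceding features. The base classifiers used in
    the paper may differ.
    """

    def __init__(self, domain_sizes, alpha=1.0):
        self.domain_sizes = domain_sizes  # number of values per feature
        self.alpha = alpha                # Laplace smoothing strength
        # counts[i][prefix][value] = how often feature i took `value`
        # after the preceding features took the values in `prefix`.
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in domain_sizes]

    def update(self, x):
        """Absorb one example; only the current example is needed."""
        for i, value in enumerate(x):
            self.counts[i][tuple(x[:i])][value] += 1

    def conditional(self, i, prefix, value):
        """Smoothed estimate of p(x_i = value | x_1..x_{i-1} = prefix)."""
        table = self.counts[i].get(tuple(prefix), {})
        total = sum(table.values())
        return ((table.get(value, 0) + self.alpha)
                / (total + self.alpha * self.domain_sizes[i]))

    def density(self, x):
        """Joint density as the product of the chained conditionals."""
        p = 1.0
        for i, value in enumerate(x):
            p *= self.conditional(i, x[:i], value)
        return p


estimator = ChainDensityEstimator(domain_sizes=[2, 2])
for example in [(0, 0), (0, 0), (1, 1)]:
    estimator.update(example)
print(estimator.density((0, 0)))
```

Because every smoothed conditional sums to one over its feature's values, the product is a proper joint distribution. A full-prefix count table grows quickly with the number of features, which is one reason a learned classifier per feature is the more practical choice.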

Convolutional Neural Networks for the Identification of Regions of Interest in PET Scans: A Study of Representation Learning for Diagnosing Alzheimer…

2017

When diagnosing patients suffering from dementia based on imaging data like PET scans, the identification of suitable predictive regions of interest (ROIs) is of great importance. We present a case study of 3-D Convolutional Neural Networks (CNNs) for the detection of ROIs in this context, just using voxel data, without any knowledge given a priori. Our results on data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) suggest that the predictive performance of the method is on par with that of state-of-the-art methods, with the additional benefit of potential insights into affected brain regions.

Computer science; business.industry; Deep learning; 05 social sciences; Context (language use); medicine.disease; computer.software_genre; Machine learning; Convolutional neural network; 03 medical and health sciences; Identification (information); 0302 clinical medicine; Neuroimaging; Voxel; mental disorders; medicine; Dementia; 0501 psychology and cognitive sciences; 050102 behavioral science & comparative psychology; Artificial intelligence; business; computer; Feature learning; 030217 neurology & neurosurgery

A Large-Scale Empirical Evaluation of Cross-Validation and External Test Set Validation in (Q)SAR.

2013

(Q)SAR model validation is essential to ensure the quality of inferred models and to indicate future model predictivity on unseen compounds. Proper validation is also one of the requirements of regulatory authorities in order to accept the (Q)SAR model, and to approve its use in real-world scenarios as an alternative testing method. However, at the same time, the question of how to validate a (Q)SAR model, in particular whether to employ variants of cross-validation or external test set validation, is still under discussion. In this paper, we empirically compare a k-fold cross-validation with external test set validation. To this end we introduce a workflow that allows us to realistically simulate t…

Computer science; media_common.quotation_subject; Organic Chemistry; Scale (descriptive set theory); Variance (accounting); computer.software_genre; Cross-validation; Computer Science Applications; Model validation; Workflow; Structural Biology; Cheminformatics; Test set; Drug Discovery; Molecular Medicine; Quality (business); Data mining; computer; media_common

Molecular informatics
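
The two validation schemes compared in the abstract above differ only in how the data are split. The sketch below shows both in plain Python. It is not the paper's workflow: the function names are hypothetical, and a majority-class predictor stands in for a real (Q)SAR model so the example stays self-contained.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train, test) index lists; every index is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def holdout_split(n, test_fraction=0.2, seed=0):
    """Return one (train, test) split, mimicking an external test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(n * test_fraction)))
    return indices[n_test:], indices[:n_test]


def majority_accuracy(labels, train, test):
    """Stand-in model: predict the most frequent training label."""
    train_labels = [labels[i] for i in train]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test) / len(test)


def cross_validated_accuracy(labels, k=5, seed=0):
    """Average the per-fold accuracies of the stand-in model."""
    scores = [majority_accuracy(labels, train, test)
              for train, test in k_fold_splits(len(labels), k, seed)]
    return sum(scores) / len(scores)


labels = [0] * 8 + [1] * 2
train, test = holdout_split(len(labels), test_fraction=0.2)
print("k-fold estimate: ", cross_validated_accuracy(labels, k=5))
print("hold-out estimate:", majority_accuracy(labels, train, test))
```

The k-fold estimate averages over every instance, while the hold-out estimate depends on which few instances happen to land in the single test set. Repeating the hold-out split with different seeds makes that variability visible.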

Extracting information from support vector machines for pattern-based classification

2014

Statistical machine learning algorithms building on patterns found by pattern mining algorithms have to cope with large solution sets and thus the high dimensionality of the feature space. Vice versa, pattern mining algorithms are frequently applied to irrelevant instances, thus causing noise in the output. Solution sets of pattern mining algorithms also typically grow with increasing input datasets. The paper proposes an approach to overcome these limitations. The approach extracts information from trained support vector machines, in particular their support vectors and their relevance according to their coefficients. It uses the support vectors along with their coefficients as input to pa…

business.industry; Computer science; Feature vector; Solution set; Pattern recognition; computer.software_genre; Graph; Domain (software engineering); Support vector machine; Relevance (information retrieval); Fraction (mathematics); Noise (video); Artificial intelligence; Data mining; business; computer

Proceedings of the 29th Annual ACM Symposium on Applied Computing

Modeling recurrent distributions in streams using possible worlds

2015

Discovering changes in the data distribution of streams and discovering recurrent data distributions are challenging problems in data mining and machine learning. Both have received a lot of attention in the context of classification. With the ever-increasing growth of data, however, there is a high demand for compact and universal representations of data streams that enable the user to analyze current as well as historic data without having access to the raw data. To make a first step in this direction, we propose a condensed representation that captures the various, possibly recurrent, data distributions of the stream by extending the notion of possible worlds. The representation en…

Possible world; Basis (linear algebra); Computer science; Data stream mining; Representation (systemics); Context (language use); Data pre-processing; Data mining; Raw data; computer.software_genre; computer; Data modeling

2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA)

A probabilistic condensed representation of data for stream mining

2014

Data mining and machine learning algorithms usually operate directly on the data. However, if the data is not available at once or consists of billions of instances, these algorithms easily become infeasible with respect to memory and run-time concerns. As a solution to this problem, we propose a framework, called MiDEO (Mining Density Estimates inferred Online), in which algorithms are designed to operate on a condensed representation of the data. In particular, we propose to use density estimates, which are able to represent billions of instances in a compact form and can be updated when new instances arrive. As an example of an algorithm that operates on density estimates, we consider t…

Task (computing); Association rule learning; Data stream mining; Simple (abstract algebra); Computer science; Probabilistic logic; Probabilistic analysis of algorithms; Algorithm design; Data mining; Representation (mathematics); computer.software_genre; computer

2014 International Conference on Data Science and Advanced Analytics (DSAA)

Pairwise Learning to Rank by Neural Networks Revisited: Reconstruction, Theoretical Analysis and Practical Performance

2020

We present a pairwise learning to rank approach based on a neural net, called DirectRanker, that generalizes the RankNet architecture. We show mathematically that our model is reflexive, antisymmetric, and transitive, allowing for simplified training and improved performance. Experimental results on the LETOR MSLR-WEB10K, MQ2007 and MQ2008 datasets show that our model outperforms numerous state-of-the-art methods, while being inherently simpler in structure and using a pairwise approach only.

Transitive relation; Pairwise learning; Theoretical computer science; Artificial neural network; Antisymmetric relation; Computer science; Rank (computer programming); Structure (category theory); Pairwise comparison; Learning to rank
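
The abstract above does not spell out how the three order properties arise, so the following is only one possible construction, assumed here for illustration. Both documents pass through the same feature map, and an odd activation with no bias term is applied to the weighted difference of the two feature vectors. The function names and the tiny fixed weights are hypothetical, and nothing is trained.

```python
import math


def features(x, feature_weights):
    """Shared feature map: one tanh layer applied to a document vector."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)))
            for row in feature_weights]


def pairwise_score(x1, x2, feature_weights, output_weights):
    """Score > 0 means x1 is ranked above x2.

    The score is an odd function (tanh, no bias) of w . (f(x1) - f(x2)).
    Swapping the inputs flips the sign (antisymmetry), equal inputs give
    exactly 0 (reflexivity), and the sign follows the scalar w . f(x),
    which gives transitivity.
    """
    f1 = features(x1, feature_weights)
    f2 = features(x2, feature_weights)
    z = sum(w * (a - b) for w, a, b in zip(output_weights, f1, f2))
    return math.tanh(z)


feature_weights = [[0.5, -0.2], [0.1, 0.3]]  # illustrative values only
output_weights = [1.0, -0.7]
doc_a, doc_b = [1.0, 2.0], [0.5, -1.0]
print(pairwise_score(doc_a, doc_b, feature_weights, output_weights))
print(pairwise_score(doc_b, doc_a, feature_weights, output_weights))
```

With this structure, the pairwise comparison is consistent with a single scalar score per document, so a full ranking can be read off by sorting rather than by comparing all pairs.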

CheS-Mapper - Chemical Space Mapping and Visualization in 3D

2012

Analyzing chemical datasets is a challenging task for scientific researchers in the field of chemoinformatics. It is important, yet difficult, to understand the relationship between the structure of chemical compounds, their physico-chemical properties, and biological or toxic effects. In that respect, visualization tools can help to better comprehend the underlying correlations. Our recently developed 3D molecular viewer CheS-Mapper (Chemical Space Mapper) divides large datasets into clusters of similar compounds and consequently arranges them in 3D space, such that their spatial proximity reflects their similarity. The user can indirectly determine similarity, by selecting which f…

Process (engineering); Computer science; media_common.quotation_subject; Library and Information Sciences; computer.software_genre; 01 natural sciences; lcsh:Chemistry; 03 medical and health sciences; Similarity (psychology); Physical and Theoretical Chemistry; Function (engineering); 030304 developmental biology; media_common; Structure (mathematical logic); 0303 health sciences; lcsh:T58.5-58.64; lcsh:Information technology; 004 Informatik; Computer Graphics and Computer-Aided Design; Chemical space; Field (geography); 0104 chemical sciences; Visualization; Computer Science Applications; 010404 medicinal & biomolecular chemistry; lcsh:QD1-999; Cheminformatics; Data mining; computer; 004 Data processing; Software

Journal of Cheminformatics

CheS-Mapper 2.0 for visual validation of (Q)SAR models

2014

Background: Sound statistical validation is important to evaluate and compare the overall performance of (Q)SAR models. However, classical validation does not support the user in better understanding the properties of the model or the underlying data. Even though a number of visualization tools for analyzing (Q)SAR information in small molecule datasets exist, integrated visualization methods that allow the investigation of model validation results are still lacking. Results: We propose visual validation as an approach for the graphical inspection of (Q)SAR model validation results. The approach applies the 3D viewer CheS-Mapper, an open-source application for the exploration of sm…

Visualization methods; Computer science; Feature vector; Library and Information Sciences; computer.software_genre; 01 natural sciences; (Q)SAR; Model validation; 03 medical and health sciences; Software; Validation; Overall performance; Physical and Theoretical Chemistry; Visualization; 030304 developmental biology; 0303 health sciences; business.industry; Statistical validation; Computer Graphics and Computer-Aided Design; 0104 chemical sciences; Computer Science Applications; Visualization; 010404 medicinal & biomolecular chemistry; 3d space; 3D space; Data mining; business; computer; Software

Journal of Cheminformatics

Machine learning risk prediction of mortality for patients undergoing surgery with perioperative SARS-CoV-2: the COVIDSurg mortality score

2021

The British journal of surgery 108(11), 1274-1292 (2021). doi:10.1093/bjs/znab183

Cuidado perioperatorio; AcademicSubjects/MED00910; Settore MED/18 - CHIRURGIA GENERALE; Medizin; pulmonary complications; preoperative screening; Datasets as Topic; Surgical Procedures Operative/mortality; 030230 surgery; perioperative care; surgical procedures; operative mortality; machine learning; sars-cov-2; Medical and Health Sciences; Procediments quirúrgics; Cohort Studies; Machine Learning; Tumours of the digestive tract Radboud Institute for Health Sciences [Radboudumc 14]; 0302 clinical medicine; Models; Procedimientos quirúrgicos; Medicine and Health Sciences; COVIDSurg Collaborative Co-authors; Medicine; 030212 general & internal medicine; skin and connective tissue diseases; Rapid Research Communication; 11 Medical and Health Sciences; Operative/mortality; SARS-CoV-19; COVID-19/mortality; Statistical; Humans; Models Statistical; Risk Assessment; SARS-CoV-2; COVID-19; Surgical Procedures Operative; Aprendizaje automático; Operative; outcome; Operativo; [SDV.IB]Life Sciences [q-bio]/Bioengineering; Patient Safety; AcademicSubjects/MED00010; 6.4 Surgery; Life Sciences & Biomedicine; Human; 61; medicine.medical_specialty; 616.9; Coronavirus disease 2019 (COVID-19); Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); NO; COVIDSurg Collaborative; Vaccine Related; 03 medical and health sciences; Clinical Research; Biodefense; Cures perioperatòries; Aprenentatge automàtic; Mortalitat; Operatius; LS7_4; Surgical Procedures; Science & Technology; business.industry; SARS-CoV-2 infection; Kirurgi; Prevention; not indicated; covid 19; fungi; Evaluation of treatments and therapeutic interventions; Perioperative; postoperative mortality risk; vaccination; mortality; Good Health and Well Being; Mortalidad; Emergency medicine; Surgery; Human medicine; Cohort Studie; business; Perioperative care