AUTHOR

Madeleine Seeland

showing 7 related works from this author

Structural clustering of millions of molecular graphs

2014

We propose an algorithm for clustering very large molecular graph databases according to scaffolds (i.e., large structural overlaps) that are common between cluster members. Our approach first partitions the original dataset into several smaller datasets using a greedy clustering approach named APreClus based on dynamic seed clustering. APreClus is an online and instance incremental clustering algorithm delaying the final cluster assignment of an instance until one of the so-called pending clusters the instance belongs to has reached significant size and is converted to a fixed cluster. Once a cluster is fixed, APreClus recalculates the cluster centers, which are used as representatives for…

Clustering high-dimensional data; Fuzzy clustering; Theoretical computer science; k-medoids; Computer science; Single-linkage clustering; Correlation clustering; Constrained clustering; computer.software_genre; Complete-linkage clustering; Graph; Hierarchical clustering; ComputingMethodologies_PATTERNRECOGNITION; Data stream clustering; CURE data clustering algorithm; Canopy clustering algorithm; FLAME clustering; Affinity propagation; Data mining; Cluster analysis; computer; k-medians clustering; Clustering coefficient
Proceedings of the 29th Annual ACM Symposium on Applied Computing

A structural cluster kernel for learning on graphs

2012

In recent years, graph kernels have received considerable interest within the machine learning and data mining community. Here, we introduce a novel approach enabling kernel methods to utilize additional information hidden in the structural neighborhood of the graphs under consideration. Our novel structural cluster kernel (SCK) incorporates similarities induced by a structural clustering algorithm to improve state-of-the-art graph kernels. The approach taken is based on the idea that graph similarity can not only be described by the similarity between the graphs themselves, but also by the similarity they possess with respect to their structural neighborhood. We applied our novel kernel in…

Graph kernel; business.industry; Pattern recognition; ComputingMethodologies_PATTERNRECOGNITION; Kernel method; String kernel; Polynomial kernel; Kernel embedding of distributions; Radial basis function kernel; Artificial intelligence; Tree kernel; Cluster analysis; business; Mathematics
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Optimization of curation of the dataset with data on repeated dose toxicity

2015

Introduction: For some areas of risk assessment, the use of alternative methods is supported by current directives and guidance (e.g. REACH, Cosmetics, BPD, PPP). According to OECD principles, alternative methods need to be scientifically valid. Methods: Within a project on grouping and development of predictive models, supported by a grant from the Federal Ministry of Education and Research, we curated a dataset based on the RepDose and ELINCS databases. The final dataset consists of rat repeated dose toxicity studies for 1022 compounds representing 28 endpoints as organ-effect combinations. Toxicological and modelling experts jointly carried out the curation and selection of endpoints as an iterative proces…

business.industry; Toxicity; Medicine; General Medicine; Toxicology; Bioinformatics; business
Toxicology Letters

Innovative Strategies to Develop Chemical Categories Using a Combination of Structural and Toxicological Properties.

2016

Interest is increasing in the development of non-animal methods for toxicological evaluations. These methods are, however, particularly challenging for complex toxicological endpoints such as repeated dose toxicity. European legislation, e.g., the European Union's Cosmetic Directive and REACH, demands the use of alternative methods. Frameworks, such as the Read-across Assessment Framework or the Adverse Outcome Pathway Knowledge Base, support the development of these methods. The aim of the project presented in this publication was to develop substance categories for a read-across with complex endpoints of toxicity based on existing databases. The basic conceptual approach was to combine str…

0301 basic medicine; Quantitative structure–activity relationship; read across; Predictive Clustering Tree (PCT) method; Computer science; 610; 010501 environmental sciences; computer.software_genre; 600 Technology, medicine, applied sciences::610 Medicine and health; 01 natural sciences; 03 medical and health sciences; Pharmacology (medical); Cluster analysis; 0105 earth and related environmental sciences; Original Research; Alternative methods; Pharmacology; toxicological and structural similarity; business.industry; QSAR; lcsh:RM1-950; Identification (information); Tree (data structure); 030104 developmental biology; Conceptual approach; lcsh:Therapeutics. Pharmacology; Knowledge base; non-animal methods; Data mining; Web service; business; computer
Frontiers in Pharmacology

Extracting information from support vector machines for pattern-based classification

2014

Statistical machine learning algorithms building on patterns found by pattern mining algorithms have to cope with large solution sets and thus the high dimensionality of the feature space. Vice versa, pattern mining algorithms are frequently applied to irrelevant instances, thus causing noise in the output. Solution sets of pattern mining algorithms also typically grow with increasing input datasets. The paper proposes an approach to overcome these limitations. The approach extracts information from trained support vector machines, in particular their support vectors and their relevance according to their coefficients. It uses the support vectors along with their coefficients as input to pa…

business.industry; Computer science; Feature vector; Solution set; Pattern recognition; computer.software_genre; Graph; Domain (software engineering); Support vector machine; Relevance (information retrieval); Fraction (mathematics); Noise (video); Artificial intelligence; Data mining; business; computer
Proceedings of the 29th Annual ACM Symposium on Applied Computing

Model selection based product kernel learning for regression on graphs

2013

The choice of a suitable graph kernel is intrinsically hard and often cannot be made in an informed manner for a given dataset. Methods for multiple kernel learning offer a possible remedy, as they combine and weight kernels on the basis of a labeled training set of molecules to define a new kernel. Whereas most methods for multiple kernel learning focus on learning convex linear combinations of kernels, we propose to combine kernels in products, which theoretically enables higher expressiveness. In experiments on ten publicly available chemical QSAR datasets we show that product kernel learning is on no dataset significantly worse than any of the competing kernel methods and on average the…

Graph kernel; Training set; Multiple kernel learning; Computer science; business.industry; Pattern recognition; Semi-supervised learning; Machine learning; computer.software_genre; Kernel (linear algebra); Kernel method; Kernel embedding of distributions; Polynomial kernel; Kernel (statistics); Radial basis function kernel; Artificial intelligence; Tree kernel; business; computer
Proceedings of the 28th Annual ACM Symposium on Applied Computing

Maximum Common Subgraph based locally weighted regression

2012

This paper investigates a simple, yet effective method for regression on graphs, in particular for applications in chem-informatics and for quantitative structure-activity relationships (QSARs). The method combines Locally Weighted Learning (LWL) with Maximum Common Subgraph (MCS) based graph distances. More specifically, we investigate a variant of locally weighted regression on graphs (structures) that uses the maximum common subgraph for determining and weighting the neighborhood of a graph and feature vectors for the actual regression model. We show that this combination, LWL-MCS, outperforms other methods that use the local neighborhood of graphs for regression. The performance of this…

Computer science; business.industry; Feature vector; Local regression; Pattern recognition; Regression analysis; Graph; Weighting; Combinatorics; Lazy learning; Simple (abstract algebra); Artificial intelligence; Cluster analysis; business; MathematicsofComputing_DISCRETEMATHEMATICS
Proceedings of the 27th Annual ACM Symposium on Applied Computing