Search results for "machine"

showing 10 items of 2592 documents

Learning Molecular Classes from Small Numbers of Positive Examples Using Graph Grammars

2021

We consider the following problem: A researcher identified a small number of molecules with a certain property of interest and now wants to find further molecules sharing this property in a database. This can be described as learning molecular classes from small numbers of positive examples. In this work, we propose a method that is based on learning a graph grammar for the molecular class. We consider the type of graph grammars proposed by Althaus et al. [2], as it can be easily interpreted and allows relatively efficient queries. We identify rules that are frequently encountered in the positive examples and use these to construct a graph grammar. We then classify a molecule as being conta…

Class (set theory)Property (philosophy)Theoretical computer scienceGrammarRule-based machine translationComputer scienceSmall numbermedia_common.quotation_subjectGraph (abstract data type)Construct (python library)Type (model theory)media_common
researchProduct

A new paradigm for pattern classification: Nearest Border Techniques

2013

Published version of a chapter in the book: AI 2013: Advances in Artificial Intelligence. Also available from the publisher at: http://dx.doi.org/10.1007/978-3-319-03680-9_44 There are many paradigms for pattern classification. As opposed to these, this paper introduces a paradigm that has not been reported in the literature earlier, which we shall refer to as the Nearest Border (NB) paradigm. The philosophy for developing such a NB strategy is as follows: Given the training data set for each class, we shall first attempt to create borders for each individual class. After that, we advocate that testing is accomplished by assigning the test sample to the class whose border it lies closest to…

Class (set theory)Training setPattern ClassificationComputer sciencebusiness.industrySVMVDP::Mathematics and natural science: 400::Information and communication science: 420::Algorithms and computability theory: 422Centroid02 engineering and technology01 natural sciencesVDP::Mathematics and natural science: 400::Mathematics: 410::Analysis: 411Support vector machine010104 statistics & probabilityExperimental testingOutlier0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processingArtificial intelligence0101 mathematics10. No inequalitySet (psychology)businessTest sampleBorder Identification
researchProduct

Models of Computation, Riemann Hypothesis, and Classical Mathematics

1998

Classical mathematics is a source of ideas used by Computer Science since the very first days. Surprisingly, there is still much to be found. Computer scientists, especially, those in Theoretical Computer Science find inspiring ideas both in old notions and results, and in the 20th century mathematics. The latest decades have brought us evidence that computer people will soon study quantum physics and modern biology just to understand what computers are doing.

Classical mathematicsFinite-state machineComputer sciencebusiness.industryModel of computationEpistemologyPhilosophy of computer sciencePhilosophy of languageTuring machinesymbols.namesakeRiemann hypothesisFormal languagesymbolsArtificial intelligencebusiness
researchProduct

Variability of Classification Results in Data with High Dimensionality and Small Sample Size

2021

The study focuses on the analysis of biological data containing information on the number of genome sequences of intestinal microbiome bacteria before and after antibiotic use. The data have high dimensionality (bacterial taxa) and a small number of records, which is typical of bioinformatics data. Classification models induced on data sets like this usually are not stable and the accuracy metrics have high variance. The aim of the study is to create a preprocessing workflow and a classification model that can perform the most accurate classification of the microbiome into groups before and after the use of antibiotics and lessen the variability of accuracy measures of the classifier. To ev…

Classification algorithms; feature selection; high dimensionality; machine learningInformation Technology and Management Science
researchProduct

Comparative analysis of architectures for monitoring cloud computing infrastructures

2015

The lack of control over the cloud resources is one of the main disadvantages associated to cloud computing. The design of efficient architectures for monitoring such resources can help to overcome this problem. This contribution describes a complete set of architectures for monitoring cloud computing infrastructures, and provides a taxonomy of them. The architectures are described in detail, compared among them, and analysed in terms of performance, scalability, usage of resources, and security capabilities. The architectures have been implemented in real world settings and empirically validated against a real cloud computing infrastructure based on OpenStack. More than 1000 virtual machin…

Cloud computing securityComputer Networks and Communicationsbusiness.industryComputer scienceDistributed computingCloud computingcomputer.software_genreSet (abstract data type)Utility computingHardware and ArchitectureVirtual machineScalabilitybusinesscomputerSoftwareFuture Generation Computer Systems
researchProduct

A local complexity based combination method for decision forests trained with high-dimensional data

2012

Accurate machine learning with high-dimensional data is affected by phenomena known as the “curse” of dimensionality. One of the main strategies explored in the last decade to deal with this problem is the use of multi-classifier systems. Several of such approaches are inspired by the Random Subspace Method for the construction of decision forests. Furthermore, other studies rely on estimations of the individual classifiers' competence, to enhance the combination in the multi-classifier and improve the accuracy. We propose a competence estimate which is based on local complexity measurements, to perform a weighted average combination of the decision forest. Experimental results show how thi…

Clustering high-dimensional dataComputational complexity theorybusiness.industryComputer scienceDecision treeMachine learningcomputer.software_genreRandom forestRandom subspace methodArtificial intelligenceData miningbusinessCompetence (human resources)computerClassifier (UML)Curse of dimensionality2012 12th International Conference on Intelligent Systems Design and Applications (ISDA)
researchProduct

Data Analysis and Bioinformatics

2007

Data analysis methods and techniques are revisited in the case of biological data sets. Particular emphasis is given to clustering and mining issues. Clustering is still a subject of active research in several fields such as statistics, pattern recognition, and machine learning. Data mining adds to clustering the complications of very large data-sets with many attributes of different types. And this is a typical situation in biology. Some cases studies are also described.

Clustering high-dimensional dataFuzzy clusteringComputer sciencebusiness.industryCorrelation clusteringConceptual clusteringMachine learningcomputer.software_genreComputingMethodologies_PATTERNRECOGNITIONCURE data clustering algorithmConsensus clusteringCanopy clustering algorithmData miningArtificial intelligenceCluster analysisbusinesscomputer
researchProduct

Distance Functions, Clustering Algorithms and Microarray Data Analysis

2010

Distance functions are a fundamental ingredient of classification and clustering procedures, and this holds true also in the particular case of microarray data. In the general data mining and classification literature, functions such as Euclidean distance or Pearson correlation have gained their status of de facto standards thanks to a considerable amount of experimental validation. For microarray data, the issue of which distance function works best has been investigated, but no final conclusion has been reached. The aim of this extended abstract is to shed further light on that issue. Indeed, we present an experimental study, involving several distances, assessing (a) their intrinsic sepa…

Clustering high-dimensional dataFuzzy clusteringSettore INF/01 - Informaticabusiness.industryCorrelation clusteringMachine learningcomputer.software_genrePearson product-moment correlation coefficientRanking (information retrieval)Euclidean distancesymbols.namesakeClustering distance measuressymbolsArtificial intelligenceData miningbusinessCluster analysiscomputerMathematicsDe facto standard
researchProduct

Regularized Regression Incorporating Network Information: Simultaneous Estimation of Covariate Coefficients and Connection Signs

2014

We develop an algorithm that incorporates network information into regression settings. It simultaneously estimates the covariate coefficients and the signs of the network connections (i.e. whether the connections are of an activating or of a repressing type). For the coefficient estimation steps an additional penalty is set on top of the lasso penalty, similarly to Li and Li (2008). We develop a fast implementation for the new method based on coordinate descent. Furthermore, we show how the new methods can be applied to time-to-event data. The new method yields good results in simulation studies concerning sensitivity and specificity of non-zero covariate coefficients, estimation of networ…

Clustering high-dimensional databusiness.industryjel:C41jel:C13Machine learningcomputer.software_genreRegressionhigh-dimensional data gene expression data pathway information penalized regressionConnection (mathematics)Set (abstract data type)Lasso (statistics)CovariateArtificial intelligenceSensitivity (control systems)businessCoordinate descentAlgorithmcomputerMathematics
researchProduct

Bayesian versus data driven model selection for microarray data

2014

Clustering is one of the most well known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. In this beautiful area, one of the most difficult challenges is a particular instance of the model selection problem, i.e., the identification of the correct number of clusters in a dataset. In what follows, for ease of reference, we refer to that instance still as model selection. It is an important part of any statistical analysis. The techniques used for solving it are mainly either Bayesian or data-driven, and are both based on internal knowledge. That is, they use information obtained by processing the input data. A…

Clustering Model selection Bayesian information criterion Akaike information criterion Minimum message length BioinformaticsSettore INF/01 - InformaticaComputer sciencebusiness.industryModel selectionBayesian probabilitycomputer.software_genreMachine learningComputer Science ApplicationsData-drivenDetermining the number of clusters in a data setIdentification (information)Bayesian information criterionData miningArtificial intelligenceAkaike information criterionCluster analysisbusinesscomputer
researchProduct