A local complexity based combination method for decision forests trained with high-dimensional data
Accurate machine learning with high-dimensional data is affected by phenomena known as the “curse” of dimensionality. One of the main strategies explored in the last decade to deal with this problem is the use of multi-classifier systems. Several such approaches are inspired by the Random Subspace Method for the construction of decision forests. Other studies rely on estimates of the individual classifiers' competence to enhance the combination in the multi-classifier and improve its accuracy. We propose a competence estimate based on local complexity measurements, which is used to perform a weighted average combination of the decision forest. Experimental results show how thi…
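As a rough illustration of the weighted average combination mentioned above, the following sketch combines per-tree class-probability vectors using one competence weight per tree. The function names and the example weights are hypothetical; the local-complexity competence estimate itself is not reproduced here, so the weights are simply assumed to be given.

```python
def weighted_average_combination(probabilities, competences):
    """Combine per-tree class-probability vectors with competence weights.

    probabilities: one list of class probabilities per tree.
    competences: one non-negative weight per tree (assumed to be given;
    how they are estimated is outside this sketch).
    """
    total = sum(competences)
    if total <= 0:
        raise ValueError("competences must sum to a positive value")
    n_classes = len(probabilities[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(probabilities, competences):
        for c in range(n_classes):
            combined[c] += weight * probs[c] / total
    return combined


def predict(probabilities, competences):
    """Return the index of the class with the largest combined score."""
    combined = weighted_average_combination(probabilities, competences)
    return max(range(len(combined)), key=combined.__getitem__)


# Three trees, two classes: the first tree disagrees with the other two.
tree_outputs = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]
uniform_vote = predict(tree_outputs, [1.0, 1.0, 1.0])
weighted_vote = predict(tree_outputs, [0.2, 1.0, 1.0])
```

With uniform weights the confident first tree dominates the average; down-weighting it (a hypothetical low competence for this query) flips the decision to the class favoured by the other two trees.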
Accelerated Proximal Gradient Descent in Metric Learning for Kernel Regression
The purpose of this paper is to learn a specific distance function for the Nadaraya-Watson estimator to be applied as a non-linear classifier. The approach is guided by the idea of transforming the predictor variables and learning a kernel function based on a Mahalanobis pseudo-distance, using a low-rank structure in the distance function. In the context of metric learning for kernel regression, we introduce an Accelerated Proximal Gradient method to solve the non-convex optimization problem with a better convergence rate than gradient descent. Extensive experiments and the corresponding discussion try to show that our strategy is a competitive solution in relation to p…
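To make the estimator concrete, here is a minimal sketch of a Nadaraya-Watson prediction with a Gaussian kernel over a Mahalanobis pseudo-distance of the form ||A(x - x')||^2, where A is a low-rank linear map. The matrix A is fixed by hand here for illustration; learning it (the subject of the paper) is not shown, and the function names and the 0.5 threshold are assumptions.

```python
import math


def project(A, x):
    """Apply the linear map A (a list of rows) to the vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def pseudo_distance_sq(A, x, y):
    """Squared Mahalanobis pseudo-distance ||A(x - y)||^2."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return sum(p * p for p in project(A, diff))


def nadaraya_watson(A, X_train, y_train, x_query):
    """Kernel-weighted average of the training targets (Gaussian kernel)."""
    weights = [math.exp(-pseudo_distance_sq(A, x, x_query)) for x in X_train]
    total = sum(weights)
    if total == 0.0:
        # All kernel weights underflowed: fall back to the plain mean.
        return sum(y_train) / len(y_train)
    return sum(w * y for w, y in zip(weights, y_train)) / total


# Rank-1 map that keeps the first feature and ignores the second.
A = [[1.0, 0.0]]
X_train = [[0.0, 5.0], [0.1, -5.0], [3.0, 5.0], [3.1, -5.0]]
y_train = [0, 0, 1, 1]

# Used as a classifier by thresholding the estimate at 0.5 (an assumption).
label = int(nadaraya_watson(A, X_train, y_train, [0.05, 0.0]) > 0.5)
```

Because A has rank 1, the noisy second feature does not affect the kernel weights, which is the kind of effect a learned low-rank transformation is meant to achieve.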
Improving Nearest Neighbor Based Multi-target Prediction Through Metric Learning
The purpose of this work is to learn specific distance functions to be applied to multi-target regression problems using nearest neighbors. The idea of preserving the order relation between input and output vectors, considering their corresponding distances, is used along with a maximal margin criterion to formulate a specific metric learning problem. Extensive experiments and the corresponding discussion try to put forward the advantages of the proposed algorithm, which can be considered a generalization of previously proposed approaches. Preliminary results suggest that this line of work can lead to very competitive algorithms with convenient properties.
An efficient method for clustered multi-metric learning
Distance metric learning, which aims at finding a distance metric that separates examples of one class from examples of the other classes, is the key to the success of many machine learning tasks. Although there has been an increasing interest in this field, learning a global distance metric is insufficient to obtain satisfactory results when dealing with heterogeneously distributed data. A simple solution to tackle this kind of data is based on kernel embedding methods. However, it quickly becomes computationally intractable as the number of examples increases. In this paper, we propose an efficient method that learns multiple local distance metrics instead of a single global one.…
QuBiLS-MAS, open source multi-platform software for atom- and bond-based topological (2D) and chiral (2.5D) algebraic molecular descriptors computations.
Background: In previous reports, Marrero-Ponce et al. proposed algebraic formalisms for characterizing topological (2D) and chiral (2.5D) molecular features through atom- and bond-based ToMoCoMD-CARDD (acronym for Topological Molecular Computational Design-Computer Aided Rational Drug Design) molecular descriptors. These MDs codify molecular information based on the bilinear, quadratic and linear algebraic forms and the graph-theoretical electronic-density and edge-adjacency matrices in order to consider atom- and bond-based relations, respectively. These MDs have been successfully applied in the screening of chemical compounds of different therapeutic applications ranging from antimalarials…
MOESM1 of QuBiLS-MAS, open source multi-platform software for atom- and bond-based topological (2D) and chiral (2.5D) algebraic molecular descriptors computations
Additional file 1. The mathematical definitions of the norms, means and statistical invariants as generalizations of the linear combination of LOVIs as global (and/or local) MDs aggregation operator, as well as classical algorithms which generalize the first three groups, are presented as Figure SI1-Table SI2. The UML diagram (Figure SI3), a debug report file content (Figure SI4), and a batch process manager dialog window (Figure SI5) are also listed. Some results of the factor analysis by the principal component method are shown as Table SI6-Table SI8, and finally, the names of structures for Cramer’s steroid database and their corresponding values for the binding affinity to the corticosteroid…
Drug Activity Characterization Using One-Class Support Vector Machines with Counterexamples
The problem of detecting chemical activity in drugs from their molecular descriptions constitutes a challenging learning task. The corresponding prediction problem can be tackled either as a binary classification problem (active versus inactive compounds) or as a one-class problem. The first option usually leads to better prediction results when measured over small, fixed databases, while the second could potentially lead to a much better characterization of the active class, which could be more important in more realistic settings. In this paper, a comparison of these two options is presented when support vector models are used as predictors.
Generalized Multitarget Linear Regression with Output Dependence Estimation
Multitarget regression has recently received attention in the context of modern, large-scale problems in which finding good enough solutions in a timely manner is crucial. The alternatives proposed so far use different combinations of regularizers, which lead to different ways of solving the problem. In this work, we introduce a general formulation with several regularizers. This leads to a biconvex minimization problem, and we use an alternating procedure with accelerated proximal gradient steps to solve it. We show that our formulation is equivalent to, but more efficient than, some previously proposed approaches. Moreover, we introduce two new variants. The experimental validation carried out suggests th…
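For readers unfamiliar with accelerated proximal gradient steps, the sketch below applies the generic scheme (FISTA-style momentum plus a proximal step) to a small single-target lasso problem, 0.5 * ||Xw - y||^2 + lam * ||w||_1. This is a swapped-in textbook problem chosen for brevity, not the biconvex multitarget formulation of the paper; the function names, data and step size are assumptions.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |v| (shrinks v toward zero by t)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def gradient(X, y, w):
    """Gradient of the smooth part 0.5 * ||Xw - y||^2, i.e. X^T (Xw - y)."""
    residual = [sum(xi * wi for xi, wi in zip(row, w)) - yi
                for row, yi in zip(X, y)]
    return [sum(X[i][j] * residual[i] for i in range(len(X)))
            for j in range(len(w))]


def accelerated_proximal_gradient(X, y, lam, step, n_iter=500):
    """Minimize 0.5 * ||Xw - y||^2 + lam * ||w||_1 with momentum.

    step should be at most 1 / L, where L is the largest eigenvalue
    of X^T X (supplied by the caller in this sketch).
    """
    w = [0.0] * len(X[0])
    z = list(w)  # extrapolated point where the gradient is evaluated
    t = 1.0
    for _ in range(n_iter):
        g = gradient(X, y, z)
        w_next = [soft_threshold(zj - step * gj, step * lam)
                  for zj, gj in zip(z, g)]
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        momentum = (t - 1.0) / t_next
        z = [wn + momentum * (wn - wo) for wn, wo in zip(w_next, w)]
        w, t = w_next, t_next
    return w


# X^T X = [[2, 1], [1, 2]] has largest eigenvalue 3, so step = 1/3.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 0.0, 1.0]
w_hat = accelerated_proximal_gradient(X, y, lam=0.5, step=1.0 / 3.0)
```

For this toy problem the optimality conditions give the minimizer w = (0.75, 0), so the returned vector should be close to that point; in an alternating procedure, steps of this kind would be applied to one block of variables while the other is held fixed.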
A Comparative Study of Nonlinear Machine Learning for the "In Silico" Depiction of Tyrosinase Inhibitory Activity from Molecular Structure.
In the present report, for the first time, support vector machine (SVM), artificial neural network (ANN), Bayesian network (BN) and k-nearest neighbor (k-NN) methods are applied and compared on two "in-house" datasets to describe the tyrosinase inhibitory activity from the molecular structure. The data set Data I is used for the identification of tyrosinase inhibitors (TIs) and includes 701 active and 728 inactive compounds. Data II consists of active chemicals for potency estimation of TIs. The 2D TOMOCOMD-CARDD atom-based quadratic indices are used as molecular descriptors. The derived models show rather encouraging results with the areas under the Receiver Operating Characteristic (AURC) curve …
A Feature Set Decomposition Method for the Construction of Multi-classifier Systems Trained with High-Dimensional Data
Data mining for the discovery of novel, useful patterns encounters obstacles when dealing with high-dimensional datasets; these obstacles have been documented as the "curse" of dimensionality. One strategy to deal with this issue is to decompose the input feature set in order to build a multi-classifier system. Standalone decomposition methods are rare and generally based on random selection. We propose a decomposition method which uses information theory tools to arrange input features into uncorrelated and relevant subsets. Experimental results show that this approach significantly outperforms three baseline decomposition methods in terms of classification accuracy.