0000000000666354
AUTHOR
Yovani Marrero-Ponce
Applications of Bond-Based 3D-Chiral Quadratic Indices in QSAR Studies Related to Central Chirality Codification
The concept of bond-based quadratic indices is generalized to codify chemical structure information for chiral drugs, making use of a trigonometric 3D-chirality correction factor. To evaluate the effectiveness of this novel approach in drug design, we have modeled several well-known data sets. In particular, Cramer's steroid data set has become a benchmark for the assessment of novel QSAR methods and has been used by several researchers applying 3D-QSAR approaches; we therefore selected it for the sake of comparability. In addition, we model the angiotensin-converting enzyme inhibitory activity o…
TOMOCOMD-CARDD descriptors-based virtual screening of tyrosinase inhibitors: evaluation of different classification model combinations using bond-based linear indices.
A new set of bond-level molecular descriptors (bond-based linear indices) is used here in QSAR (quantitative structure–activity relationship) studies of tyrosinase inhibitors to find functions that discriminate between tyrosinase inhibitors and inactive compounds. A database of 246 structurally diverse organic compounds, all reported as tyrosinase inhibitors, was collected for this study. This dataset can be considered a helpful tool, not only for theoretical chemists but also for other researchers in this area. The inactive set comprises 412 drugs with other clinical uses. Twelve LDA-based QSAR models were obtained, the first six us…
Discrete Derivatives for Atom-Pairs as a Novel Graph-Theoretical Invariant for Generating New Molecular Descriptors: Orthogonality, Interpretation and QSARs/QSPRs on Benchmark Databases.
This report presents a new mathematical method based on the concept of the derivative of a molecular graph (G) with respect to a given event (S) to codify chemical structure information. The derivative over each pair of atoms in the molecule is defined as ∂G/∂S(v_i, v_j) = (f_i - 2f_ij + f_j)/f_ij, where f_i (or f_j) and f_ij are the individual frequency of atom i (or j) and the reciprocal frequency of atoms i and j, respectively. These frequencies characterize the participation intensity of atom pairs in S. Here, the event space is composed of molecular sub-graphs which participate in the formation of the G skeleton that could be complete (representing all possible connected sub-graphs) or comp…
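As an illustration of the pair derivative defined above, the following minimal Python sketch evaluates (f_i - 2f_ij + f_j)/f_ij for a toy event space. It assumes, as an interpretation not stated in this abstract, that f_i counts the sub-graphs containing atom i and f_ij counts the sub-graphs containing both i and j. The function name `pair_derivatives` and the three-atom example are hypothetical.

```python
from itertools import combinations

def pair_derivatives(subgraphs):
    """Return {(i, j): (f_i - 2*f_ij + f_j) / f_ij} for atom pairs that co-occur.

    Assumption: f_i is the number of sub-graphs containing atom i and
    f_ij the number containing both i and j. Pairs that never co-occur
    (f_ij = 0) are skipped to avoid division by zero.
    """
    f_single = {}
    f_pair = {}
    for sg in subgraphs:
        atoms = sorted(set(sg))
        for a in atoms:
            f_single[a] = f_single.get(a, 0) + 1
        for i, j in combinations(atoms, 2):
            f_pair[(i, j)] = f_pair.get((i, j), 0) + 1
    return {
        (i, j): (f_single[i] - 2 * f_ij + f_single[j]) / f_ij
        for (i, j), f_ij in f_pair.items()
    }

# Hypothetical event space: all connected sub-graphs of the path 0-1-2.
subgraphs = [{0}, {1}, {2}, {0, 1}, {1, 2}, {0, 1, 2}]
derivs = pair_derivatives(subgraphs)
for pair, value in sorted(derivs.items()):
    print(pair, value)
```

In this toy case the terminal pair (0, 2) co-occurs in only one sub-graph, so its derivative is larger than that of the bonded pairs.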
Quantitative Structure–Activity Relationship of the 4,5α-Dihydrotestosterone Steroid Family
Predictive Quantitative Structure-Activity Relationship (QSAR) models of Anabolic/Androgenic (A/A) activities for the 4,5α-dihydrotestosterone steroid family were obtained by means of multilinear regression using quantum and physicochemical Molecular Descriptors (MDs) as well as a genetic algorithm for the selection of the best subset of MDs. The MDs included in our QSAR models allow the structural interpretation of the biological process, evidencing the main role of the shape of molecules, hydrophobicity, and electronic properties. Attempts were made to include lipophilicity (octanol-water partition coefficient) as well as electronic (lowest unoccupied molecular orbital properties and dipol…
Synthesis, biological evaluation and chemometric analysis of indazole derivatives. 1,2-Disubstituted 5-nitroindazolinones, new prototypes of antichagasic drug
Chagas disease chemotherapy, currently based on only two drugs, nifurtimox and benznidazole, is far from satisfactory, and therefore the development of new antichagasic compounds remains an important goal. On the basis of antichagasic properties previously described for some 1,2-disubstituted 5-nitroindazolin-3-ones (21, 33), and in order to begin optimizing the activity of this kind of compound, we have prepared a series of related analogs (22-32, 34-38, 58 and 59) and tested these products in vitro against epimastigote forms of Trypanosoma cruzi. 2-Benzyl-1-propyl (22), 2-benzyl-1-isopropyl (23) and 2-benzyl-1-butyl (24) derivatives have shown high trypanocidal activity and low un…
Predicting antitrichomonal activity: A computational screening using atom-based bilinear indices and experimental proofs
Existing Trichomonas vaginalis therapies are out of reach for most people with trichomoniasis in developing countries and, where available, they are limited by their toxicity (mainly in pregnant women) and their cost. New antitrichomonal agents are needed to combat emerging metronidazole-resistant trichomoniasis and reduce the side effects associated with currently available drugs. Toward this end, atom-based bilinear indices, a new TOMOCOMD-CARDD molecular descriptor, and linear discriminant analysis (LDA) were used to discover novel, potent, and non-toxic lead trichomonacidal chemicals. Two discriminant functions were obtained with the use of non-stochastic and stochastic atom-type bilinear in…
Event-based criteria in GT-STAF information indices: theory, exploratory diversity analysis and QSPR applications
Versatile event-based approaches for the definition of novel information theory-based indices (IFIs) are presented. An event in this context is the criterion followed in the "discovery" of molecular substructures, which in turn serve as basis for the construction of the generalized incidence and relations frequency matrices, Q and F, respectively. From the resultant F, Shannon's, mutual, conditional and joint entropy-based IFIs are computed. In previous reports, an event named connected subgraphs was presented. The present study is an extension of this notion, in which we introduce other events, namely: terminal paths, vertex path incidence, quantum subgraphs, walks of length k, Sach's subg…
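To make the entropy measures named above concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a relations frequency matrix F can be normalized into a joint probability distribution; the abstract does not spell out how the indices are actually derived from F. The function names and the 2x2 matrix are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_indices(freq):
    """Treat a frequency matrix as a joint distribution (an assumption)
    and derive Shannon, joint, conditional and mutual entropy measures."""
    total = sum(sum(row) for row in freq)
    joint = [[v / total for v in row] for row in freq]
    p_rows = [sum(row) for row in joint]          # marginal over rows
    p_cols = [sum(col) for col in zip(*joint)]    # marginal over columns
    h_x = entropy(p_rows)
    h_y = entropy(p_cols)
    h_xy = entropy(v for row in joint for v in row)
    return {
        "shannon_x": h_x,
        "shannon_y": h_y,
        "joint": h_xy,
        "conditional_x_given_y": h_xy - h_y,   # H(X|Y) = H(X,Y) - H(Y)
        "mutual": h_x + h_y - h_xy,            # I(X;Y)
    }

# Hypothetical 2x2 frequency matrix with perfectly coupled rows and columns.
F = [[2, 0], [0, 2]]
indices = information_indices(F)
print(indices)
```

With a uniform matrix such as `[[1, 1], [1, 1]]` the mutual information drops to zero, since rows and columns carry no information about each other.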
New ligand-based approach for the discovery of antitrypanosomal compounds.
The antitrypanosomal activity of 10 already synthesized compounds was predicted in silico and explored in vitro and in vivo against Trypanosoma cruzi. For the computational study, an approach based on non-stochastic linear fingerprints for the identification of potential antichagasic compounds is introduced. Molecular structures of 66 organic compounds, 28 with antitrypanosomal activity and 38 having other clinical uses, were parameterized by means of the TOMOCOMD-CARDD software. A linear classification function was derived allowing the discrimination between active and inactive compounds with a confidence of 95%. As predicted, seven compounds showed antitrypanosomal activity (%AE > 7…
Atom- and Bond-Based 2D TOMOCOMD-CARDD Approach and Ligand-Based Virtual Screening for the Drug Discovery of New Tyrosinase Inhibitors
Two-dimensional atom- and bond-based TOMOCOMD-CARDD descriptors and linear discriminant analysis (LDA) are used in this report to perform a quantitative structure-activity relationship (QSAR) study of tyrosinase-inhibitory activity. A database of inhibitors of the enzyme is collected for this study, including 246 highly dissimilar molecules with antityrosinase activity. In total, 7 discriminant functions are obtained by using the whole set of atom- and bond-based 2D indices. All the LDA-based QSAR models show accuracies above 90% in the training set and values of the Matthews correlation coefficient (C) varying from 0.85 to 0.90. The external validation set shows globally good classifica…
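Accuracy and the Matthews correlation coefficient (C), the two statistics quoted for these classification models, can both be computed from a 2x2 confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not taken from any of the listed studies.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of compounds classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal is empty."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Hypothetical counts: 50 actives and 50 inactives, 5 errors in each class.
tp, tn, fp, fn = 45, 45, 5, 5
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy = {acc:.2%}, C = {mcc:.2f}")
```

Unlike accuracy, C stays informative when the active and inactive sets differ greatly in size, which is the usual situation in virtual screening.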
Nucleotide's bilinear indices: Novel bio-macromolecular descriptors for bioinformatics studies of nucleic acids. I. Prediction of paromomycin's affinity constant with HIV-1 Ψ-RNA packaging region
A new set of nucleotide-based bio-macromolecular descriptors is presented. This novel approach to bio-macromolecular design from a linear algebra point of view is relevant to nucleic acid quantitative structure-activity relationship (QSAR) studies. These bio-macromolecular indices are based on the calculation of bilinear maps on ℝⁿ [b_mk(x_m, y_m): ℝⁿ × ℝⁿ → ℝ] in canonical basis. Nucleic acid bilinear indices are calculated from the kth power of the non-stochastic and stochastic nucleotide graph-theoretic electronic-contact matrices, M_m^k and ˢM_m^k, respectively. That is to say, the kth non-stochastic and stochastic nucleic acid bilinear indices are calculated using M_m^k…
Tyrosinase Enzyme: 1. An Overview on a Pharmacological Target
The tyrosinase enzyme (EC 1.14.18.1) is an oxidoreductase in the general enzyme classification and is involved in oxidation and reduction processes in the epidermis. The chemical reactions that the enzyme catalyzes are of principal importance in melanogenesis. This process leads to the formation of melanin, a heteropolymer of indolic nature that provides the different skin tonalities and helps protect against ultraviolet radiation. However, pigment overproduction brought about by the action of tyrosinase can cause different skin disorders related to hyperpigmentation. Several studies mainly focused on the characteristics …
Comparative study to predict toxic modes of action of phenols from molecular structures.
Quantitative structure-activity relationship models for the prediction of mode of toxic action (MOA) of 221 phenols to the ciliated protozoan Tetrahymena pyriformis using atom-based quadratic indices are reported. The phenols represent a variety of MOAs including polar narcotics, weak acid respiratory uncouplers, pro-electrophiles and soft electrophiles. Linear discriminant analysis (LDA), and four machine learning techniques (ML), namely k-nearest neighbours (k-NN), support vector machine (SVM), classification trees (CTs) and artificial neural networks (ANNs), have been used to develop several models with higher accuracies and predictive capabilities for distinguishing between four MOAs. M…
MuLiMs-MCoMPAs: A Novel Multiplatform Framework to Compute Tensor Algebra-Based Three-Dimensional Protein Descriptors
This report introduces the MuLiMs-MCoMPAs software (acronym for Multi-Linear Maps based on N-Metric and Contact Matrices of 3D Protein and Amino-acid weightings), designed to compute tensor-based 3D protein structural descriptors by applying two- and three-linear algebraic forms. Moreover, these descriptors contemplate generalizing components such as novel 3D protein structural representations, (dis)similarity metrics, and multimetrics to extract geometrical related information between two and three amino acids, weighting schemes based on amino acid properties, matrix normalization procedures that consider simple-stochastic and mutual probability transformations, topological and geometrical…
Retrained Classification of Tyrosinase Inhibitors and “In Silico” Potency Estimation by Using Atom-Type Linear Indices
In this paper, the authors present an effort to increase the applicability domain (AD) by retraining models using a database of 701 highly dissimilar molecules with anti-tyrosinase activity and 728 drugs with other uses. Atom-based linear indices and best-subset linear discriminant analysis (LDA) were used to develop individual classification models. Eighteen individual classification-based QSAR models for the tyrosinase inhibitory activity were obtained with global accuracy varying from 88.15-91.60% in the training set and values of Matthews correlation coefficients (C) varying from 0.76-0.82. The external validation set shows globally classifications above 85.99% and 0.72 fo…
Ligand-based discovery of novel trypanosomicidal drug-like compounds: In silico identification and experimental support
Two-dimensional bond-based linear indices and linear discriminant analysis are used in this report to perform a quantitative structure–activity relationship study to identify new trypanosomicidal compounds. A database of 143 anti-trypanosomal compounds and 297 compounds having other clinical uses is utilized to develop the theoretical models. The best discriminant models computed using bond-based linear indices provide accuracies greater than 90% for both training and test sets. Our models identify as anti-trypanosomals five out of nine compounds of a set of already-synthesized substances. The in vitro anti-trypanosomal activity of this set against epimastigote forms of Trypanosoma cruzi…
Non-stochastic quadratic fingerprints and LDA-based QSAR models in hit and lead generation through virtual screening: theoretical and experimental assessment of a promising method for the discovery of new antimalarial compounds
In order to explore the ability of non-stochastic quadratic indices to encode chemical information in antimalarials, four quantitative models for the discrimination of compounds having this property were generated and statistically compared. Accuracies of 90.2% and 83.3% for the training and test sets, respectively, were observed for the best of all the models, which included non-stochastic quadratic fingerprints weighted with Pauling electronegativities. With a comparative purpose and as a second validation experiment, an exercise of virtual screening of 65 already-reported antimalarials was carried out. Finally, 17 new compounds were classified as either active/inactive ones and experimen…
Bond-extended stochastic and nonstochastic bilinear indices. I. QSPR/QSAR applications to the description of properties/activities of small-medium size organic compounds
Bond-extended stochastic and nonstochastic bilinear indices are introduced in this article as novel bond-level molecular descriptors (MDs). These novel total (whole-molecule) MDs are based on bilinear maps (forms) similar to those defined in linear algebra. The proposed nonstochastic indices try to match molecular structure provided by the molecular topology by using the kth Edge(Bond)-Adjacency Matrix (Ek, designated here as a nonstochastic E matrix). The stochastic parameters are computed by using the kth stochastic edge-adjacency matrix, ESk, as matrix operators of bilinear transformations. This new edge (bond)-adjacency relationship can be obtained directly from Ek and can be considered li…
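The matrices Ek and ESk named above can be illustrated with a small Python sketch. It assumes that two bonds are adjacent when they share an atom and that the stochastic matrix is obtained by dividing each row of Ek by its row sum; the abstract only states that ESk is obtained directly from Ek, so the row scaling is an assumption. The helper names and the butane-like skeleton are hypothetical.

```python
def edge_adjacency(bonds):
    """E[i][j] = 1 when distinct bonds i and j share an atom, else 0."""
    n = len(bonds)
    return [
        [1 if (i != j and set(bonds[i]) & set(bonds[j])) else 0 for j in range(n)]
        for i in range(n)
    ]

def matmul(a, b):
    """Plain-Python matrix product."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def matpow(m, k):
    """kth power of a square matrix (k = 0 gives the identity)."""
    n = len(m)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, m)
    return result

def row_stochastic(m):
    """Scale each row to sum to 1 (assumed scheme); all-zero rows stay zero."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

# Hypothetical example: carbon skeleton of n-butane, bonds given as atom pairs.
bonds = [(0, 1), (1, 2), (2, 3)]
E = edge_adjacency(bonds)
E2 = matpow(E, 2)
ES2 = row_stochastic(E2)
print(E, E2, ES2, sep="\n")
```

Each entry of E2 counts the walks of length 2 between two bonds, and the row scaling turns those counts into fractions of the walks leaving each bond.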
Antiprotozoan lead discovery by aligning dry and wet screening: Prediction, synthesis, and biological assay of novel quinoxalinones
Protozoan parasites have been one of the most significant public health problems for centuries, and several human infections caused by them have massive global impact. Most of the current drugs used to treat these illnesses have been used for decades and have many limitations such as the emergence of drug resistance, severe side-effects, low-to-medium drug efficacy, administration routes, cost, etc. These drugs have been largely neglected as models for drug development because they are mainly used in countries with limited resources and, as a consequence, with scarce marketing possibilities. Nowadays, there is a pressing need to identify and develop new drug-based antiprotozoan therapies. In …
Protein linear indices of the ‘macromolecular pseudograph α-carbon atom adjacency matrix’ in bioinformatics. Part 1: Prediction of protein stability effects of a complete set of alanine substitutions in Arc repressor
A novel approach to bio-macromolecular design from a linear algebra point of view is introduced. A protein’s total (whole protein) and local (one or more amino acids) linear indices are a new set of bio-macromolecular descriptors of relevance to protein QSAR/QSPR studies. These amino-acid level biochemical descriptors are based on the calculation of linear maps on ℝⁿ [f_k(x_mi): ℝⁿ → ℝⁿ] in canonical basis. These bio-macromolecular indices are calculated from the kth power of the macromolecular pseudograph α-carbon atom adjacency matrix. Total linear indices are linear functionals on ℝⁿ. That is, the kth total linear indices are linear maps from ℝⁿ to the scalar ℝ [f_k …
New antitrichomonal drug-like chemicals selected by bond (edge)-based TOMOCOMD-CARDD descriptors.
Bond-based quadratic indices, new TOMOCOMD-CARDD molecular descriptors, and linear discriminant analysis (LDA) were used to discover novel lead trichomonacidals. The obtained LDA-based quantitative structure-activity relationships (QSAR) models, using nonstochastic and stochastic indices, were able to classify correctly 87.91% (87.50%) and 89.01% (84.38%) of the chemicals in training (test) sets, respectively. They showed large Matthews correlation coefficients of 0.75 (0.71) and 0.78 (0.65) for the training (test) sets, correspondingly. Later, both models were applied to the virtual screening of 21 chemicals to find new lead antitrichomonal agents. Predictions agreed with experimental resu…
Atom-based 3D-chiral quadratic indices. Part 2: prediction of the corticosteroid-binding globulin binding affinity of the 31 benchmark steroids data set.
A quantitative structure-activity relationship (QSAR) study to predict the relative affinities of the steroid 'benchmark' data set to the corticosteroid-binding globulin (CBG) is described. It is shown that the 3D-chiral quadratic indices closely correlate with the measured CBG affinity values for the 31 steroids. The calculated descriptors were correlated with biological data through multiple linear regressions. Two statistically significant models were obtained when non-stochastic (R = 0.924 and s = 0.46) as well as stochastic (R = 0.929 and s = 0.46) 3D-chiral quadratic indices were used. A leave-one-out (LOO) approach to model validation is used here; the best results obtained in the cr…
Relations frequency hypermatrices in mutual, conditional and joint entropy-based information indices.
Graph-theoretic matrix representations constitute the most popular and significant source of topological molecular descriptors (MDs). Recently, we have introduced a novel matrix representation, named the duplex relations frequency matrix, F, derived from the generalization of an incidence matrix whose row entries are connected subgraphs of a given molecular graph G. Using this matrix, a series of information indices (IFIs) were proposed. In this report, an extension of F is presented, introducing for the first time the concept of a hypermatrix in graph-theoretic chemistry. The hypermatrix representation explores the n-tuple participation frequencies of vertices in a set of connected subgrap…
Atom-Based Quadratic Indices to Predict Aquatic Toxicity of Benzene Derivatives to Tetrahymena pyriformis
The non-stochastic and stochastic atom-based quadratic indices are applied to develop quantitative structure-activity relationship (QSAR) models for the prediction of aquatic toxicity. The dataset used, consisting of 392 benzene derivatives for which toxicity data to the ciliate Tetrahymena pyriformis were available, is divided into training and test sets. The obtained multiple linear regression models are statistically significant (R² = 0.787 and s = 0.347, R² = 0.806 and s = 0.329, for non-stochastic and stochastic quadratic indices, respectively) and show rather good stability in a cross-validation experiment (q² = 0.769 and s_cv = 0.357, q² = 0.791 and s_cv = 0.337, correspondingly). In a…
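The fitted and cross-validated statistics reported above can be illustrated with a minimal Python sketch of leave-one-out (LOO) validation, using the common definition q² = 1 - PRESS/SS_tot. For brevity it fits a single-descriptor least-squares line rather than the multiple linear regression used in the study, and the descriptor and activity values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)

def r_squared(xs, ys):
    """Coefficient of determination of the model fitted on all points."""
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / total_sum_of_squares(ys)

def loo_q_squared(xs, ys):
    """q2 = 1 - PRESS / SS_tot, refitting with each point left out in turn."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1 - press / total_sum_of_squares(ys)

# Hypothetical descriptor values and activities.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.1]
r2 = r_squared(descriptor, activity)
q2 = loo_q_squared(descriptor, activity)
print(f"R2 = {r2:.3f}, q2 = {q2:.3f}")
```

Because each left-out point is predicted by a model that never saw it, q² cannot exceed R² under this definition; a small gap between the two, as in the values quoted above, indicates a stable model.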
TOMOCOMD-CAMPS and protein bilinear indices - novel bio-macromolecular descriptors for protein research: I. Predicting protein stability effects of a complete set of alanine substitutions in the Arc repressor
Descriptors calculated from a specific representation scheme encode only one part of the chemical information. For this reason, there is a need to construct novel graphical representations of proteins and novel protein descriptors that can provide new information about the structure of proteins. Here, a new set of protein descriptors based on computation of bilinear maps is presented. This novel approach to biomacromolecular design is relevant for QSPR studies on proteins. Protein bilinear indices are calculated from the kth power of the nonstochastic and stochastic graph-theoretic electronic-contact matrices. That is to say, the kth nonstochastic and stochastic protein bili…
Global stability of protein folding from an empirical free energy function
The principles governing protein folding stand as one of the biggest challenges of biophysics. Modeling the global stability of proteins and predicting their tertiary structure are hard tasks, due in part to the variety and large number of forces involved and the difficulty of describing them with sufficient accuracy. We have developed a fast, physics-based empirical potential, intended to be used in global structure prediction methods. This model considers four main contributions: two entropic factors, the hydrophobic effect and configurational entropy, and two terms resulting from a decomposition of close-packing interactions, namely the balance of the dispersive interactions of folded an…
Prediction of tyrosinase inhibition activity using atom-based bilinear indices.
A set of novel atom-based molecular fingerprints is proposed based on a bilinear map similar to that defined in linear algebra. These molecular descriptors (MDs) are proposed as a new means of molecular parametrization easily calculated from 2D molecular information. The nonstochastic and stochastic molecular indices match molecular structure provided by molecular topology by using the kth nonstochastic and stochastic graph-theoretical electronic-density matrices, M(k) and S(k), respectively. Thus, the kth nonstochastic and stochastic bilinear indices are calculated using M(k) and S(k) as matrix operators of bilinear transformations. Chemical information is coded by using different pair com…
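The kth bilinear index described above has the general form b_k(x, y) = xᵀ M^k y, where x and y are vectors of atomic properties. The Python sketch below evaluates that form with k successive matrix-vector products. It uses a plain 0/1 adjacency matrix in place of the electronic-density matrix M(k), and the property vectors are made-up numbers; both are simplifying assumptions, and the function name is hypothetical.

```python
def bilinear_index(matrix, x, y, k):
    """Return x^T M^k y using k successive matrix-vector products."""
    v = list(y)
    for _ in range(k):
        v = [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in matrix]
    return sum(x_i * v_i for x_i, v_i in zip(x, v))

# Hypothetical 3-atom chain (atoms 0-1-2) with a simple adjacency matrix.
M = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
x = [1.0, 2.0, 3.0]   # made-up values of one atomic property
y = [1.0, 1.0, 1.0]   # made-up values of a second atomic property

for k in range(3):
    print(k, bilinear_index(M, x, y, k))
```

Setting y equal to x reduces the bilinear form to the quadratic form xᵀ M^k x, which is the shape of the quadratic indices mentioned in several of the other abstracts.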
QSAR models for tyrosinase inhibitory activity description applying modern statistical classification techniques: A comparative study
Cluster analysis (CA), linear and quadratic discriminant analysis (L(Q)DA), binary logistic regression (BLR) and classification trees (CT) are applied to two datasets to describe tyrosinase inhibitory activity from molecular structures. The first set, which includes 701 tyrosinase inhibitors (TI), is used to model inhibitory versus non-inhibitory activity, and the second is used for potency estimation of active compounds. 2D TOMOCOMD-CARDD atom-based quadratic indices are computed as molecular descriptors. CA is used for the “rational” design of the training (TS) and prediction (PS) sets, but it proves inadequate as a classification technique. On the first data, the overall a…
Computational discovery of novel trypanosomicidal drug-like chemicals by using bond-based non-stochastic and stochastic quadratic maps and linear discriminant analysis.
Herein we present the results of a quantitative structure-activity relationship (QSAR) study to classify and design, in a rational way, new antitrypanosomal compounds by using non-stochastic and stochastic bond-based quadratic indices. A data set of 440 organic chemicals, 143 with antitrypanosomal activity and 297 having other clinical uses, is used to develop QSAR models based on linear discriminant analysis (LDA). The non-stochastic model correctly classifies more than 93% and 95% of chemicals in the training and external prediction groups, respectively. On the other hand, the stochastic model shows an accuracy of about 87% for both series. As an experiment of virtual lead generation, the …
A Hooke's law-based approach to protein folding rate
Kinetics is a key aspect of the renowned protein folding problem. Here, we propose a comprehensive approach to folding kinetics where a polypeptide chain is assumed to behave as an elastic material described by Hooke's law. A novel parameter called elastic-folding constant results from our model and is suggested to distinguish between proteins with two-state and multi-state folding pathways. A contact-free descriptor, named folding degree, is introduced as a suitable structural feature to study protein-folding kinetics. This approach generalizes the observed correlations between varieties of structural descriptors with the folding rate constant. Additionally several comparisons am…
Atom, atom-type, and total nonstochastic and stochastic quadratic fingerprints: a promising approach for modeling of antibacterial activity.
The TOpological MOlecular COMputer Design (TOMOCOMD-CARDD) approach has been introduced for the classification and design of antimicrobial agents using computer-aided molecular design. For this purpose, atom, atom-type, and total quadratic indices have been generalized to codify chemical structure information. In this sense, stochastic quadratic indices have been introduced for the description of the molecular structure. These stochastic fingerprints are based on a simple model for the intramolecular movement of all valence-bond electrons. In this work, a complete data set containing 1006 antimicrobial agents is collected and presented. Two structure-based antibacterial activity classificat…
Bond-based bilinear indices for computational discovery of novel trypanosomicidal drug-like compounds through virtual screening
Two-dimensional bond-based bilinear indices and linear discriminant analysis are used in this report to perform a quantitative structure-activity relationship study to identify new trypanosomicidal compounds. A data set of 440 organic chemicals, 143 with antitrypanosomal activity and 297 having other clinical uses, is used to develop the theoretical models. Two discriminant models, computed using bond-based bilinear indices, are developed and both show accuracies higher than 86% for training and test sets. The stochastic model correctly identifies nine out of eleven compounds of a set of organic chemicals obtained from our synthetic collaborators. The in vitro antitrypanosomal activity of …
Multi-output Model with Box-Jenkins Operators of Quadratic Indices for Prediction of Malaria and Cancer Inhibitors Targeting Ubiquitin-Proteasome Pathway (UPP) Proteins.
The ubiquitin-proteasome pathway (UPP) is the primary degradation system of short-lived regulatory proteins. Cellular processes such as the cell cycle, signal transduction, gene expression, DNA repair and apoptosis are regulated by this UPP and dysfunctions in this system have important implications in the development of cancer, neurodegenerative, cardiac and other human pathologies. UPP seems also to be very important in the function of eukaryote cells of the human parasites like Plasmodium falciparum, the causal agent of the neglected disease Malaria. Hence, the UPP could be considered as an attractive target for the development of compounds with Anti-Malarial or Anti-cancer properties. R…
Extending Graph (Discrete) Derivative Descriptors to N-Tuple Atom-Relations
In the present manuscript, an extension of the previously defined Graph Derivative Indices (GDIs) is discussed. To achieve this objective, the concept of a hypermatrix, conceived from the calculation of the frequencies of triple and quadruple atom relations in a set of connected sub-graphs, is introduced. This set of subgraphs is generated following a predefined criterion, known as the event (S), being in this particular case the connectivity among atoms. The triple and quadruple relations frequency matrices serve as a basis for the computation of triple and quadruple discrete derivative indices, respectively. The GDIs are implemented in a computational program denominated DIVATI (acronym f…
Extended GT-STAF information indices based on Markov approximation models
A series of novel information theory-based molecular parameters derived from the insight of a molecular structure as a chemical communication system were recently presented and usefully employed in QSAR/QSPRs (J. Comp. Chem, 2013, 34, 259; SAR and QSAR in Environ. Res. 2013, 24). This approach permitted the application of Shannon’s source and channel coding entropic measures to a chemical information source comprised of molecular ‘fragments’, using the zero-order Markov approximation model (atom-based approach). This report covers the theoretical aspects of the extensions of this approach to higher-order models, introducing the first, second and generalized-order Markov approximati…
QuBiLS-MIDAS: A parallel free-software for molecular descriptors computation based on multilinear algebraic maps
The present report introduces the QuBiLS-MIDAS software belonging to the ToMoCoMD-CARDD suite for the calculation of three-dimensional molecular descriptors (MDs) based on the two-linear (bilinear), three-linear, and four-linear (multilinear or N-linear) algebraic forms. Thus, it is unique software that computes these tensor-based indices. These descriptors establish relations for two, three, and four atoms by using several (dis-)similarity metrics or multimetrics, matrix transformations, cutoffs, local calculations and aggregation operators. The theoretical background of these N-linear indices is also presented. The QuBiLS-MIDAS software was developed in the Java programming language and …
Atom-based non-stochastic and stochastic bilinear indices: Application to QSPR/QSAR studies of organic compounds
The recently introduced bilinear indices are applied to the QSAR/QSPR studies of heteroatomic molecules. These novel atom-based molecular fingerprints are used to predict the boiling point of 28 alkyl-alcohols and the partition coefficient, specific rate constant and antibacterial activity of 34 2-furylethylene derivatives. The obtained models are statistically significant and show rather good stability in a cross-validation experiment. The comparison with other approaches shows that our method performs well in these QSPR studies. The obtained results suggest that with the present method, it is possible to obtain a good estimation of physical, chemical and physicochemical properties for …
Discovery of novel anti-inflammatory drug-like compounds by aligning in silico and in vivo screening: The nitroindazolinone chemotype
In this report, we propose the combination of computational methods and in vivo primary screening in zebrafish larvae and confirmatory in mice models as a novel strategy to accelerate anti-inflammatory drug discovery. Initially, a database of 1213 organic chemicals with great structural variability - 587 of them anti-inflammatory agents plus 626 compounds with other clinical uses - was divided into training and test groups. Atom-based quadratic indices - a TOMOCOMD-CARDD molecular descriptors family - and linear discriminant analysis (LDA) were used to develop a total of 13 models to describe the anti-inflammatory activity. The best model (Eq. (13)) shows an accuracy of 87.70% in the traini…
3D-Chiral quadratic indices of the ‘molecular pseudograph’s atom adjacency matrix’ and their application to central chirality codification: classification of ACE inhibitors and prediction of σ-receptor antagonist activities
Quadratic indices of the 'molecular pseudograph's atom adjacency matrix' have been generalized to codify chemical structure information for chiral drugs. These 3D-chiral quadratic indices make use of a trigonometric 3D-chirality correction factor. These indices are nonsymmetric and reduce to classical (2D) descriptors when symmetry is not codified. For this reason, it is expected that they will be useful to predict symmetry-dependent properties. 3D-Chirality quadratic indices are real numbers and thus can be easily calculated in TOMOCOMD-CARDD software. These descriptors circumvent the inability of conventional 2D quadratic indices (Molecules 2003, 8, 687-726. http://www.mdpi.org) and othe…
LEGO-based generalized set of two linear algebraic 3D bio-macro-molecular descriptors: Theory and validation by QSARs
Novel 3D protein descriptors based on bilinear, quadratic and linear algebraic maps in ℝⁿ are proposed. The latter employs the kth 2-tuple (dis)similarity matrix to codify information related to covalent and non-covalent interactions in these biopolymers. The calculation of the inter-amino acid distances is generalized by using several dis-similarity coefficients, where normalization procedures based on the simple stochastic and mutual probability schemes are applied. A new local-fragment approach based on amino acid-types and amino acid-groups is proposed to characterize regions of interest in proteins. Topological and geometric macromolecular cutoffs are defined using local and…
Identification In Silico and In Vitro of Novel Trypanosomicidal Drug-Like Compounds
Atom-based bilinear indices and linear discriminant analysis are used to discover novel trypanosomicidal compounds. The obtained linear discriminant analysis-based quantitative structure–activity relationship models, using non-stochastic and stochastic indices, provide accuracies of 89.02% (85.11%) and 89.60% (88.30%) of the chemicals in the training (test) sets, respectively. Later, both models were applied to the virtual screening of 18 in-house synthesized compounds to find new pro-lead antitrypanosomal agents. The in vitro antitrypanosomal activity of this set against epimastigote forms of Trypanosoma cruzi is assayed. Predictions agree with experimental results to a great extent (16/18…
Chemometric and chemoinformatic analyses of anabolic and androgenic activities of testosterone and dihydrotestosterone analogues
Predictive quantitative structure-activity relationship (QSAR) models of anabolic and androgenic activities for the testosterone and dihydrotestosterone steroid analogues were obtained by means of multiple linear regression using quantum and physicochemical molecular descriptors (MD) as well as a genetic algorithm for the selection of the best subset of variables. Quantitative models found for describing the anabolic (androgenic) activity are significant from a statistical point of view: R2 of 0.84 (0.72 and 0.70). A leave-one-out cross-validation procedure revealed that the regression models had a fairly good predictability [q2 of 0.80 (0.60 and 0.59)]. In addition, other QSAR models were …
QuBiLS-MAS, open source multi-platform software for atom- and bond-based topological (2D) and chiral (2.5D) algebraic molecular descriptors computations.
Background In previous reports, Marrero-Ponce et al. proposed algebraic formalisms for characterizing topological (2D) and chiral (2.5D) molecular features through atom- and bond-based ToMoCoMD-CARDD (acronym for Topological Molecular Computational Design-Computer Aided Rational Drug Design) molecular descriptors. These MDs codify molecular information based on the bilinear, quadratic and linear algebraic forms and the graph-theoretical electronic-density and edge-adjacency matrices in order to consider atom- and bond-based relations, respectively. These MDs have been successfully applied in the screening of chemical compounds of different therapeutic applications ranging from antimalarials…
New tyrosinase inhibitors selected by atomic linear indices-based classification models.
In the present report, the use of the atom-based linear indices for finding functions that discriminate between the tyrosinase inhibitor compounds and inactive ones is presented. In this sense, discriminant models were applied and globally good classifications of 93.51% and 92.46% were observed for non-stochastic and stochastic linear indices best models, respectively, in the training set. The external prediction sets had accuracies of 91.67% and 89.44%. In addition, these fitted models were used in the screening of new cycloartane compounds isolated from herbal plants. A good behavior is shown between the theoretical and experimental results. These results provide a tool that can be used i…
Overlap and diversity in antimicrobial peptide databases: Compiling a non-redundant set of sequences
Abstract Motivation: The large variety of antimicrobial peptide (AMP) databases developed to date are characterized by a substantial overlap of data and similarity of sequences. Our goals are to analyze the levels of redundancy for all available AMP databases and use this information to build a new non-redundant sequence database. For this purpose, a new software tool is introduced. Results: A comparative study of 25 AMP databases reveals the overlap and diversity among them and the internal diversity within each database. The overlap analysis shows that only one database (Peptaibol) contains exclusive data, not present in any other, whereas all sequences in the LAMP_Patent database are inc…
Dry selection and wet evaluation for the rational discovery of new anthelmintics
Helminth infections remain a major problem in medical and public health. In this report, atom-based 2D bilinear indices, a TOMOCOMD-CARDD (QuBiLs-MAS module) molecular descriptor family, and linear discriminant analysis (LDA) were used to find models that differentiate between anthelmintic and non-anthelmintic compounds. Two classification models, obtained by using non-stochastic and stochastic 2D bilinear indices, correctly classified 86.64% and 84.66% of the compounds, respectively, in the training set. Equation 1 (2) correctly classified 141 (135) out of 165 [85.45% (81.82%)] compounds in the external validation set. Other LDA models were developed in order to get the most likely mechanism of action of anthelmin…
MOESM1 of QuBiLS-MAS, open source multi-platform software for atom- and bond-based topological (2D) and chiral (2.5D) algebraic molecular descriptors computations
Additional file 1. The mathematical definitions of the norms, means and statistical invariants as generalizations of the linear combination of LOVIs as global (and/or local) MDs aggregation operator, as well as classical algorithms which generalize the first three groups are presented as Figure SI1-Table S12. The UML diagram (Figure SI3), a debug report file content (Figure SI4), a batch process manager dialog window (Figure SI5) are also listed. Some results of the factor analysis by the principal component method are shown as Table SI6-Table SI8, and finally, the names of structures for Cramer’s steroid database and their corresponding values for the binding affinity to the corticosteroid…
Generalized Molecular Descriptors Derived From Event-Based Discrete Derivative.
In the present study, a generalized approach for molecular structure characterization is introduced, based on the relation frequency matrix (F) representation of the molecular graph and the subsequent calculation of the corresponding discrete derivative (finite difference) over a pair of elements (atoms). In earlier publications (22-24), a unique event, named connected subgraphs (based on the Kier-Hall subgraphs), was systematically employed for the computation of the matrix F. The present report is a generalization of this notion, in which eleven additional events are introduced, classified in three categories, namely, topological (terminal paths, vertex path incidence, quantum subgrap…
Discovery of novel trichomonacidals using LDA-driven QSAR models and bond-based bilinear indices as molecular descriptors
A few years ago, the World Health Organization estimated the number of adults with trichomoniasis at 170 million worldwide, more than the combined numbers for gonorrhea, syphilis, and chlamydia. To combat this sexually transmitted disease, Metronidazole (MTZ) has emerged, since 1959, as a powerful drug for the systematic treatment of infected patients. However, increasing resistance to MTZ, adverse effects associated with high-dose MTZ therapies and very expensive conventional technologies related to the development of new trichomonacidals necessitate novel computational methods that shorten the drug discovery pipeline. Therefore, bond-based bilinear indices, new 2-D bond-based TOMOCOMD-CARDD M…
Estimation of ADME Properties in Drug Discovery: Predicting Caco-2 Cell Permeability Using Atom-Based Stochastic and Non-stochastic Linear Indices
The in vitro determination of the permeability through cultured Caco-2 cells is the most often-used in vitro model for drug absorption. In this report, we use the largest data set of measured P(Caco-2), consisting of 157 structurally diverse compounds. Linear discriminant analysis (LDA) was used to obtain quantitative models that discriminate higher absorption compounds from those with moderate-poorer absorption. The best LDA model has an accuracy of 90.58% and 84.21% for training and test set. The percentage of good correlation, in the virtual screening of 241 drugs with the reported values of the percentage of human intestinal absorption (HIA), was greater than 81%. In addition, multiple …
QSPR/QSAR Studies of 2-Furylethylenes Using Bond-Level Quadratic Indices and Comparison with Other Computational Approaches
The recently introduced non-stochastic and stochastic quadratic indices (Marrero-Ponce et al. J. Comp. Aided Mol. Des. 2006, 20, 685-701) were applied to QSAR/QSPR studies of heteroatomic molecules. These novel bond-based molecular descriptors (MDs) were used for the prediction of the partition coefficient (log P) and the antibacterial activity of 34 derivatives of 2-furylethylenes. Two statistically significant QSPR models using non-stochastic and stochastic bond-based quadratic indices were obtained (R2 = 0.971, s = 0.137 and R2 = 0.986, s = 0.096). These models showed good stability to data variation in leave-one-out (L…
Anti-Inflammatory Activity and Cheminformatics Analysis of New Potent 2-Substituted 1-Methyl-5-Nitroindazolinones.
After the identification of the anti-inflammatory properties of VA5-13l (2-benzyl-1-methyl-5-nitroindazolinone) in previous investigations, some of its analogues were designed, synthesized and evaluated in two anti-inflammatory methods: LPS-enhanced leukocyte migration assay in zebrafish, and 12-O-tetradecanoylphorbol-13-acetate (TPA)-induced mouse ear edema. Products 3, 6, 8, 9 and 10 showed the lowest values of relative leukocyte migration at 30 µM (0.14, 0.07, 0.10, 0.13 and 0.07, respectively), while in ear edema and myeloperoxidase activity methods, all the compounds reduced inflammation; only 4 and 16 yielded unsatisfactory results. The relationship linkin…
State of the Art Review and Report of New Tool for Drug Discovery
BACKGROUND There are a great number of tools that can be used in QSAR/QSPR studies; they are implemented in several programs that are reviewed in this report. The usefulness of new tools can be proved through comparison, with previously published approaches. In order to perform the comparison, the most usual is the use of several benchmark datasets such as DRAGON and Sutherland's datasets. METHODS Here, an exploratory study of Atomic Weighted Vectors (AWVs), a new tool useful for drug discovery using different datasets, is presented. In order to evaluate the performance of the new tool, several statistics and QSAR/QSPR experiments are performed. Variability analyses are used to quantify the…
Dragon method for finding novel tyrosinase inhibitors: Biosilico identification and experimental in vitro assays
QSAR (quantitative structure-activity relationship) studies of tyrosinase inhibitors employing Dragon descriptors and linear discriminant analysis (LDA) are presented here. A data set of 653 compounds, 245 with tyrosinase inhibitory activity and 408 having other clinical uses, was used. The active data set was processed by k-means cluster analysis in order to design training and prediction series. Seven LDA-based QSAR models were obtained. The discriminant functions applied showed a globally good classification of 99.79% for the best model, Class = -96.067 + 1.988 × 10^2 X0Av + 91.907 BIC3 + 6.853 CIC1, in the training set. External validation processes to assess the robustness and predictive pow…
Vanilloid Derivatives as Tyrosinase Inhibitors Driven by Virtual Screening-Based QSAR Models
A number of vanilloids have been tested as tyrosinase inhibitors using Ligand-Based Virtual Screening (LBVS) driven by QSAR (Quantitative Structure-Activity Relationship) models as the multi-agent classification system. A total of 81 models were used to screen this family. Then, a preliminary cluster analysis of the selected chemicals was carried out based on their bioactivity to detect possible similar substructural features among these compounds and the active database used in the QSAR model construction. The compounds identified were tested in vitro to corroborate the results obtained in silico. Among them, two chemicals, isovanillin (K(M) (app) = 1.08 mM) near to kojic acid (reference d…
A novel approach to predict aquatic toxicity from molecular structure
The main aim of the study was to develop quantitative structure-activity relationship (QSAR) models for the prediction of aquatic toxicity using atom-based non-stochastic and stochastic linear indices. The dataset used consists of 392 benzene derivatives, separated into training and test sets, for which toxicity data to the ciliate Tetrahymena pyriformis were available. Using multiple linear regression, two statistically significant QSAR models were obtained with non-stochastic (R2=0.791 and s=0.344) and stochastic (R2=0.799 and s=0.343) linear indices. A leave-one-out (LOO) cross-validation procedure was carried out, achieving values of q2=0.781 (scv=0.348) and q2=0.786 (scv=0.350), respecti…
In silico Antibacterial Activity Modeling Based on the TOMOCOMD-CARDD Approach
In recent times, the race to cope with the increasing multidrug resistance of pathogenic bacteria has lost much of its momentum and health professionals are grasping for solutions to deal with the unprecedented resistance levels. As a result, there is an urgent need for a concerted effort towards the development of new antimicrobial drugs to stay ahead in the fight against the ever-adapting bacteria. In the present report, antibacterial classification functions (models) based on the topological molecular computational design-computer aided 'rational' drug design (TOMOCOMD-CARDD) atom-based non-stochastic and stochastic bilinear indices are presented. These models were built using the li…
A Comparative Study of Nonlinear Machine Learning for the "In Silico" Depiction of Tyrosinase Inhibitory Activity from Molecular Structure.
In the present report, for the first time, support vector machine (SVM), artificial neural network (ANN), Bayesian networks (BNs) and k-nearest neighbor (k-NN) are applied and compared on two "in-house" datasets to describe the tyrosinase inhibitory activity from the molecular structure. The data set Data I is used for the identification of tyrosinase inhibitors (TIs), including 701 active and 728 inactive compounds. Data II consists of active chemicals for potency estimation of TIs. The 2D TOMOCOMD-CARDD atom-based quadratic indices are used as molecular descriptors. The derived models show rather encouraging results with the areas under the Receiver Operating Characteristic (AURC) curve …
Atom-based Stochastic and non-Stochastic 3D-Chiral Bilinear Indices and their Applications to Central Chirality Codification
Abstract Non-stochastic and stochastic 2D bilinear indices have been generalized to codify chemical structure information for chiral drugs, making use of a trigonometric 3D-chirality correction factor. In order to evaluate the effectiveness of this novel approach in drug design we have modeled the angiotensin-converting enzyme inhibitory activity of perindoprilate's σ-stereoisomers combinatorial library. Two linear discriminant analysis models, using non-stochastic and stochastic linear indices, were obtained. The models had shown an accuracy of 95.65% for the training set and 100% for the external prediction set. Next the prediction of the σ-receptor antagonists of chiral 3-(3-hydroxypheny…
QuBiLs-MAS method in early drug discovery and rational drug identification of antifungal agents
The QuBiLs-MAS approach is used for the in silico modelling of the antifungal activity of organic molecules. To this effect, non-stochastic (NS) and simple-stochastic (SS) atom-based quadratic indices are used to codify chemical information for a comprehensive dataset of 2478 compounds having a great structural variability, with 1087 of them being antifungal agents, covering the broadest antifungal mechanisms of action known so far. The NS and SS index-based antifungal activity classification models obtained using linear discriminant analysis (LDA) yield correct classification percentages of 90.73% and 92.47%, respectively, for the training set. Additionally, these models are able to correc…
New tool useful for drug discovery validated through benchmark datasets
Atomic Weighted Vectors (AWVs) are vectors that contain the codified information of molecular structures, which can apply to a set of Aggregation Operators (AOs) to calculate total and local molecular descriptors (MDs). This article presents an exploratory study of a new tool useful for drug discovery using different datasets, such as DRAGON and Sutherland’s datasets, as well as their comparison with other well-known approaches. In order to evaluate the performance of the tool, several statistics and QSAR/QSPR experiments were performed. Variability analyses are used to quantify the information content of the AWVs obtained from the tool, by way of an information theory-based algorithm. …
Bond-based 3D-chiral linear indices: Theory and QSAR applications to central chirality codification
The recently introduced non-stochastic and stochastic bond-based linear indices have been generalized to codify chemical structure information for chiral drugs, making use of a trigonometric 3D-chirality correction factor. These improved modified descriptors are applied to several well-known data sets to validate each one of them. Particularly, Cramer's steroid data set has become a benchmark for the assessment of novel quantitative structure activity relationship methods. This data set has been used by several researchers using 3D-QSAR approaches such as Comparative Molecular Field Analysis, Molecular Quantum Similarity Measures, Comparative Molecular Moment Analysis, E-state, Mapping Prope…
Smoothed Spherical Truncation based on Fuzzy Membership Functions: Application to the Molecular Encoding.
A novel spherical truncation method, based on fuzzy membership functions, is introduced to truncate interatomic (or interaminoacid) relations according to smoothing values computed from fuzzy membership degrees. In this method, the molecules are circumscribed into a sphere, so that the geometric centers of the molecules are the centers of the spheres. The fuzzy membership degree of each atom (or aminoacid) is computed from its distance with respect to the geometric center of the molecule, by using a fuzzy membership function. So, the smoothing value to be applied in the truncation of a relation (or interaction) is computed by averaging the fuzzy membership degrees of the atoms (or aminoacid…
Novel 3D bio-macromolecular bilinear descriptors for protein science: Predicting protein structural classes
In the present study, we introduce novel 3D protein descriptors based on the bilinear algebraic form in the ℝn space on the coulombic matrix. For the calculation of these descriptors, macromolecular vectors belonging to ℝn space, whose components represent certain amino acid side-chain properties, were used as weighting schemes. Generalization approaches for the calculation of inter-amino acidic residue spatial distances based on Minkowski metrics are proposed. The simple- and double-stochastic schemes were defined as approaches to normalize the coulombic matrix. The local-fragment indices for both amino acid-types and amino acid-groups are presented in order to permit characterizing fragme…
Bond-Based 2D Quadratic Fingerprints in QSAR Studies: Virtual and In vitro Tyrosinase Inhibitory Activity Elucidation
In this report, we show the results of quantitative structure–activity relationship (QSAR) studies of tyrosinase inhibitory activity, by using the bond-based quadratic indices as molecular descriptors (MDs) and linear discriminant analysis (LDA), to generate discriminant functions to predict the anti-tyrosinase activity. The best two models [Eqs (6) and (12)] out of the total 12 QSAR models developed here show accuracies of 93.51% and 91.21%, as well as high Matthews correlation coefficients (C) of 0.86 and 0.82, respectively, in the training set. The validation external series depicts values of 90.00% and 89.44% for these best two equations (6) and (12), respectively. Afterwards, a second …
Applying pattern recognition methods plus quantum and physico-chemical molecular descriptors to analyze the anabolic activity of structurally diverse steroids.
The great cost associated with the development of new anabolic-androgenic steroids (AASs) makes necessary the development of computational methods that shorten the drug discovery pipeline. Toward this end, quantum and physicochemical molecular descriptors, plus linear discriminant analysis (LDA), were used to analyze the anabolic/androgenic activity of structurally diverse steroids and to discover novel AASs, as well as to give a structural interpretation of their anabolic-androgenic ratio (AAR). The obtained models are able to correctly classify 91.67% (86.27%) of the AASs in the training (test) sets, respectively. The results of predictions on the 10% full-out cross-validation test al…
Atom, atom-type and total molecular linear indices as a promising approach for bioorganic and medicinal chemistry: theoretical and experimental assessment of a novel method for virtual screening and rational design of new lead anthelmintic.
Abstract Helminth infections are a medical problem in the world nowadays. In this paper a novel atom-level chemical descriptor has been applied to estimate the anthelmintic activity. Total and local linear indices and linear discriminant analysis were used to obtain a quantitative model that discriminates between anthelmintic and non-anthelmintic drug-like compounds. The discriminant model has an accuracy of 90.11% in the training set, with a high Matthews’ correlation coefficient (MCC = 0.80). To assess the robustness and predictive power of the obtained model, internal (leave-n-out) and external validation process was performed. The QSAR model correctly classified 88.55% of compounds in t…
Data for: LEGO-based Generalized Set of Two Linear Algebraic 3D Bio-Macro-Molecular Descriptors: Theory and Validation by QSARs
SI3-1: 15 suggested theoretical configurations for the calculation of MDs (defined with the name projects). The selected configurations for the projects used in this study are also indicated in Table SI2-1 and are available at SI3-1. SI3-2: The experiments employed a dataset containing 152 representative, non-homologous proteins (Fleming and Richards, 2000; see SI3-2 to review the protein files). SI3-3: The evaluation of this application in protein science requires the use of two datasets. The first data set, employed as a training set, was proposed by Ouyang (Ouyang and Liang, 2008) and contains 80 proteins (the case “2BLM” was removed since it only considered an alpha carbon representat…