AUTHOR
Stephen J. Barigye
Predicting Proteasome Inhibition using Atomic Weighted Vector and Machine Learning
The ubiquitin/proteasome system (UPS) is a highly regulated mechanism of intracellular protein degradation and turnover. Through the concerted actions of a series of enzymes, proteins are marked for proteasomal degradation by being linked to the polypeptide co-factor ubiquitin. The UPS participates in a wide array of biological functions such as antigen presentation, regulation of gene transcription and the cell cycle, and activation of NF-κB. Some researchers have applied QSAR methods and machine learning to the study of proteasome inhibition (EC50, µmol/L), for example in the analysis of proteasome inhibition prediction, in the prediction of multi-target inhibitors of UPP and in the prediction of p…
Discrete Derivatives for Atom-Pairs as a Novel Graph-Theoretical Invariant for Generating New Molecular Descriptors: Orthogonality, Interpretation and QSARs/QSPRs on Benchmark Databases.
This report presents a new mathematical method based on the concept of the derivative of a molecular graph (G) with respect to a given event (S) to codify chemical structure information. The derivative over each pair of atoms in the molecule is defined as $\frac{\partial G}{\partial S}(v_i, v_j) = \frac{f_i - 2f_{ij} + f_j}{f_{ij}}$, where $f_i$ (or $f_j$) and $f_{ij}$ are the individual frequency of atom i (or j) and the reciprocal frequency of the atoms i and j, respectively. These frequencies characterize the participation intensity of atom pairs in S. Here, the event space is composed of molecular sub-graphs which participate in the formation of the G skeleton that could be complete (representing all possible connected sub-graphs) or comp…
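The pairwise derivative defined in this abstract can be illustrated with a minimal sketch. It assumes, as an illustration only, that the individual frequency of atom i is the number of sub-graphs containing it and that the reciprocal frequency of atoms i and j is the number of sub-graphs containing both; the function names and the three-atom chain are hypothetical and are not taken from the paper or its software.

```python
from itertools import combinations


def relations_frequency(subgraphs, n_atoms):
    """Build F where F[i][j] counts the sub-graphs containing both atoms i and j.

    Assumption for this sketch: the diagonal F[i][i] plays the role of the
    individual frequency f_i, and F[i][j] that of the reciprocal frequency f_ij.
    """
    F = [[0] * n_atoms for _ in range(n_atoms)]
    for sg in subgraphs:
        for i in sg:
            for j in sg:
                F[i][j] += 1
    return F


def pair_derivatives(F):
    """Return (f_i - 2*f_ij + f_j) / f_ij for every atom pair with f_ij > 0."""
    result = {}
    for i, j in combinations(range(len(F)), 2):
        f_ij = F[i][j]
        if f_ij > 0:  # pairs that never co-occur are skipped to avoid dividing by zero
            result[(i, j)] = (F[i][i] - 2 * f_ij + F[j][j]) / f_ij
    return result


# Hypothetical event space: all connected sub-graphs of the chain 0-1-2.
subgraphs = [{0}, {1}, {2}, {0, 1}, {1, 2}, {0, 1, 2}]
F = relations_frequency(subgraphs, 3)
derivs = pair_derivatives(F)
print(derivs)
```

In this toy case the terminal pair (0, 2) co-occurs in only one sub-graph, so its derivative is larger than that of the bonded pairs.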
Event-based criteria in GT-STAF information indices: theory, exploratory diversity analysis and QSPR applications
Versatile event-based approaches for the definition of novel information theory-based indices (IFIs) are presented. An event in this context is the criterion followed in the "discovery" of molecular substructures, which in turn serve as the basis for the construction of the generalized incidence and relations frequency matrices, Q and F, respectively. From the resultant F, Shannon's, mutual, conditional and joint entropy-based IFIs are computed. In previous reports, an event named connected subgraphs was presented. The present study is an extension of this notion, in which we introduce other events, namely: terminal paths, vertex path incidence, quantum subgraphs, walks of length k, Sach's subg…
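As a rough illustration of how a Shannon-type index might be obtained from frequencies, the sketch below normalizes a list of frequencies into probabilities and computes the entropy in bits. The normalization by the total, the example matrix and the function name are assumptions made here; the exact IFI definitions used in the paper are not reproduced.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy (bits) of non-negative frequencies normalized by their sum."""
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs if f > 0)


# Hypothetical relations frequency matrix F for a three-atom molecule.
F = [
    [3, 2, 1],
    [2, 4, 2],
    [1, 2, 3],
]

# Entropy over the diagonal (individual atom frequencies), an assumed choice.
H_atoms = shannon_entropy([F[i][i] for i in range(len(F))])

# Sanity check on a uniform distribution over four outcomes.
H_uniform = shannon_entropy([1, 1, 1, 1])
print(H_atoms, H_uniform)
```

A uniform distribution gives the maximum entropy for its size, so any skew in the atom frequencies lowers the value.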
Predictive modeling of aryl hydrocarbon receptor (AhR) agonism
The aryl hydrocarbon receptor (AhR) plays a key role in the regulation of gene expression in metabolic machinery and detoxification systems. In recent years, this receptor has attracted interest as a therapeutic target for immunological, oncogenic and inflammatory conditions. In the present report, in silico and in vitro approaches were combined to study the activation of the AhR. To this end, a large database of chemical compounds with known AhR agonistic activity was employed to build 5 classifiers based on the Adaboost (AdB), Gradient Boosting (GB), Random Forest (RF), Multilayer Perceptron (MLP) and Support Vector Machine (SVM) algorithms, respectively. The built classifier…
Antiprotozoan lead discovery by aligning dry and wet screening: Prediction, synthesis, and biological assay of novel quinoxalinones
Protozoan parasites have been one of the most significant public health problems for centuries, and several human infections caused by them have a massive global impact. Most of the current drugs used to treat these illnesses have been used for decades and have many limitations such as the emergence of drug resistance, severe side-effects, low-to-medium drug efficacy, administration routes, cost, etc. These drugs have been largely neglected as models for drug development because they are mainly used in countries with limited resources and, as a consequence, with scarce marketing possibilities. Nowadays, there is a pressing need to identify and develop new drug-based antiprotozoan therapies. In …
Prediction of Aquatic Toxicity of Benzene Derivatives to Tetrahymena pyriformis According to OECD Principles
Background: Many QSAR studies have been developed to predict acute toxicity for several biomarkers such as Pimephales promelas, Daphnia magna and Tetrahymena pyriformis. Despite the progress made in this field, some gaps remain; for example, the prediction of aquatic toxicity to the protozoan T. pyriformis still lacks a QSAR study focused on fulfilling the OECD principles. Methods: Atom-based quadratic indices are used to obtain quantitative structure-activity relationship (QSAR) models for the prediction of aquatic toxicity. Our models agree with the principles required by the OECD for QSAR models for regulatory purposes. The database employed consists of 392 substitut…
Relations frequency hypermatrices in mutual, conditional and joint entropy-based information indices.
Graph-theoretic matrix representations constitute the most popular and significant source of topological molecular descriptors (MDs). Recently, we have introduced a novel matrix representation, named the duplex relations frequency matrix, F, derived from the generalization of an incidence matrix whose row entries are connected subgraphs of a given molecular graph G. Using this matrix, a series of information indices (IFIs) were proposed. In this report, an extension of F is presented, introducing for the first time the concept of a hypermatrix in graph-theoretic chemistry. The hypermatrix representation explores the n-tuple participation frequencies of vertices in a set of connected subgrap…
Extending Graph (Discrete) Derivative Descriptors to N-Tuple Atom-Relations
In the present manuscript, an extension of the previously defined Graph Derivative Indices (GDIs) is discussed. To achieve this objective, the concept of a hypermatrix, conceived from the calculation of the frequencies of triple and quadruple atom relations in a set of connected sub-graphs, is introduced. This set of subgraphs is generated following a predefined criterion, known as the event (S), being in this particular case the connectivity among atoms. The triple and quadruple relations frequency matrices serve as a basis for the computation of triple and quadruple discrete derivative indices, respectively. The GDIs are implemented in a computational program denominated DIVATI (acronym f…
Extended GT-STAF information indices based on Markov approximation models
A series of novel information theory-based molecular parameters derived from the insight of a molecular structure as a chemical communication system were recently presented and usefully employed in QSAR/QSPRs (J. Comp. Chem, 2013, 34, 259; SAR and QSAR in Environ. Res. 2013, 24). This approach permitted the application of Shannon’s source and channel coding entropic measures to a chemical information source comprised of molecular ‘fragments’, using the zero-order Markov approximation model (atom-based approach). This report covers the theoretical aspects of the extensions of this approach to higher-order models, introducing the first, second and generalized-order Markov approximati…
QuBiLS-MIDAS: A parallel free-software for molecular descriptors computation based on multilinear algebraic maps
The present report introduces the QuBiLS-MIDAS software belonging to the ToMoCoMD-CARDD suite for the calculation of three-dimensional molecular descriptors (MDs) based on the two-linear (bilinear), three-linear, and four-linear (multilinear or N-linear) algebraic forms. Thus, it is unique software that computes these tensor-based indices. These descriptors establish relations for two, three, and four atoms by using several (dis-)similarity metrics or multimetrics, matrix transformations, cutoffs, local calculations and aggregation operators. The theoretical background of these N-linear indices is also presented. The QuBiLS-MIDAS software was developed in the Java programming language and …
Computational identification of chemical compounds with potential anti-Chagas activity using a classification tree
Chagas disease is endemic to 21 Latin American countries and is a great public health problem in that region. Current chemotherapy remains unsatisfactory; consequently the need to search for new drugs persists. Here we present a new approach to identify novel compounds with potential anti-chagasic action. A large dataset of 584 compounds, obtained from the Drugs for Neglected Diseases initiative, was selected to develop the computational model. Dragon software was used to calculate the molecular descriptors and WEKA software to obtain the classification tree. The best model shows accuracy greater than 93.4% for the training set; the tree was also validated using a 10-fold cross-validation p…
QuBiLS-MAS, open source multi-platform software for atom- and bond-based topological (2D) and chiral (2.5D) algebraic molecular descriptors computations.
Background: In previous reports, Marrero-Ponce et al. proposed algebraic formalisms for characterizing topological (2D) and chiral (2.5D) molecular features through atom- and bond-based ToMoCoMD-CARDD (acronym for Topological Molecular Computational Design-Computer Aided Rational Drug Design) molecular descriptors. These MDs codify molecular information based on the bilinear, quadratic and linear algebraic forms and the graph-theoretical electronic-density and edge-adjacency matrices in order to consider atom- and bond-based relations, respectively. These MDs have been successfully applied in the screening of chemical compounds of different therapeutic applications ranging from antimalarials…
Overlap and diversity in antimicrobial peptide databases: Compiling a non-redundant set of sequences
Motivation: The large variety of antimicrobial peptide (AMP) databases developed to date is characterized by a substantial overlap of data and similarity of sequences. Our goals are to analyze the levels of redundancy for all available AMP databases and use this information to build a new non-redundant sequence database. For this purpose, a new software tool is introduced. Results: A comparative study of 25 AMP databases reveals the overlap and diversity among them and the internal diversity within each database. The overlap analysis shows that only one database (Peptaibol) contains exclusive data, not present in any other, whereas all sequences in the LAMP_Patent database are inc…
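The overlap analysis described here can be sketched with plain set operations. The example below treats two sequences as redundant only when they are identical strings, which is a simplification of the similarity-based analysis mentioned in the abstract; the database names, peptide sequences and function names are hypothetical and do not come from the paper or its tool.

```python
from itertools import combinations


def pairwise_overlap(dbs):
    """Count identical sequences shared by each pair of databases."""
    return {
        (a, b): len(dbs[a] & dbs[b])
        for a, b in combinations(sorted(dbs), 2)
    }


def exclusive_sequences(dbs, name):
    """Sequences found in `name` and in no other database."""
    others = set().union(*(seqs for key, seqs in dbs.items() if key != name))
    return dbs[name] - others


def non_redundant(dbs):
    """Union of all databases with exact duplicates removed."""
    return set().union(*dbs.values())


# Hypothetical toy databases (exact-identity redundancy only).
dbs = {
    "DB_A": {"GIGKFLHSAKKF", "KWKLFKKIEK"},
    "DB_B": {"KWKLFKKIEK", "FLPLIGRVLSGIL"},
    "DB_C": {"GIGKFLHSAKKF"},
}

overlap = pairwise_overlap(dbs)
merged = non_redundant(dbs)
print(overlap, len(merged))
```

A database whose exclusive set is empty, like `DB_C` here, is fully contained in the others and adds nothing to the merged set.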
Generalized Molecular Descriptors Derived From Event-Based Discrete Derivative.
In the present study, a generalized approach for molecular structure characterization is introduced, based on the relation frequency matrix (F) representation of the molecular graph and the subsequent calculation of the corresponding discrete derivative (finite difference) over a pair of elements (atoms). In earlier publications (22-24), a unique event, named connected subgraphs (based on Kier-Hall subgraphs), was systematically employed for the computation of the matrix F. The present report is a generalization of this notion, in which eleven additional events are introduced, classified in three categories, namely, topological (terminal paths, vertex path incidence, quantum subgrap…
A Simple Method to Predict Blood-Brain Barrier Permeability of Drug-Like Compounds Using Classification Trees
Background: Determining the ability of a compound to penetrate the blood-brain barrier (BBB) is a challenging task; despite numerous efforts to predict or measure BBB passage, existing approaches still have several drawbacks. Methods: The prediction of the permeability through the BBB is carried out using classification trees. A large data set of 497 compounds (recently published) is selected to develop the tree model. Results: The best model shows an accuracy higher than 87.6% for the training set; the model was also validated using a 10-fold cross-validation procedure and through a test set, achieving accuracy values of 86.1% and 87.9%, respectively. We give a brief explanation, in structural terms, o…
QSPR/QSAR Studies of 2-Furylethylenes Using Bond-Level Quadratic Indices and Comparison with Other Computational Approaches
The recently introduced, non-stochastic and stochastic quadratic indices (Marrero-Ponce <em>et al. J. Comp. Aided Mol. Des.</em> 2006, 20, 685-701) were applied to QSAR/QSPR studies of heteroatomic molecules. These novel bond-based molecular descriptors (MDs) were used for the prediction of the partition coefficient (log P), and the antibacterial activity of 34 derivatives of 2-furylethylenes. Two statistically significant QSPR models using non-stochastic and stochastic bond-based quadratic indices were obtained (R<sup>2</sup> = 0.971, s = 0.137 and R<sup>2</sup> = 0.986, s = 0.096). These models showed good stability to data variation in leave-one-out (L…
Elucidating the aryl hydrocarbon receptor antagonism from a chemical-structural perspective.
The aryl hydrocarbon receptor (AhR) plays an important role in several biological processes such as reproduction, immunity and homoeostasis. However, little is known about the chemical-structural and physicochemical features that influence the activity of AhR antagonistic modulators. In the present report, in vitro AhR antagonistic activity evaluations, based on a chemical-activated luciferase gene expression (AhR-CALUX) bioassay, and an extensive literature review were performed with the aim of constructing a structurally diverse database of contaminants and potentially toxic chemicals. Subsequently, QSAR models based on Linear Discriminant Analysis and Logistic Regression, as well as two tox…
Targeting the aryl hydrocarbon receptor with a novel set of triarylmethanes
The aryl hydrocarbon receptor (AhR) is a chemical sensor upregulating the transcription of responsive genes associated with endocrine homeostasis, oxidative balance and diverse metabolic, immunological and inflammatory processes, which has raised pharmacological interest in its modulation. Herein, a novel set of 32 unsymmetrical triarylmethane (TAM) structures has been synthesized and characterized, and their AhR transcriptional activity evaluated using a cell-based assay. Eight of the assayed TAM compounds (14, 15, 18, 19, 21, 22, 25, 28) exhibited AhR agonism but none of them showed antagonist effects. TAMs bearing benzotrifluoride, naphthol or heteroarom…
State of the Art Review and Report of New Tool for Drug Discovery
BACKGROUND: A great number of tools can be used in QSAR/QSPR studies; they are implemented in several programs that are reviewed in this report. The usefulness of new tools can be proved through comparison with previously published approaches. To perform the comparison, the usual practice is to use several benchmark datasets such as DRAGON and Sutherland's datasets. METHODS: Here, an exploratory study of Atomic Weighted Vectors (AWVs), a new tool useful for drug discovery using different datasets, is presented. In order to evaluate the performance of the new tool, several statistics and QSAR/QSPR experiments are performed. Variability analyses are used to quantify the…
In silico Antibacterial Activity Modeling Based on the TOMOCOMD-CARDD Approach
In recent times, the race to cope with the increasing multidrug resistance of pathogenic bacteria has lost much of its momentum and health professionals are grasping for solutions to deal with the unprecedented resistance levels. As a result, there is an urgent need for a concerted effort towards the development of new antimicrobial drugs to stay ahead in the fight against the ever-adapting bacteria. In the present report, antibacterial classification functions (models) based on the topological molecular computational design-computer aided "rational" drug design (TOMOCOMD-CARDD) atom-based non-stochastic and stochastic bilinear indices are presented. These models were built using the li…
QuBiLs-MAS method in early drug discovery and rational drug identification of antifungal agents
The QuBiLs-MAS approach is used for the in silico modelling of the antifungal activity of organic molecules. To this effect, non-stochastic (NS) and simple-stochastic (SS) atom-based quadratic indices are used to codify chemical information for a comprehensive dataset of 2478 compounds having a great structural variability, with 1087 of them being antifungal agents, covering the broadest antifungal mechanisms of action known so far. The NS and SS index-based antifungal activity classification models obtained using linear discriminant analysis (LDA) yield correct classification percentages of 90.73% and 92.47%, respectively, for the training set. Additionally, these models are able to correc…
New tool useful for drug discovery validated through benchmark datasets
Atomic Weighted Vectors (AWVs) are vectors that contain the codified information of molecular structures, to which a set of Aggregation Operators (AOs) can be applied to calculate total and local molecular descriptors (MDs). This article presents an exploratory study of a new tool useful for drug discovery using different datasets, such as DRAGON and Sutherland’s datasets, as well as their comparison with other well-known approaches. In order to evaluate the performance of the tool, several statistics and QSAR/QSPR experiments were performed. Variability analyses are used to quantify the information content of the AWVs obtained from the tool, by way of an information theory-based algorithm. …
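The idea of applying aggregation operators to an atomic vector can be sketched as follows. The operator set, the vector values and the function name are hypothetical choices for illustration; the abstract does not list the operators the tool actually implements, and a "local" descriptor is assumed here to mean aggregation over a chosen subset of atoms.

```python
import math
from statistics import mean

# Hypothetical aggregation operators; the real tool's operator set is not given.
AGGREGATORS = {
    "sum": sum,
    "mean": mean,
    "max": max,
    "euclidean_norm": lambda values: math.sqrt(sum(v * v for v in values)),
}


def descriptors(awv, atoms=None):
    """Aggregate an atomic weighted vector into scalar descriptors.

    atoms=None gives total descriptors over the whole molecule; a list of
    atom indices gives local descriptors over that fragment (assumed meaning).
    """
    values = list(awv) if atoms is None else [awv[i] for i in atoms]
    return {name: op(values) for name, op in AGGREGATORS.items()}


# Hypothetical per-atom values for a three-atom molecule.
awv = [3.0, 4.0, 0.0]
total = descriptors(awv)
local = descriptors(awv, atoms=[0, 1])
print(total)
print(local)
```

Each operator yields one descriptor per vector, so the size of the descriptor set grows with the number of operators and of local fragments considered.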
Novel 3D bio-macromolecular bilinear descriptors for protein science: Predicting protein structural classes
In the present study, we introduce novel 3D protein descriptors based on the bilinear algebraic form in the ℝⁿ space on the coulombic matrix. For the calculation of these descriptors, macromolecular vectors belonging to ℝⁿ space, whose components represent certain amino acid side-chain properties, were used as weighting schemes. Generalization approaches for the calculation of inter-amino acidic residue spatial distances based on Minkowski metrics are proposed. The simple- and double-stochastic schemes were defined as approaches to normalize the coulombic matrix. The local-fragment indices for both amino acid-types and amino acid-groups are presented in order to permit characterizing fragme…
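The three ingredients named in this abstract (Minkowski distances, stochastic normalization and a bilinear form) can be sketched together. The pairwise matrix below is a plain inverse-distance placeholder, not the paper's coulombic matrix, and simple-stochastic normalization is assumed to mean dividing each row by its sum; coordinates, property vectors and function names are hypothetical.

```python
def minkowski(a, b, p=2.0):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def inverse_distance_matrix(coords, p=2.0):
    """Placeholder pairwise matrix: 1/d_ij off the diagonal, 0 on it."""
    n = len(coords)
    return [
        [0.0 if i == j else 1.0 / minkowski(coords[i], coords[j], p) for j in range(n)]
        for i in range(n)
    ]


def simple_stochastic(M):
    """Row-normalize M so each non-zero row sums to 1 (assumed scheme)."""
    out = []
    for row in M:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def bilinear(x, M, y):
    """Bilinear form x^T M y."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


# Hypothetical residue coordinates and two side-chain property vectors.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
x = [1.0, 2.0, 3.0]
y = [0.5, 0.5, 0.5]

M = simple_stochastic(inverse_distance_matrix(coords, p=2.0))
index = bilinear(x, M, y)
print(index)
```

Changing `p` swaps the distance metric without touching the rest of the pipeline, which is the kind of generalization over Minkowski metrics the abstract mentions.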
Machine learning-based models to predict modes of toxic action of phenols to Tetrahymena pyriformis.
Phenols are structurally heterogeneous pollutants that present a variety of modes of toxic action (MOA), including polar narcotics, weak acid respiratory uncouplers, pro-electrophiles, and soft electrophiles. Because it is often difficult to determine correctly the mechanism of action of a compound, quantitative structure-activity relationship (QSAR) methods, which have proved useful in toxicity prediction, can be used. In this work, several QSAR models for the prediction of MOA of 221 phenols to the ciliated protozoan Tetrahymena pyriformis, using Chemistry Development Kit descriptors, are reported. Four machine learning (ML) techniques, k-nearest neighbours, support vector…