0000000000194602
AUTHOR
Lionello Pogliani
Testing selected optimal descriptors with artificial neural networks
Eleven properties have been modeled to check the importance, for modeling purposes, of mixed descriptors made of empirical parameters, molecular connectivity indices and random numbers. The mixed descriptors with random indices have a descriptive character that is satisfactorily confirmed by the leave-one-out method of statistical analysis. Partitioning the set of compounds into training and evaluation sets drastically decreases the probability of finding a mixed descriptor with random indices that has good model quality. For two properties, the magnetic susceptibility and the elutropic values, the optimal descriptors still include random indices. The overal…
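The leave-one-out check mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the author's code: it assumes an ordinary multilinear least-squares model and the commonly used cross-validated statistic q² = 1 − PRESS/SS_tot, and the names `solve_linear_system`, `fit_multilinear` and `loo_q2` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multilinear(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    ata = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    aty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(ata, aty)


def loo_q2(rows, y):
    """Leave-one-out q2: refit without compound i, then predict compound i."""
    n = len(y)
    press = 0.0  # predictive residual sum of squares
    for i in range(n):
        coef = fit_multilinear(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        pred = coef[0] + sum(c * v for c, v in zip(coef[1:], rows[i]))
        press += (y[i] - pred) ** 2
    mean_y = sum(y) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - press / ss_tot
```

A descriptor set containing a random index can score well on this statistic by chance, which is why the abstract contrasts it with an explicit split into training and evaluation sets. The sketch assumes `rows` is a list of descriptor lists and that no refit is singular.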
Predictability and prediction of lowest observed adverse effect levels in a structurally heterogeneous set of chemicals
A database of chronic lowest observed adverse effect levels (LOAELs) for 234 compounds, previously compiled from different sources (Toxicology Letters 79, 131-143 (1995)), was modelled using graph theoretical descriptors. This study reveals that the data are not homogeneous. Only the data originating from U.S. Environmental Protection Agency (EPA) reports could be modelled well by multilinear regression (MLR) and linear discriminant analysis (LDA). In contrast, data obtained through the specific procedures of the National Toxicology Program (NTP) database introduced noise and did not yield good models, either alone or in combination with the EPA data.
Superposing significant interaction rules (SSIR) method: a simple procedure for rapid ranking of congeneric compounds
The Superposing Significant Interaction Rules (SSIR) method is revised and implemented. The method is a simple combinatorial procedure that generates rules in situ for a dichotomized congeneric molecular family and selects the most probabilistically relevant ones. Simply counting the number of relevant rules attached to new compounds generates a molecular ranking useful for database filtering, refinement and prediction. The algorithm needs only a symbolic molecular representation, which allows the database to be mined in a confidential manner: third parties will not know the real compounds that are being worked on. The procedure is tested for a complete s…
Notes on the Barometric Formula
Checking the Efficacy of Two Basic Descriptors With a Set of Properties of Alkanes
Several experimental properties of alkanes are described by means of multilinear models at the cross-validation level. The models have been obtained considering two main sets of descriptors: mathematically based and experimental ones. The best models normally involve one of the two sets. The main goal of this work is to show that the theoretical descriptors can compete with the experimental ones. This is an important topic in the field of quantitative structure-property relationships because it validates the use of mathematical and in silico descriptors as a proper tool for model building. Activity distributions of the properties and indices…
A Probabilistic Analysis About the Concepts of Difficulty and Usefulness of a Molecular Ranking Classification
Distinguishing between the concepts of difficulty and usefulness of a molecular ranking classification is of significant importance in virtual design chemistry. Here, both concepts are viewed from the statistical and practical points of view according to the standard definitions of enrichment and statistical significance p-values. These parameters are useful not only to compare distinct rankings obtained for the same molecular database, but also to compare, from an objective point of view, rankings established for distinct molecular sets.
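The two parameters named in this abstract can be illustrated with a short sketch. It assumes the usual textbook definitions, which may differ in detail from the paper's: enrichment as the ratio of the active fraction in the selected top of the ranking to the active fraction in the whole database, and the p-value as the hypergeometric probability of finding at least that many actives in a random selection of the same size. The function names are hypothetical.

```python
from math import comb


def enrichment_factor(hits_in_top, top_size, total_actives, total_size):
    """Active fraction in the selected top divided by the overall active fraction."""
    return (hits_in_top / top_size) / (total_actives / total_size)


def hypergeom_p_value(hits_in_top, top_size, total_actives, total_size):
    """P(X >= hits_in_top) when top_size compounds are drawn at random
    without replacement from a set containing total_actives actives."""
    upper = min(top_size, total_actives)
    favourable = sum(
        comb(total_actives, k) * comb(total_size - total_actives, top_size - k)
        for k in range(hits_in_top, upper + 1)
    )
    return favourable / comb(total_size, top_size)


# Example: 8 actives among the top 10 of a 100-compound set with 20 actives.
ef = enrichment_factor(8, 10, 20, 100)
p = hypergeom_p_value(8, 10, 20, 100)
```

Under these definitions the enrichment depends on the proportion of actives in the set, whereas the p-value also accounts for the set size, which is one way to compare rankings across databases of different composition.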
QSPR with descriptors based on averages of vertex invariants. An artificial neural network study
A new type of index, the mean molecular connectivity indices (MMCI), based on nine different concepts of mean, is proposed to model, together with molecular connectivity indices (MCI), experimental parameters and random variables, eleven properties of organic solvents. Two modelling methodologies are used to test the different descriptors: the multilinear least-squares (MLS) methodology and the Artificial Neural Network (ANN) methodology. The top three quantitative structure–property relationships (QSPR) for each property are chosen with the MLS method. The indices of these three QSPRs were used to train the ANNs that selected the best training sets of indices to estimate the evaluation sets of…