Search results for "cross-validation"
showing 10 items of 50 documents
Assessment of the statistical significance of classifications in infrared spectroscopy based diagnostic models.
2014
Fourier transform infrared (IR) spectroscopy in combination with multivariate data analysis is a versatile tool that can be applied to disease diagnosis. However, a rigorous validation of the obtained models is necessary in order to obtain robust results. This work evaluates the advantages of the use of permutation testing for determining the statistical significance of the misclassification errors obtained from IR based diagnostic models through cross validation (CV). The model performance, estimated by CV, is compared to a distribution of CV-performance values obtained using randomly permuted class labels. The distribution of ‘random CV-values’ is considered as a null distribution and use…
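The permutation idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the one-feature nearest-centroid classifier, the synthetic two-class data, the 5-fold split and the helper names (`cv_error`, `permutation_p_value`) are all hypothetical stand-ins for a real spectral model.

```python
import random

def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class mean is closest to x (single feature)."""
    centroids = {}
    for label in set(train_y):
        values = [v for v, l in zip(train_X, train_y) if l == label]
        centroids[label] = sum(values) / len(values)
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def cv_error(X, y, k=5):
    """Misclassification rate estimated by k-fold cross-validation."""
    n = len(X)
    errors = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved folds
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if nearest_centroid_predict(train_X, train_y, X[i]) != y[i]:
                errors += 1
    return errors / n

def permutation_p_value(X, y, n_perm=200, k=5, seed=0):
    """Compare the observed CV error with CV errors under shuffled labels."""
    rng = random.Random(seed)
    observed = cv_error(X, y, k)
    null_errors = []
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)  # break any link between features and labels
        null_errors.append(cv_error(X, shuffled, k))
    # Count permutations that perform at least as well as the real labels.
    as_good = sum(1 for e in null_errors if e <= observed)
    return observed, (as_good + 1) / (n_perm + 1)

# Synthetic, well-separated classes (assumed data, for illustration only).
rng = random.Random(1)
X = [rng.gauss(0.0, 1.0) for _ in range(20)] + [rng.gauss(3.0, 1.0) for _ in range(20)]
y = [0] * 20 + [1] * 20
observed_error, p_value = permutation_p_value(X, y)
print(observed_error, p_value)
```

The `+ 1` in the numerator and denominator counts the unpermuted labelling as one member of the null distribution, so the estimated p-value can never be exactly zero.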
Multivariate regression analysis applied to the calibration of equipment used in pig meat classification in Romania.
2016
This paper highlights the statistical methodology used in a dissection experiment carried out in Romania to calibrate and standardize two classification devices, OptiGrade PRO (OGP) and Fat-o-Meat'er (FOM). One hundred forty-five carcasses were measured using the two probes and dissected according to the European reference method. To derive prediction formulas for each device, multiple linear regression analysis was performed on the relationship between the reference lean meat percentage and the back fat and muscle thicknesses, using the ordinary least squares technique. The root mean squared error of prediction calculated using the leave-one-out cross validation met European Commission (EC…
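A leave-one-out estimate of the root mean squared error of prediction for an ordinary least squares model can be sketched as below. The data are synthetic and the coefficients, noise level and function names (`fit_ols`, `loo_rmsep`) are assumptions chosen for illustration; they are not the published prediction formulas or measurements.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

def predict(beta, x):
    """Evaluate the fitted linear model at one observation."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def loo_rmsep(X, y):
    """Refit the model n times, each time predicting the held-out sample."""
    squared = []
    for i in range(len(X)):
        beta = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        squared.append((y[i] - predict(beta, X[i])) ** 2)
    return math.sqrt(sum(squared) / len(squared))

# Synthetic stand-in: two thickness predictors (mm) and a lean-meat-like response (%).
rng = random.Random(0)
X = [[rng.uniform(8, 30), rng.uniform(40, 80)] for _ in range(30)]
y = [60 - 0.6 * fat + 0.15 * muscle + rng.gauss(0, 1.5) for fat, muscle in X]
rmsep = loo_rmsep(X, y)
print(round(rmsep, 2))
```

Each sample is predicted by a model that never saw it, so the resulting error is a less optimistic estimate than the residual error of a single fit on all samples.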
A topological sub-structural approach for predicting human intestinal absorption of drugs.
2004
The human intestinal absorption (HIA) of drugs was studied using a topological sub-structural approach (TOPS-MODE). The drugs were divided into three classes according to reported cutoff values for HIA. "Poor" absorption was defined as HIA ≤ 30%, "high" absorption as HIA ≥ 80%, whereas "moderate" absorption was defined between these two values (30% < HIA < 79%). Two linear discriminant analyses were carried out on a training set of 82 compounds. The percentage of correct classification was 89.02% for both models. The predictive power of the models was validated by three tests: a leave-one-out cross validation procedure (88.9% and 87.9%), an external prediction set of 127 drugs (92.9% and 80…
Predicting ACL Injury Using Machine Learning on Data From an Extensive Screening Test Battery of 880 Female Elite Athletes
2022
Background: Injury risk prediction is an emerging field in which more research is needed to recognize the best practices for accurate injury risk assessment. Important issues related to predictive machine learning need to be considered, for example, to avoid overinterpreting the observed prediction performance. Purpose: To carefully investigate the predictive potential of multiple predictive machine learning methods on a large set of risk factor data for anterior cruciate ligament (ACL) injury; the proposed approach takes into account the effect of chance and random variations in prediction performance. Study Design: Case-control study; Level of evidence, 3. Methods: The authors used 3-dime…
Feature selection on a dataset of protein families: from exploratory data analysis to statistical variable importance
2016
Proteins are characterized by several typologies of features (structural, geometrical, energy). Most of these features are expected to be similar within a protein family. We are interested in detecting which features can identify proteins that belong to a family, as well as in defining the boundaries among families. Some features are redundant: they could generate noise in identifying which variables are essential as a fingerprint and, consequently, whether or not they are related to a function of a protein family. We defined an original approach to analyze protein features for defining their relationships and peculiarities within protein families. A multistep approach has been mainly performed in R …
Atom-based 3D-chiral quadratic indices. Part 2: prediction of the corticosteroid-binding globulin binding affinity of the 31 benchmark steroids data s…
2005
A quantitative structure-activity relationship (QSAR) study to predict the relative affinities of the steroid 'benchmark' data set to the corticosteroid-binding globulin (CBG) is described. It is shown that the 3D-chiral quadratic indices closely correlate with the measured CBG affinity values for the 31 steroids. The calculated descriptors were correlated with biological data through multiple linear regressions. Two statistically significant models were obtained when non-stochastic (R = 0.924 and s = 0.46) as well as stochastic (R = 0.929 and s = 0.46) 3D-chiral quadratic indices were used. A leave-one-out (LOO) approach to model validation is used here; the best results obtained in the cr…
Predicting antitrichomonal activity: A computational screening using atom-based bilinear indices and experimental proofs
2006
Existing Trichomonas vaginalis therapies are out of reach for most people with trichomoniasis in developing countries and, where available, they are limited by their toxicity (mainly in pregnant women) and their cost. New antitrichomonal agents are needed to combat emerging metronidazole-resistant trichomoniasis and reduce the side effects associated with currently available drugs. Toward this end, atom-based bilinear indices, a new TOMOCOMD-CARDD molecular descriptor, and linear discriminant analysis (LDA) were used to discover novel, potent, and non-toxic lead trichomonacidal chemicals. Two discriminant functions were obtained with the use of non-stochastic and stochastic atom-type bilinear in…
Predicting Proteasome Inhibition using Atomic Weighted Vector and Machine Learning
2018
The Ubiquitin/Proteasome System (UPS) is a highly regulated mechanism of intracellular protein degradation and turnover. Through the concerted actions of a series of enzymes, proteins are marked for proteasomal degradation by being linked to the polypeptide co-factor, ubiquitin. The UPS participates in a wide array of biological functions such as antigen presentation, regulation of gene transcription and the cell cycle, and activation of NF-κB. Some researchers have applied QSAR methods and machine learning in the study of proteasome inhibition (EC50(µmol/L)), for example in the analysis of proteasome inhibition prediction, in the prediction of multi-target inhibitors of UPP and in the prediction of p…
A General Frame for Building Optimal Multiple SVM Kernels
2012
The aim of this paper is to define a general frame for building optimal multiple SVM kernels. Our scheme follows 5 steps: formal representation of the multiple kernels, structural representation, choice of genetic algorithm, SVM algorithm, and model evaluation. The computation of the optimal parameter values of SVM kernels is performed using an evolutionary method based on the SVM algorithm for evaluation of the quality of chromosomes. After the multiple kernel is found by the genetic algorithm, we apply a cross-validation method to estimate the performance of our predictive model. We implemented and compared many hybrid methods derived from this scheme. Improved co-mutation operators are u…
Determination of total phenolic compounds in compost by infrared spectroscopy
2016
Mid- and near-infrared (MIR and NIR) spectroscopy were applied to determine the total phenolic compounds (TPC) content in compost samples based on models built by using partial least squares (PLS) regression. Multiplicative scatter correction, standard normal variate and first derivative were employed as spectral pretreatments, and the number of latent variables was optimized by leave-one-out cross-validation. The performance of the PLS-ATR-MIR and PLS-DR-NIR models was evaluated according to root mean square error of cross validation and prediction (RMSECV and RMSEP), the coefficient of determination for prediction ($R^2_{\mathrm{pred}}$) and residual predictive deviation (RPD) being obtained for this la…