Search results for "Cross-validation"
showing 10 items of 50 documents
A computationally fast alternative to cross-validation in penalized Gaussian graphical models
2015
We study the problem of selecting the regularization parameter in penalized Gaussian graphical models. When the goal is to obtain a model with good predictive power, cross-validation is the gold standard. We present a new estimator of the Kullback-Leibler loss in Gaussian graphical models which provides a computationally fast alternative to cross-validation. The estimator is obtained by approximating leave-one-out cross-validation. Our approach is demonstrated on simulated data sets for various types of graphs. The proposed formula exhibits superior performance, especially in the typical small-sample-size scenario, compared with other available alternatives to cross-validation, such as Akaike's i…
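The abstract describes its estimator as an approximation to leave-one-out cross-validation (LOOCV). For reference, the sketch below computes the exact LOOCV score that such a formula stands in for. It requires one refit per held-out observation and per candidate λ, which is what makes plain LOOCV expensive. It is a minimal stdlib sketch under stated assumptions, not the paper's estimator: the data are assumed centred, the graphical lasso is replaced by a simple ridge-type shrinkage estimator Θ = (S + λI)^(-1), and the held-out loss is the Gaussian negative log-likelihood up to a factor and a constant. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def invert_with_logdet(a: Sequence[Sequence[float]]) -> Tuple[Matrix, float]:
    """Gauss-Jordan inverse with partial pivoting; also returns log|det(a)|."""
    n = len(a)
    # Augment a float copy of `a` with the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    logdet = 0.0
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        logdet += math.log(abs(pv))  # row swaps only flip the sign of det
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], logdet


def second_moment(rows: Sequence[Sequence[float]]) -> Matrix:
    """(1/n) * sum of x x^T; assumes the data are already centred."""
    n, p = len(rows), len(rows[0])
    return [[sum(x[i] * x[j] for x in rows) / n for j in range(p)] for i in range(p)]


def loo_score(data: List[List[float]], lam: float) -> float:
    """Mean held-out -log det(Theta) + x^T Theta x, i.e. 2 * Gaussian NLL - p*log(2*pi)."""
    p = len(data[0])
    total = 0.0
    for i, x in enumerate(data):
        s = second_moment(data[:i] + data[i + 1:])  # refit without observation i
        shrunk = [[s[r][c] + (lam if r == c else 0.0) for c in range(p)] for r in range(p)]
        # Ridge-type shrinkage estimator (assumption): Theta = (S + lam * I)^{-1}.
        theta, logdet_shrunk = invert_with_logdet(shrunk)
        quad = sum(x[r] * theta[r][c] * x[c] for r in range(p) for c in range(p))
        total += logdet_shrunk + quad  # log det(S + lam I) = -log det(Theta)
    return total / len(data)


def select_lambda(data: List[List[float]],
                  grid: Sequence[float]) -> Tuple[float, Dict[float, float]]:
    """Return the grid value with the smallest LOOCV score, plus all scores."""
    scores = {lam: loo_score(data, lam) for lam in grid}
    return min(scores, key=scores.get), scores


# Hypothetical toy data: six centred two-dimensional observations.
data = [[0.5, 0.3], [-0.2, -0.1], [0.9, 0.7], [-0.8, -0.6], [0.1, -0.2], [-0.5, -0.1]]
best_lam, all_scores = select_lambda(data, [0.01, 0.1, 1.0])
print(best_lam, all_scores)
```

Swapping in the graphical lasso would only change how Θ is computed inside `loo_score`; the outer leave-one-out loop stays the same.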
Confidence bands for Horvitz-Thompson estimators using sampled noisy functional data
2013
When collections of functional data are too large to be exhaustively observed, survey sampling techniques provide an effective way to estimate global quantities such as the population mean function. Assuming that functional data are collected from a finite population according to a probabilistic sampling scheme, with the measurements being discrete in time and noisy, we propose first smoothing the sampled trajectories with local polynomials and then estimating the mean function with a Horvitz-Thompson estimator. Under mild conditions on the population size, observation times, regularity of the trajectories, sampling scheme, and smoothing bandwidth, we prove a central limit theorem in the space of …
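The Horvitz-Thompson step weights each sampled curve by the inverse of its inclusion probability π_k and divides by the population size N: μ̂(t) = (1/N) Σ_{k∈s} Y_k(t)/π_k. A minimal sketch, assuming the trajectories have already been smoothed onto a common time grid (the local polynomial smoothing step is omitted) and that π_k and N are known. The function name is hypothetical.

```python
from typing import List, Sequence


def horvitz_thompson_mean(
    curves: Sequence[Sequence[float]],
    inclusion_probs: Sequence[float],
    population_size: int,
) -> List[float]:
    """HT estimate of the population mean curve: (1/N) * sum_k Y_k(t) / pi_k."""
    if len(curves) != len(inclusion_probs):
        raise ValueError("need one inclusion probability per sampled curve")
    n_points = len(curves[0])
    totals = [0.0] * n_points
    for curve, pi in zip(curves, inclusion_probs):
        if not 0.0 < pi <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        for t, value in enumerate(curve):
            totals[t] += value / pi  # design weight 1 / pi_k
    return [total / population_size for total in totals]


# Two curves drawn from N = 4 by simple random sampling without replacement (pi_k = 2/4).
print(horvitz_thompson_mean([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]], [0.5, 0.5], 4))  # → [2.0, 3.0, 4.0]
```

Under simple random sampling without replacement every π_k equals n/N, so the estimate reduces to the pointwise sample mean. Unequal-probability designs are where the 1/π_k weights matter.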
Extended differential geometric LARS for high-dimensional GLMs with general dispersion parameter
2018
A large class of modeling and prediction problems involves outcomes that belong to an exponential family distribution. Generalized linear models (GLMs) are a standard way of dealing with such situations, and they can also be extended to high-dimensional feature spaces. Penalized inference approaches, such as $\ell_1$ or SCAD penalization, and extensions of least angle regression, such as dgLARS, have been proposed for GLMs with high-dimensional feature spaces. Although the theory underlying these methods is in principle generic, the implementation has remained restricted to dispersion-free models, such as the Poisson and logistic regression models. The aim of this…
Modelling residuals dependence in dynamic life tables: A geostatistical approach
2008
The problem of modelling dynamic mortality tables is considered. In this context, the influence of age on data graduation needs to be properly assessed through a dynamic model, as mortality evolves over the years. After detrending the raw data, the dependence structure of the residuals is analysed by treating them as a realisation of a homogeneous Gaussian random field defined on ℝ × ℝ. This setting allows geostatistical techniques to be used to estimate the dependence and to interpolate further in the domain of interest. In particular, a complex form of interaction between age and time is considered, by taking into account a zonally anisotropic component embedded in…
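A basic geostatistical tool for describing such dependence is the empirical semivariogram, γ(h) = Σ (Z(s+h) − Z(s))² / (2·N(h)), where N(h) is the number of point pairs separated by lag h. Computing it separately along the age axis and the calendar-year axis is one simple way to see direction-dependent (anisotropic) structure. This is a minimal sketch, assuming residuals on a regular age × year grid and non-negative integer lags. It does not reproduce the paper's zonally anisotropic model, and the function name and toy grid are hypothetical.

```python
from typing import Sequence, Tuple


def directional_semivariogram(grid: Sequence[Sequence[float]], lag: Tuple[int, int]) -> float:
    """Empirical semivariogram gamma(h) = sum (Z(s+h) - Z(s))^2 / (2 N(h)).

    grid[a][t] holds the detrended residual for age index a and calendar-year
    index t on a regular grid; lag = (da, dt) uses non-negative integer steps.
    """
    da, dt = lag
    if da < 0 or dt < 0:
        raise ValueError("this sketch only handles non-negative lags")
    n_age, n_year = len(grid), len(grid[0])
    total, pairs = 0.0, 0
    for a in range(n_age - da):
        for t in range(n_year - dt):
            diff = grid[a + da][t + dt] - grid[a][t]
            total += diff * diff
            pairs += 1
    if pairs == 0:
        raise ValueError("lag is larger than the grid")
    return total / (2 * pairs)


# Hypothetical toy residuals (4 ages x 5 years) that vary with age only.
toy = [[float(a) for _ in range(5)] for a in range(4)]
print(directional_semivariogram(toy, (1, 0)))  # one age step → 0.5
print(directional_semivariogram(toy, (0, 1)))  # one calendar year → 0.0
```

The two directions give different values on the same field, which is the kind of asymmetry an anisotropic variogram model is meant to capture.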
Analysis of cross-validation methods for the robust retrieval of biophysical parameters (Análisis de métodos de validación cruzada para la obtención robusta de parámetros biofísicos)
2015
Non-parametric regression methods are powerful statistical methods for retrieving biophysical parameters from remote sensing measurements. However, their performance can be affected by the data presented during the training phase. To ensure robust retrievals, various cross-validation sub-sampling methods are often used, which allow the model to be evaluated on subsets of the field dataset. Here, two types of cross-validation techniques were analyzed in the development of non-parametric regression models: hold-out and k-fold. The selected non-parametric linear regression methods were least-squares linear regression (LR) and partial least squares regression (PLSR), and the nonlinear methods were…
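The two sub-sampling schemes named in the abstract can be contrasted with a minimal sketch. Hold-out scores the model once on a single random split, while k-fold holds out every sample exactly once and averages the per-fold errors. Assumptions, none taken from the study: a single predictor, ordinary least-squares linear regression, RMSE as the error metric, a 30% hold-out fraction and k = 5. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(model: Line, xs: Sequence[float], ys: Sequence[float]) -> float:
    slope, intercept = model
    return math.sqrt(sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs))


def holdout_rmse(xs, ys, test_frac: float = 0.3, seed: int = 0) -> float:
    """Single random split: fit on the training part, score on the held-out part."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(test_frac * len(idx)))
    test, train = idx[:n_test], idx[n_test:]
    model = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return rmse(model, [xs[i] for i in test], [ys[i] for i in test])


def kfold_rmse(xs, ys, k: int = 5, seed: int = 0) -> float:
    """Every sample is held out exactly once; returns the mean per-fold RMSE."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores: List[float] = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse(model, [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / len(scores)


# Hypothetical data: a linear trend with alternating +/-0.5 noise.
xs = [float(i) for i in range(20)]
ys = [2 * x + 1 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
print(holdout_rmse(xs, ys), kfold_rmse(xs, ys))
```

Changing `seed` moves the hold-out estimate more than the k-fold estimate, because k-fold averages over several validation sets.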
Prediction of organic carbon and total nitrogen contents in organic wastes and their composts by Infrared spectroscopy and partial least square regre…
2017
Mid- and near-infrared (MIR and NIR) spectroscopy were employed to determine organic carbon (OC) and total nitrogen (TN) in different soil organic amendments, including wastes, composts, and mixtures of composts and organic wastes. Prediction models based on partial least squares (PLS) regression were built from the spectra of untreated samples. Different spectral preprocessing strategies were adopted, and the optimal number of latent variables was selected using leave-one-out cross-validation. Attenuated total reflectance (PLS-ATR-MIR) and diffuse reflectance (PLS-DR-NIR) models were built and evaluated using the root mean square error of cross-validation and of prediction (RMSECV and RMSEP), coefficients of determ…
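For reference, the two error measures named in the abstract are conventionally defined as follows in chemometrics (standard definitions, not taken from the paper). Here $\hat{y}_{(-i)}(a)$ is the prediction for calibration sample $i$ from a PLS model with $a$ latent variables fitted without sample $i$, $n$ is the number of calibration samples, and the $m$ prediction samples form an independent set. Choosing $a^{\ast}$ as the plain minimizer is the simplest selection rule; it is an assumption here, since the abstract does not say which rule was used.

```latex
\mathrm{RMSECV}(a) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_{(-i)}(a)\bigr)^2},
\qquad
a^{\ast} = \arg\min_{a}\,\mathrm{RMSECV}(a),
\qquad
\mathrm{RMSEP} = \sqrt{\frac{1}{m}\sum_{j=1}^{m}\bigl(y_j - \hat{y}_j(a^{\ast})\bigr)^2}
```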
Normal and Abnormal Tissue Classification in Positron Emission Tomography Oncological Studies
2018
Positron Emission Tomography (PET) imaging is increasingly used in radiotherapy as well as for staging and assessing treatment response. The ability to classify PET tissues as normal or abnormal is crucial for medical analysis and interpretation. For this reason, a system for classifying PET areas is implemented and validated. The proposed classification is carried out using the k-nearest neighbor (KNN) method with a stratified k-fold cross-validation strategy to enhance the classifier's reliability. A dataset of eighty oncological patients was collected for system training and validation. For every patient, lesion (abnormal tissue) and background (normal tissue around …
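Stratified k-fold cross-validation splits the data so that each fold keeps roughly the same class proportions as the full dataset, which matters when one class (e.g. abnormal tissue) is rarer. A minimal stdlib sketch combining it with a KNN classifier. Assumptions, not from the paper: the features are hypothetical two-dimensional vectors standing in for whatever per-region PET features are used, distance is Euclidean, k = 3 neighbours, 5 folds, and the function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from typing import List, Sequence


def stratified_folds(labels: Sequence, k: int, seed: int = 0) -> List[List[int]]:
    """Deal each class's shuffled indices round-robin so every fold keeps ~the class mix."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def knn_predict(train_x, train_y, x, k_neighbors: int = 3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: euclid(x, pair[0]))
    votes = Counter(label for _, label in nearest[:k_neighbors])
    return votes.most_common(1)[0][0]


def stratified_cv_accuracy(xs, ys, n_folds: int = 5, k_neighbors: int = 3, seed: int = 0) -> float:
    """Mean per-fold accuracy of KNN under stratified k-fold cross-validation."""
    accuracies = []
    for fold in stratified_folds(ys, n_folds, seed):
        held = set(fold)
        train = [i for i in range(len(xs)) if i not in held]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        correct = sum(knn_predict(tx, ty, xs[j], k_neighbors) == ys[j] for j in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


# Hypothetical, well-separated feature vectors for the two tissue classes.
xs = [(0.1 * i, 0.0) for i in range(10)] + [(10 + 0.1 * i, 10.0) for i in range(10)]
ys = ["normal"] * 10 + ["abnormal"] * 10
print(stratified_cv_accuracy(xs, ys))
```

Dealing indices class by class is what keeps every fold balanced. A plain shuffled split could, by chance, leave a fold with few or no abnormal samples.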
Validation procedures in radiological diagnostic models. Neural network and logistic regression
1999
The objective of this paper is to compare the performance of two predictive radiological models, logistic regression (LR) and neural network (NN), with five different resampling methods. One hundred and sixty-seven patients with proven calvarial lesions as the only known disease were enrolled. Clinical and CT data were used for the LR and NN models. Both models were developed with cross-validation, leave-one-out and three different bootstrap algorithms. The final results of each model were compared using the error rate and the area under the receiver operating characteristic curve (Az). The neural network obtained a statistically significantly higher Az than LR with cross-validation. The remaining resampling validatio…
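The abstract does not name its three bootstrap algorithms. One common variant is the out-of-bag (leave-one-out) bootstrap error: draw n cases with replacement, train on them, and test only on the cases that were not drawn. A minimal stdlib sketch of that variant, with a simple nearest-centroid classifier swapped in for LR and NN purely for brevity. The classifier, the toy data and all function names are hypothetical assumptions, not the paper's setup.

```python
import math
import random
from typing import Dict, List, Sequence


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def fit_centroids(xs, ys) -> Dict[object, List[float]]:
    """Nearest-centroid classifier (stand-in model): one mean vector per class label."""
    sums: Dict[object, List[float]] = {}
    counts: Dict[object, int] = {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(centroids, x):
    return min(centroids, key=lambda label: euclid(x, centroids[label]))


def oob_bootstrap_error(xs, ys, n_boot: int = 200, seed: int = 0) -> float:
    """Out-of-bag bootstrap error: train on a resample, test on the cases it left out."""
    rng = random.Random(seed)
    n = len(xs)
    errors = evaluated = 0
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n cases with replacement
        in_bag = set(sample)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue  # every case was drawn; nothing left to test on
        model = fit_centroids([xs[i] for i in sample], [ys[i] for i in sample])
        for i in oob:
            errors += predict(model, xs[i]) != ys[i]
            evaluated += 1
    return errors / evaluated if evaluated else float("nan")


# Hypothetical two-class toy data.
xs = [(0.1 * i, 0.0) for i in range(10)] + [(10 + 0.1 * i, 10.0) for i in range(10)]
ys = ["lesion"] * 10 + ["other"] * 10
print(oob_bootstrap_error(xs, ys))
```

On average about 36.8% of the cases are out of bag in each resample, so every case is tested many times across the replicates.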
Cross validation of hard-copy and web-based formats of the Sport Imagery Ability Measure
2018
The purpose of this multi-sample study was to examine the psychometric characteristics, factor structure, and measurement invariance of the hard-copy and web-based versions of a measure of sport imagery ability, termed the Sport Imagery Ability Measure (SIAM). In the first sample, Spanish athletes (N = 274, 161 men, 113 women, Mage = 21.91, SD = 6.67) completed a hard-copy version of the SIAM. A newly developed web-based version of the SIAM was cross-validated in an independent group (N = 266, 147 men, 119 women, Mage = 25.93, SD = 9.84). A small group of participants (n = 16) completed both versions. Exploratory structural equation modelling and confirmatory factor analysis of the data from th…
Distinctive attributes for predicted secondary structures at terminal sequences of non-classically secreted proteins from proteobacteria
2008
C- and N-terminal sequences (64 amino acid residues each) of 89 non-classically secreted type I, type III and type IV proteins (Swiss-Prot/TrEMBL) from proteobacteria were transformed into predicted secondary structures. Multivariate analysis of variance (MANOVA) confirmed the significance of location (C- or N-terminus) and secretion type as essential factors with respect to quantitative representations of structured (α-helices, β-strands) and unstructured (coils) elements. The profiles of secondary structures were transcribed using unequal property values for helices, strands and coils, and the corresponding numerical vectors (independent variables) were subjected to multiple discriminan…