Search results for "variable selection"
showing 10 items of 24 documents
A new tuning parameter selector in lasso regression
2019
Penalized regression models are widely used in high-dimensional data analysis to carry out variable selection and model fitting simultaneously. Although success has been widely reported in the literature, their performance largely depends on the tuning parameter that balances the trade-off between model fit and sparsity. In this work we introduce a new tuning parameter selection criterion based on the maximization of the signal-to-noise ratio. To demonstrate its effectiveness, we apply it to a real data set on prostate cancer.
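The trade-off between model fit and sparsity mentioned in this abstract can be illustrated with a minimal sketch of the lasso solved by cyclic coordinate descent. This is not the signal-to-noise criterion proposed in the paper, which the abstract does not specify. The function names `soft_threshold` and `lasso_cd`, the objective scaling (1/(2n)), and the toy data are assumptions made for illustration only.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1
    by cyclic coordinate descent (illustrative, unoptimized)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            # Inner product of column j with the partial residual, scaled by 1/n
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Hypothetical toy design with two orthogonal columns
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
for lam in (0.0, 0.1, 2.0):
    beta = lasso_cd(X, y, lam)
    print(lam, beta, sum(b != 0.0 for b in beta))
```

On this toy design, increasing `lam` sets more coefficients exactly to zero, which is why the choice of the tuning parameter determines which variables are selected.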
Criteria for Bayesian model choice with application to variable selection
2012
In objective Bayesian model selection, no single criterion has emerged as dominant in defining objective prior distributions. Indeed, many criteria have been separately proposed and utilized to propose differing prior choices. We first formalize the most general and compelling of the various criteria that have been suggested, together with a new criterion. We then illustrate the potential of these criteria in determining objective model selection priors by considering their application to the problem of variable selection in normal linear models. This results in a new model selection objective prior with a number of compelling properties.
Intelligent solutions for real-life data-driven applications
2017
The subject of this thesis belongs to the field of machine learning and, more specifically, to the development of advanced methods for regression analysis, clustering, and anomaly detection. Industry is constantly seeking improved production practices and reduced production time and costs. In this connection, several industrial case studies are presented in which mathematical models for predicting paper quality were proposed. The most important variables for the prediction models are selected using information-theoretic measures and a regression tree approach. The rest of the original papers are devoted to unsupervised machine learning. The main focus is developing advanced spectral cluster…
Using differential LARS algorithm to study the expression profile of a sample of patients with latex-fruit syndrome
2010
Natural rubber latex IgE-mediated hypersensitivity has been one of the most important health problems in allergy in recent years. Among individuals allergic to latex, an associated hypersensitivity to some plant-derived foods, especially freshly consumed fruit, is prevalent. This association of latex allergy and allergy to plant-derived foods is called latex-fruit syndrome. The aim of this study is to use the differential geometric generalization of the LARS algorithm to identify candidate genes that may be associated with the pathogenesis of allergy to latex or vegetable food.
The Induced Smoothed lasso: A practical framework for hypothesis testing in high dimensional regression.
2020
This paper focuses on hypothesis testing in lasso regression, when one is interested in judging the statistical significance of the regression coefficients in a regression equation involving many covariates. To get reliable p-values, we propose a new lasso-type estimator relying on the idea of induced smoothing, which makes it relatively easy to obtain an appropriate covariance matrix and Wald statistic. Simulation experiments show that our approach performs well when compared with recent inferential tools in the lasso framework. Two real data analyses are presented to illustrate the proposed framework in practice.
Clusters of effects curves in quantile regression models
2018
In this paper, we propose a new method for finding similarity of effects based on quantile regression models. Clustering of effects curves (CEC) techniques are applied to quantile regression coefficients, which are one-to-one functions of the order of the quantile. We adopt the quantile regression coefficients modeling (QRCM) framework to describe the functional form of the coefficient functions by means of parametric models. The proposed method can be utilized to cluster the effect of covariates with a univariate response variable, or to cluster a multivariate outcome. We report simulation results, comparing our approach with the existing techniques. The idea of combining CEC with QRCM per…
Variable Selection with Quasi-Unbiased Estimation: the CDF Penalty
2022
We propose a new non-convex penalty in linear regression models. The new penalty function can be considered a competitor of the LASSO, SCAD or MCP penalties, as it guarantees sparse variable selection while reducing bias for the non-null estimates. We introduce the methodology and present some comparisons among different approaches.
Applying differential geometric LARS algorithm to ultra-high dimensional feature space
2009
Variable selection is fundamental in high-dimensional statistical modeling. Many techniques to select relevant variables in generalized linear models are based on a penalized likelihood approach. In a recent paper, Fan and Lv (2008) proposed a sure independence screening (SIS) method to select relevant variables in a linear regression model defined on an ultrahigh-dimensional feature space. The aim of this paper is to define a generalization of the SIS method for generalized linear models based on a differential geometric approach.
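For orientation, the original linear-model SIS of Fan and Lv (2008) ranks predictors by the absolute marginal correlation with the response and keeps the top d. The sketch below shows only that baseline step, not the differential geometric generalization proposed in this paper. The function names `marginal_correlations`, `sis`, and `simulate`, the choice d = 20, and the simulated data are assumptions made for illustration.

```python
import math
import random


def marginal_correlations(X, y):
    """Absolute Pearson correlation between each column of X and y."""
    n, p = len(y), len(X[0])
    y_mean = sum(y) / n
    y_c = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(v * v for v in y_c))
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        c = [v - m for v in col]
        norm = math.sqrt(sum(v * v for v in c))
        if norm == 0.0 or y_norm == 0.0:
            scores.append(0.0)  # constant column or response: no signal
        else:
            dot = sum(a * b for a, b in zip(c, y_c))
            scores.append(abs(dot) / (norm * y_norm))
    return scores


def sis(X, y, d):
    """Return the (sorted) indices of the d predictors with the largest
    absolute marginal correlation with y."""
    scores = marginal_correlations(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


def simulate(n=100, p=500, seed=0):
    """Hypothetical data: only the first two predictors affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 1) for row in X]
    return X, y


X, y = simulate()
selected = sis(X, y, d=20)
print(selected)
```

Screening reduces the p = 500 candidate predictors to d = 20 before any penalized fit is attempted; a method such as the lasso can then be applied to the reduced set.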
dglars: An R Package to Estimate Sparse Generalized Linear Models
2014
dglars is a publicly available R package that implements the method proposed in Augugliaro, Mineo, and Wit (2013), developed to study the sparse structure of a generalized linear model. This method, called dgLARS, is based on a differential geometrical extension of the least angle regression method proposed in Efron, Hastie, Johnstone, and Tibshirani (2004). The core of the dglars package consists of two algorithms implemented in Fortran 90 to efficiently compute the solution curve: a predictor-corrector algorithm, proposed in Augugliaro et al. (2013), and a cyclic coordinate descent algorithm, proposed in Augugliaro, Mineo, and Wit (2012). The latter algorithm, as shown here, is significan…
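Analyses spectrale et texturale de données haute résolution pour la détection automatique des maladies de la vigne [Spectral and textural analyses of high-resolution data for the automatic detection of vine diseases]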
2019
‘Flavescence dorée’ is a contagious and incurable disease that appears on vine leaves. The DAMAV project (Automatic Detection of Vine Diseases) aims to develop a solution for the automated detection of vine diseases using a micro-drone. The goal is to offer a turnkey solution for wine growers. The tool will make it possible to search for potential foci and, more generally, for any type of vine disease detectable on the foliage. To enable this diagnosis, the foliage is to be studied using a dedicated high-resolution multispectral camera. The objective of this PhD thesis in the context of DAMAV is to participate in the design and implementation of a Multi-Spectral (MS) image acquisition system and …