
AUTHOR

Ernst Wit

showing 16 related works from this author

An Extension of the DgLARS Method to High-Dimensional Relative Risk Regression Models

2020

In recent years, clinical studies in which patients are routinely screened for many genomic features have become more common. The general aim of such studies is to find genomic signatures useful for treatment decisions and the development of new treatments. However, genomic data are typically noisy and high-dimensional, with the number of features often outstripping the number of patients included in the study. For this reason, sparse estimators are usually used in the study of high-dimensional survival data. In this paper, we propose an extension of the differential geometric least angle regression method to high-dimensional relative risk regression models.
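
Not the dgLARS estimator itself, but a minimal sketch of the same task: an \(\ell_1\)-penalized Cox model as a stand-in for sparse relative risk regression, assuming the Python lifelines package is available (column names and data are made up).

```python
# Sketch: an L1-penalized Cox model as a stand-in for sparse relative risk
# regression; this is NOT the dgLARS path. Assumes the `lifelines` package.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n, p = 150, 80                                  # dgLARS targets p >> n; kept modest here
X = rng.normal(size=(n, p))
risk = 1.5 * X[:, 0] - 1.0 * X[:, 1]            # only two features carry signal
time = rng.exponential(scale=np.exp(-risk))     # relative risk exp(x'beta)
event = rng.binomial(1, 0.8, size=n)            # roughly 20% censoring

df = pd.DataFrame(X, columns=[f"gene_{j}" for j in range(p)])
df["time"], df["event"] = time, event

# l1_ratio=1.0 gives a pure lasso-type penalty on the log relative risk.
cph = CoxPHFitter(penalizer=0.5, l1_ratio=1.0)
cph.fit(df, duration_col="time", event_col="event")
print(cph.params_.abs().sort_values(ascending=False).head())
```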

Keywords: dgLARS; gene expression data; high-dimensional data; relative risk regression models; sparsity; survival analysis; least-angle regression

Extending graphical models for applications: on covariates, missingness and normality

2021

The authors of the paper “Bayesian Graphical Models for Modern Biological Applications” have put forward an important framework for making graphical models more useful in applied settings. In this discussion paper, we give a number of suggestions for making this framework even more suitable for practical scenarios. Firstly, we show that an alternative and simplified definition of covariate might make the framework more manageable in high-dimensional settings. Secondly, we point out that the inclusion of missing variables is important for practical data analysis. Finally, we comment on the effect that the Gaussianity assumption has in identifying the underlying conditional independence graph…

Keywords: conditional graphical models; copula graphical models; missing data; covariates; sparse inference; normality

Dynamic Gaussian Graphical Models for Modelling Genomic Networks

2014

Now that the entire DNA of various organisms has been sequenced, the challenge has become understanding the functional interrelatedness of the genome. Only by understanding the pathways of various complex diseases can we begin to make sense of any type of treatment. Unfortunately, deciphering the genomic network structure is an enormous task. Even with a small number of genes, the number of possible networks is very large. This problem becomes even more difficult when we consider dynamic networks. We consider the problem of estimating a sparse dynamic Gaussian graphical model by \(L_1\)-penalized maximum likelihood estimation of a structured precision matrix. The structure can consist of specific time dynami…
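
A rough illustration (not the structured estimator of the paper): stacking consecutive time points and running the graphical lasso, here via scikit-learn's GraphicalLasso, makes the off-diagonal block of the estimated precision matrix capture lag-1 conditional dependencies.

```python
# Sketch: sparse joint precision over (x_t, x_{t-1}) via the graphical lasso;
# the cross-time block encodes lag-1 conditional dependencies between genes.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
T, p = 200, 5
X = np.zeros((T, p))
for t in range(1, T):                       # simple VAR(1) dynamics
    X[t] = 0.5 * X[t - 1] + rng.normal(scale=0.5, size=p)

Z = np.hstack([X[1:], X[:-1]])              # each row is (x_t, x_{t-1})
theta = GraphicalLasso(alpha=0.05).fit(Z).precision_
lag_block = theta[:p, p:]                   # cross-time (lag-1) block
print(np.round(lag_block, 2))
```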

Keywords: factorial Gaussian graphical models; penalized graphical models; structured precision matrix; convex optimization

A differential-geometric approach to generalized linear models with grouped predictors

2016

We propose an extension of the differential-geometric least angle regression method to perform sparse group inference in a generalized linear model. An efficient algorithm is proposed to compute the solution curve. The proposed group differential-geometric least angle regression method has important properties that distinguish it from the group lasso. First, its solution curve is based on the invariance properties of a generalized linear model. Second, it adds groups of variables based on a group equiangularity condition, which is shown to be related to score statistics. An adaptive version, which includes weights based on the Kullback-Leibler divergence, improves its variable selection fea…
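
As a toy illustration of the score-statistic connection (not the full group dgLARS path), the sketch below ranks groups of predictors by their Rao score statistic at the intercept-only logistic fit; group names and sizes are made up.

```python
# Sketch: group score statistics z_g' I_g^{-1} z_g at the null logistic model,
# the kind of quantity behind the group equiangularity condition.
import numpy as np

rng = np.random.default_rng(2)
n = 200
groups = {"g1": rng.normal(size=(n, 3)),
          "g2": rng.normal(size=(n, 4)),
          "g3": rng.normal(size=(n, 2))}
eta = 1.2 * groups["g2"][:, 0] - 0.8 * groups["g2"][:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

mu = y.mean()                                  # intercept-only fitted mean
w = mu * (1 - mu)                              # GLM weight under the null
for name, Xg in groups.items():
    Xc = Xg - Xg.mean(axis=0)                  # orthogonal to the intercept
    z = Xc.T @ (y - mu)                        # score vector of the group
    info = w * (Xc.T @ Xc)                     # Fisher information block
    stat = z @ np.linalg.solve(info, z)        # group score test statistic
    print(name, round(stat / Xg.shape[1], 2))  # per-degree-of-freedom scale
```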

Keywords: differential-geometric least angle regression; generalized linear models; group selection; group lasso; oracle properties; sparsity; consistency; score statistic; path algorithm; Kullback-Leibler divergence

Differential geometric least angle regression: a differential geometric approach to sparse generalized linear models

2013

Sparsity is an essential feature of many contemporary data problems. Remote sensing, various forms of automated screening and other high-throughput measurement devices collect a large amount of information, typically about a few independent statistical subjects or units. In certain cases it is reasonable to assume that the underlying process generating the data is itself sparse, in the sense that only a few of the measured variables are involved in the process. We propose an explicit method of monotonically decreasing sparsity for outcomes that can be modelled by an exponential family. In our approach we generalize the equiangular condition in a generalized linear model. Although the …
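
The starting point of such a path can be written down directly: at the intercept-only fit, the covariate with the largest standardized Rao score statistic is the first to enter. A minimal numpy sketch for a logistic model (illustration only, not a full path algorithm):

```python
# Sketch: standardized Rao score statistics at the intercept-only logistic fit;
# the covariate with the largest absolute value enters the path first.
import numpy as np

rng = np.random.default_rng(3)
n, p = 300, 10
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-(2.0 * X[:, 4] - 1.0 * X[:, 7]))))

mu = y.mean()                         # null (intercept-only) fitted mean
w = mu * (1 - mu)
Xc = X - X.mean(axis=0)               # orthogonal to the intercept
score = Xc.T @ (y - mu)               # Rao score for each covariate
info = w * (Xc ** 2).sum(axis=0)      # per-covariate Fisher information
r = score / np.sqrt(info)             # standardized score statistic
print("entry order:", np.argsort(-np.abs(r))[:3])
```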

Keywords: differential geometric least angle regression; generalized linear models; variable selection; path following algorithm; lasso; Dantzig selector; exponential family; information geometry; generalized degrees of freedom; covariance penalty theory; coordinate descent; shrinkage

Sparse relative risk regression models

2020

Clinical studies in which patients are routinely screened for many genomic features are becoming more common. In principle, this holds the promise of being able to find genomic signatures for a particular disease. In particular, cancer survival is thought to be closely linked to the genomic constitution of the tumor. Discovering such signatures will be useful in the diagnosis of the patient, may be used for treatment decisions and, perhaps, even the development of new treatments. However, genomic data are typically noisy and high-dimensional, with the number of features often outstripping the number of patients included in the study. Regularized survival models have been proposed to deal with such scenarios…
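
The object being regularized is the Cox partial likelihood. A short numpy sketch of its Breslow form, assuming no tied event times (illustration only):

```python
# Sketch: Cox partial log-likelihood (Breslow form, no ties assumed),
# the quantity that sparse relative risk regression penalizes.
import numpy as np

def cox_partial_loglik(beta, X, time, event):
    """Sum over events of x_i'beta - log(sum over the risk set of exp(x_j'beta))."""
    eta = X @ beta
    order = np.argsort(time)                    # work in increasing time
    eta, event = eta[order], event[order]
    # log of the risk-set sums = reverse cumulative log-sum-exp
    log_risk = np.logaddexp.accumulate(eta[::-1])[::-1]
    return np.sum(event * (eta - log_risk))

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))
beta = np.array([1.0, 0.0, -0.5])
time = rng.exponential(np.exp(-(X @ beta)))
event = rng.binomial(1, 0.7, size=50)
print(round(cox_partial_loglik(beta, X, time, event), 2))
```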

Keywords: dgLARS; relative risk regression models; high-dimensional data; gene expression data; regularization; sparsity; survival analysis; biostatistics

Extended differential geometric LARS for high-dimensional GLMs with general dispersion parameter

2018

A large class of modeling and prediction problems involves outcomes that belong to an exponential family distribution. Generalized linear models (GLMs) are a standard way of dealing with such situations, and they can be extended to high-dimensional feature spaces. Penalized inference approaches, such as the \(\ell_1\) penalty or SCAD, or extensions of least angle regression, such as dgLARS, have been proposed to deal with GLMs with high-dimensional feature spaces. Although the theory underlying these methods is in principle generic, the implementation has remained restricted to dispersion-free models, such as the Poisson and logistic regression models. The aim of this…
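
The complication is the dispersion parameter itself. Below is a small sketch of the Pearson-type estimate of φ for a gamma response with variance function V(μ) = μ², given the fitted means (illustration only; not the paper's predictor-corrector implementation).

```python
# Sketch: Pearson estimate of the GLM dispersion parameter,
#   phi_hat = (1 / (n - k)) * sum_i (y_i - mu_i)^2 / V(mu_i),
# shown for a gamma response with variance function V(mu) = mu^2.
import numpy as np

def pearson_dispersion(y, mu, var_fun, n_params):
    resid2 = (y - mu) ** 2 / var_fun(mu)
    return resid2.sum() / (len(y) - n_params)

rng = np.random.default_rng(5)
n, shape = 500, 4.0                       # gamma shape k  ->  phi = 1 / k
mu = np.exp(0.3 + 0.7 * rng.normal(size=n))
y = rng.gamma(shape, scale=mu / shape)    # E[y] = mu,  Var[y] = mu^2 / shape
phi_hat = pearson_dispersion(y, mu, lambda m: m ** 2, n_params=2)
print(round(phi_hat, 3), "vs true", 1 / shape)
```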

Keywords: generalized linear models; dispersion parameter; high-dimensional inference; least angle regression; predictor-corrector algorithm; variable selection; cross-validation; shrinkage

ℓ1-Penalized Methods in High-Dimensional Gaussian Markov Random Fields

2016

In the last 20 years, we have witnessed the dramatic development of new data acquisition technologies that allow massive amounts of data to be collected at relatively low cost. This new feature led Donoho to define the twenty-first century as the century of data. A major characteristic of these modern data sets is that the number of measured variables is larger than the sample size; the term high-dimensional data analysis refers to the statistical methods developed to make inference with this new kind of data. This chapter is devoted to the study of some of the most recent ℓ1-penalized methods proposed in the literature to make sparse inference in a Gaussian Markov random field (GMRF) defined …
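
A minimal example of ℓ1-penalized precision estimation, here with scikit-learn's GraphicalLasso (any graphical lasso implementation would do): the zero pattern of the estimated precision matrix defines the GMRF graph.

```python
# Sketch: graphical lasso on data from a chain graph 1-2-3-4-5;
# zeros of the estimated precision matrix define the estimated graph.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(6)
P = np.eye(5) + np.diag([0.4] * 4, 1) + np.diag([0.4] * 4, -1)   # true precision
X = rng.multivariate_normal(np.zeros(5), np.linalg.inv(P), size=400)

theta = GraphicalLasso(alpha=0.1).fit(X).precision_
adj = (np.abs(theta) > 1e-4) & ~np.eye(5, dtype=bool)            # estimated edges
print(adj.astype(int))
```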

Keywords: Gaussian Markov random fields; graphical lasso; joint graphical lasso; structured graphical lasso; high-dimensional inference; tuning parameter selection

A computationally fast alternative to cross-validation in penalized Gaussian graphical models

2015

We study the problem of selecting the regularization parameter in penalized Gaussian graphical models. When the goal is to obtain a model with good predictive power, cross-validation is the gold standard. We present a new estimator of the Kullback-Leibler loss in a Gaussian graphical model which provides a computationally fast alternative to cross-validation. The estimator is obtained by approximating leave-one-out cross-validation. Our approach is demonstrated on simulated data sets for various types of graphs. The proposed formula exhibits superior performance, especially in the typical small sample size scenario, compared to other available alternatives to cross-validation, such as Akaike's i…
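
The loss being approximated can be stated directly: the Kullback-Leibler divergence between the true and the estimated zero-mean Gaussian model, written in terms of their precision matrices. A short numpy sketch of that loss (the fast leave-one-out approximation itself is not reproduced here):

```python
# Sketch: KL loss between zero-mean Gaussian graphical models,
#   KL = 0.5 * [ tr(Theta_hat Sigma_true) - log det(Theta_hat Sigma_true) - p ].
import numpy as np

def kl_loss(theta_true, theta_hat):
    M = theta_hat @ np.linalg.inv(theta_true)      # Theta_hat Sigma_true
    _, logdet = np.linalg.slogdet(M)
    return 0.5 * (np.trace(M) - logdet - theta_true.shape[0])

theta_true = np.eye(4) + 0.3 * (np.ones((4, 4)) - np.eye(4))
theta_hat = np.eye(4)                              # a crude estimate
print(round(kl_loss(theta_true, theta_hat), 3))    # 0 only if the estimate is exact
```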

Keywords: Gaussian graphical models; penalized estimation; Kullback-Leibler loss; cross-validation; generalized approximate cross-validation; information criteria; Akaike information criterion; Bayesian information criterion

Selecting the tuning parameter in penalized Gaussian graphical models

2019

Penalized inference of Gaussian graphical models is a way to assess the conditional independence structure in multivariate problems. In this setting, the conditional independence structure, corresponding to a graph, is related to the choice of the tuning parameter, which determines the model complexity or degrees of freedom. There has been little research on the degrees of freedom for penalized Gaussian graphical models. In this paper, we propose an estimator of the degrees of freedom in \(\ell_1\)-penalized Gaussian graphical models. Specifically, we derive an estimator inspired by the generalized information criterion and propose to use this estimator as the bias term for two informatio…
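
A common, cruder surrogate (not the estimator proposed in the paper) is to count the estimated edges as degrees of freedom in a BIC-type criterion. The sketch below does this over a small grid of graphical lasso penalties, assuming scikit-learn is available.

```python
# Sketch: BIC-type tuning-parameter selection for the graphical lasso,
# using the number of estimated edges as a crude degrees-of-freedom term.
import numpy as np
from sklearn.covariance import GraphicalLasso, empirical_covariance

def gaussian_loglik(S, theta, n):
    _, logdet = np.linalg.slogdet(theta)
    return 0.5 * n * (logdet - np.trace(S @ theta))

rng = np.random.default_rng(7)
P = np.eye(6) + np.diag([0.35] * 5, 1) + np.diag([0.35] * 5, -1)
X = rng.multivariate_normal(np.zeros(6), np.linalg.inv(P), size=300)
S, n = empirical_covariance(X), X.shape[0]

for alpha in [0.02, 0.05, 0.1, 0.2]:
    theta = GraphicalLasso(alpha=alpha).fit(X).precision_
    df = (np.abs(theta[np.triu_indices(6, k=1)]) > 1e-4).sum()
    bic = -2 * gaussian_loglik(S, theta, n) + np.log(n) * df
    print(f"alpha={alpha:.2f}  edges={df}  BIC={bic:.1f}")
```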

Keywords: penalized likelihood; Kullback-Leibler divergence; model complexity; model selection; generalized information criterion; Gaussian graphical models; conditional independence

Inferring slowly-changing dynamic gene-regulatory networks

2015

Dynamic gene-regulatory networks are complex since the interaction patterns between their components mean that it is impossible to study parts of the network in isolation. This holistic character of gene-regulatory networks poses a real challenge to any type of modelling. Graphical models are a class of models that connect the network with conditional independence relationships between random variables. By interpreting these random variables as gene activities and the conditional independence relationships as functional non-relatedness, graphical models have been used to describe gene-regulatory networks. Whereas the literature has focused on static networks, most time-course experi…

Keywords: dynamic networks; gene regulatory networks; graphical models; L1-penalized inference; conditional independence; statistical models; microarray analysis; T-lymphocyte activation

Model selection for factorial Gaussian graphical models with an application to dynamic regulatory networks.

2016

Factorial Gaussian graphical models (fGGMs) have recently been proposed for inferring dynamic gene regulatory networks from genomic high-throughput data. In the search for true regulatory relationships amongst the vast space of possible networks, these models allow the imposition of certain restrictions on the dynamic nature of these relationships, such as Markov dependencies of low order (some entries of the precision matrix are a priori zero) or equal dependency strengths across time lags (some entries of the precision matrix are assumed to be equal). The precision matrix is then estimated by \(\ell_1\)-penalized maximum likelihood, imposing a further constraint on the absolute value…
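
The kind of structural template this implies can be made concrete: for p genes observed at T time points, precision-matrix blocks beyond lag 1 are set to zero a priori, and corresponding entries of the lag-1 blocks would additionally be tied to a common value. A small numpy sketch of the zero pattern only:

```python
# Sketch: a-priori zero pattern of the precision matrix for p genes at T
# time points under a lag-1 Markov restriction (1 = entry may be nonzero).
# Equality constraints across the lag-1 blocks are not shown here.
import numpy as np

p, T = 3, 4
mask = np.zeros((p * T, p * T), dtype=int)
for s in range(T):
    for t in range(T):
        if abs(s - t) <= 1:                  # within-time and lag-1 blocks
            mask[s * p:(s + 1) * p, t * p:(t + 1) * p] = 1
print(mask)
```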

Keywords: factorial Gaussian graphical models; penalized inference; sparse networks; gene-regulatory systems; graphical models; model selection; Neisseria

dglars: An R Package to Estimate Sparse Generalized Linear Models

2014

dglars is a publicly available R package that implements the method proposed in Augugliaro, Mineo, and Wit (2013), developed to study the sparse structure of a generalized linear model. This method, called dgLARS, is based on a differential geometrical extension of the least angle regression method proposed in Efron, Hastie, Johnstone, and Tibshirani (2004). The core of the dglars package consists of two algorithms implemented in Fortran 90 to efficiently compute the solution curve: a predictor-corrector algorithm, proposed in Augugliaro et al. (2013), and a cyclic coordinate descent algorithm, proposed in Augugliaro, Mineo, and Wit (2012). The latter algorithm, as shown here, is significan…
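
dglars itself is an R package; as a rough Python analogue for the Gaussian case only, scikit-learn's lars_path computes the classical least angle regression solution path that dgLARS generalizes to other GLM families.

```python
# Rough analogue for the Gaussian case only: the classical LARS path,
# which dgLARS generalizes (via differential geometry) to other GLMs.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 8))
y = X[:, 2] - 2 * X[:, 5] + rng.normal(scale=0.5, size=100)

alphas, active, coefs = lars_path(X, y, method="lar")
print("variables in order of entry:", active)
print("coefficients at the end of the path:", np.round(coefs[:, -1], 2))
```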

Keywords: dgLARS; differential geometry; generalized linear models; predictor-corrector algorithm; cyclic coordinate descent algorithm; sparse models; variable selection; Fortran; software

Factorial graphical models for dynamic networks

2015

Dynamic network models describe many important scientific processes, from cell biology and epidemiology to sociology and finance. Estimating dynamic networks from noisy time series data is a difficult task, since the number of components involved in the system is very large. As a result, the number of parameters to be estimated is typically larger than the number of observations. However, a characteristic of many real-life networks is that they are sparse. For example, the molecular structure of genes makes interactions with other components a highly structured and, therefore, sparse process. Until now, the literature has focused on static networks, which lack specific temporal inte…

Keywords: dynamic network analysis; graphical models; autoregressive models; constrained optimization; time series; network science

A Software Tool For Sparse Estimation Of A General Class Of High-dimensional GLMs

2022

Generalized linear models are the workhorse of many inferential problems. Even in the modern era of high-dimensional settings, such models have proven to be effective exploratory tools. Most attention has been paid to Gaussian, binomial and Poisson settings, which have efficient computational implementations and where the dispersion parameter is either largely irrelevant or absent. However, general GLMs have dispersion parameters φ that affect the value of the log-likelihood. This, in turn, affects the value of various information criteria such as AIC and BIC, and has a considerable impact on the computation and selection of the optimal model. The R package dglars is one of the standa…
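
Why the dispersion matters for model selection is easiest to see in the Gaussian case, where the maximized log-likelihood, and hence AIC or BIC, is a function of the estimated dispersion \(\hat{\sigma}^2 = \mathrm{RSS}/n\). A tiny numpy illustration (not the dglars implementation):

```python
# Sketch: for a Gaussian GLM the maximized log-likelihood depends on the
# estimated dispersion sigma^2 = RSS / n, so AIC and BIC change with it:
#   -2 * max loglik = n * (log(2 * pi * sigma2_hat) + 1).
import numpy as np

rng = np.random.default_rng(9)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta) ** 2)
sigma2_hat = rss / n                          # ML estimate of the dispersion
loglik = -0.5 * n * (np.log(2 * np.pi * sigma2_hat) + 1)
k = X.shape[1] + 1                            # coefficients + dispersion
print("AIC =", round(-2 * loglik + 2 * k, 1))
```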

Keywords: high-dimensional data; dglars; penalized inference; computational statistics

Generalized information criterion for model selection in penalized graphical models

2014

This paper introduces an estimator of the relative directed distance between an estimated model and the true model, based on the Kullback-Leibler divergence and motivated by the generalized information criterion proposed by Konishi and Kitagawa. This estimator can be used to select the model in penalized Gaussian copula graphical models. A naive computation of this estimator is not feasible in high-dimensional cases; however, we derive an efficient way to compute it that is feasible for this class of problems. Moreover, the estimator is generally appropriate for several penalties, such as the lasso, the adaptive lasso and the smoothly clipped absolute deviation penalty. Simulations show that th…
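
For orientation (standard definitions, not the paper's specific estimator), the directed distance in question and the general form of the generalized information criterion are

\[
D\{g, f(\cdot;\hat\theta)\} = \int g(x)\log g(x)\,dx - \int g(x)\log f(x;\hat\theta)\,dx,
\qquad
\mathrm{GIC} = -2\sum_{i=1}^{n}\log f(x_i;\hat\theta) + 2\,\hat b,
\]

where \(g\) is the true density, \(f(\cdot;\hat\theta)\) the fitted model and \(\hat b\) an estimate of the bias of the maximized log-likelihood as an estimator of the expected log-likelihood.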

Keywords: Statistics - Methodology (stat.ME)