Author: Ernst Wit

An Extension of the DgLARS Method to High-Dimensional Relative Risk Regression Models

In recent years, clinical studies in which patients are routinely screened for many genomic features have become more common. The general aim of such studies is to find genomic signatures useful for treatment decisions and the development of new treatments. However, genomic data are typically noisy and high-dimensional, with the number of features often outstripping the number of patients included in the study. For this reason, sparse estimators are usually used in the study of high-dimensional survival data. In this paper, we propose an extension of the differential geometric least angle regression method to high-dimensional relative risk regression models.

research product

Extending graphical models for applications: on covariates, missingness and normality

The authors of the paper “Bayesian Graphical Models for Modern Biological Applications” have put forward an important framework for making graphical models more useful in applied settings. In this discussion paper, we give a number of suggestions for making this framework even more suitable for practical scenarios. Firstly, we show that an alternative and simplified definition of covariate might make the framework more manageable in high-dimensional settings. Secondly, we point out that the inclusion of missing variables is important for practical data analysis. Finally, we comment on the effect that the Gaussianity assumption has in identifying the underlying conditional independence graph…

research product

A Software Tool For Sparse Estimation Of A General Class Of High-dimensional GLMs

Generalized linear models are the workhorse of many inferential problems. Even in the modern era of high-dimensional settings, such models have proven to be effective exploratory tools. Most attention has been paid to Gaussian, binomial and Poisson settings, which have efficient computational implementations and where the dispersion parameter is either largely irrelevant or absent. However, general GLMs have dispersion parameters φ that affect the value of the log-likelihood. This, in turn, affects the value of various information criteria such as AIC and BIC, and has a considerable impact on the computation and selection of the optimal model. The R package dglars is one of the standa…

research product

Dynamic Gaussian Graphical Models for Modelling Genomic Networks

After sequencing the entire DNA for various organisms, the challenge has become understanding the functional interrelatedness of the genome. Only by understanding the pathways for various complex diseases can we begin to make sense of any type of treatment. Unfortunately, deciphering the genomic network structure is an enormous task. Even with a small number of genes, the number of possible networks is very large. This problem becomes even more difficult when we consider dynamical networks. We consider the problem of estimating a sparse dynamic Gaussian graphical model with \(L_1\)-penalized maximum likelihood of a structured precision matrix. The structure can consist of specific time dynami…

research product

A differential-geometric approach to generalized linear models with grouped predictors

We propose an extension of the differential-geometric least angle regression method to perform sparse group inference in a generalized linear model. An efficient algorithm is proposed to compute the solution curve. The proposed group differential-geometric least angle regression method has important properties that distinguish it from the group lasso. First, its solution curve is based on the invariance properties of a generalized linear model. Second, it adds groups of variables based on a group equiangularity condition, which is shown to be related to score statistics. An adaptive version, which includes weights based on the Kullback-Leibler divergence, improves its variable selection fea…

research product

Differential geometric least angle regression: a differential geometric approach to sparse generalized linear models

Sparsity is an essential feature of many contemporary data problems. Remote sensing, various forms of automated screening and other high throughput measurement devices collect a large amount of information, typically about few independent statistical subjects or units. In certain cases it is reasonable to assume that the underlying process generating the data is itself sparse, in the sense that only a few of the measured variables are involved in the process. We propose an explicit method of monotonically decreasing sparsity for outcomes that can be modelled by an exponential family. In our approach we generalize the equiangular condition in a generalized linear model. Although the …

research product

Sparse relative risk regression models

Clinical studies in which patients are routinely screened for many genomic features are becoming more common. In principle, this holds the promise of being able to find genomic signatures for a particular disease. In particular, cancer survival is thought to be closely linked to the genomic constitution of the tumor. Discovering such signatures will be useful in the diagnosis of the patient, may be used for treatment decisions and, perhaps, even the development of new treatments. However, genomic data are typically noisy and high-dimensional, with the number of features often outstripping the number of patients included in the study. Regularized survival models have been proposed to deal with such scenarios…

research product

Extended differential geometric LARS for high-dimensional GLMs with general dispersion parameter

A large class of modeling and prediction problems involves outcomes that belong to an exponential family distribution. Generalized linear models (GLMs) are a standard way of dealing with such situations, and they can be extended to high-dimensional feature spaces. Penalized inference approaches, such as the \(\ell_1\) or SCAD penalties, or extensions of least angle regression, such as dgLARS, have been proposed to deal with GLMs with high-dimensional feature spaces. Although the theory underlying these methods is in principle generic, the implementation has remained restricted to dispersion-free models, such as the Poisson and logistic regression models. The aim of this…

research product

ℓ1-Penalized Methods in High-Dimensional Gaussian Markov Random Fields

In the last 20 years, we have witnessed the dramatic development of new data acquisition technologies that make it possible to collect massive amounts of data at relatively low cost. This new feature leads Donoho to define the twenty-first century as the century of data. A major characteristic of these modern data sets is that the number of measured variables is larger than the sample size; the term high-dimensional data analysis refers to the statistical methods developed to make inference with this new kind of data. This chapter is devoted to the study of some of the most recent ℓ1-penalized methods proposed in the literature to make sparse inference in a Gaussian Markov random field (GMRF) defined …

research product

A computationally fast alternative to cross-validation in penalized Gaussian graphical models

We study the problem of selecting the regularization parameter in penalized Gaussian graphical models. When the goal is to obtain a model with good predictive power, cross-validation is the gold standard. We present a new estimator of the Kullback-Leibler loss in Gaussian graphical models which provides a computationally fast alternative to cross-validation. The estimator is obtained by approximating leave-one-out cross-validation. Our approach is demonstrated on simulated data sets for various types of graphs. The proposed formula exhibits superior performance, especially in the typical small sample size scenario, compared to other available alternatives to cross-validation, such as Akaike's i…

research product

Selecting the tuning parameter in penalized Gaussian graphical models

Penalized inference of Gaussian graphical models is a way to assess the conditional independence structure in multivariate problems. In this setting, the conditional independence structure, corresponding to a graph, is related to the choice of the tuning parameter, which determines the model complexity or degrees of freedom. There has been little research on the degrees of freedom for penalized Gaussian graphical models. In this paper, we propose an estimator of the degrees of freedom in \(\ell_1\)-penalized Gaussian graphical models. Specifically, we derive an estimator inspired by the generalized information criterion and propose to use this estimator as the bias term for two informatio…

research product

Inferring slowly-changing dynamic gene-regulatory networks

Dynamic gene-regulatory networks are complex, since the interaction patterns between their components mean that it is impossible to study parts of the network in isolation. This holistic character of gene-regulatory networks poses a real challenge to any type of modelling. Graphical models are a class of models that connect the network with conditional independence relationships between random variables. By interpreting these random variables as gene activities and the conditional independence relationships as functional non-relatedness, graphical models have been used to describe gene-regulatory networks. Whereas the literature has been focused on static networks, most time-course experi…

research product

Model selection for factorial Gaussian graphical models with an application to dynamic regulatory networks.

Factorial Gaussian graphical models (fGGMs) have recently been proposed for inferring dynamic gene regulatory networks from genomic high-throughput data. In the search for true regulatory relationships amongst the vast space of possible networks, these models allow the imposition of certain restrictions on the dynamic nature of these relationships, such as Markov dependencies of low order (some entries of the precision matrix are a priori zeros) or equal dependency strengths across time lags (some entries of the precision matrix are assumed to be equal). The precision matrix is then estimated by \(\ell_1\)-penalized maximum likelihood, imposing a further constraint on the absolute value…

research product

dglars: An R Package to Estimate Sparse Generalized Linear Models

dglars is a publicly available R package that implements the method proposed in Augugliaro, Mineo, and Wit (2013), developed to study the sparse structure of a generalized linear model. This method, called dgLARS, is based on a differential geometrical extension of the least angle regression method proposed in Efron, Hastie, Johnstone, and Tibshirani (2004). The core of the dglars package consists of two algorithms implemented in Fortran 90 to efficiently compute the solution curve: a predictor-corrector algorithm, proposed in Augugliaro et al. (2013), and a cyclic coordinate descent algorithm, proposed in Augugliaro, Mineo, and Wit (2012). The latter algorithm, as shown here, is significan…

research product

Generalized information criterion for model selection in penalized graphical models

This paper introduces an estimator of the relative directed distance between an estimated model and the true model. The estimator is based on the Kullback-Leibler divergence and is motivated by the generalized information criterion proposed by Konishi and Kitagawa. It can be used to select the model in penalized Gaussian copula graphical models. Direct use of this estimator is not feasible in high-dimensional cases; however, we derive an efficient way to compute it that is feasible for this class of problems. Moreover, this estimator is generally appropriate for several penalties, such as the lasso, the adaptive lasso and the smoothly clipped absolute deviation penalty. Simulations show that th…

research product

Factorial graphical models for dynamic networks

Dynamic network models describe many important scientific processes, from cell biology and epidemiology to sociology and finance. Estimating dynamic networks from noisy time series data is a difficult task since the number of components involved in the system is very large. As a result, the number of parameters to be estimated is typically larger than the number of observations. However, a characteristic of many real-life networks is that they are sparse. For example, the molecular structure of genes makes interactions with other components a highly structured and, therefore, sparse process. Until now, the literature has focused on static networks, which lack specific temporal inte…

research product