0000000000202134
AUTHOR
Alexandre Gramfort
Model identification and local linear convergence of coordinate descent
For composite nonsmooth optimization problems, the Forward-Backward algorithm achieves model identification (e.g., support identification for the Lasso) after a finite number of iterations, provided the objective function is regular enough. Results concerning coordinate descent are scarcer, and model identification has only been shown for specific estimators, such as the support-vector machine. In this work, we show that cyclic coordinate descent achieves model identification in finite time for a wide class of functions. In addition, we prove explicit local linear convergence rates for coordinate descent. Extensive experiments on various estimators and on real datasets demonstrate that thes…
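To make the finite-time support identification claim concrete, here is a minimal sketch (not the paper's code), assuming the standard Lasso objective 0.5‖y − Xw‖² + λ‖w‖₁: cyclic coordinate descent updates each coefficient in closed form via soft-thresholding, and the set of nonzero coefficients typically stops changing well before the iterates themselves converge. The function names and data below are purely illustrative.

```python
# Illustrative sketch: cyclic coordinate descent on the Lasso, tracking when the
# support (set of nonzero coefficients) stops changing.
# Objective assumed here: 0.5 * ||y - X w||^2 + lam * ||w||_1.
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def lasso_cd(X, y, lam, n_epochs=200):
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    residual = y.copy()                       # equals y - X @ w since w = 0
    col_norms = (X ** 2).sum(axis=0)
    support, last_change = frozenset(), 0
    for epoch in range(n_epochs):
        for j in range(n_features):
            if col_norms[j] == 0.0:
                continue
            # remove coordinate j's contribution, then update it in closed form
            residual += X[:, j] * w[j]
            w[j] = soft_threshold(X[:, j] @ residual, lam) / col_norms[j]
            residual -= X[:, j] * w[j]
        new_support = frozenset(np.flatnonzero(w).tolist())
        if new_support != support:
            support, last_change = new_support, epoch
    print(f"support last changed at epoch {last_change}: {sorted(support)}")
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
w_true = np.zeros(20); w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.standard_normal(50)
w_hat = lasso_cd(X, y, lam=5.0)
```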
Exploiting regularity in sparse Generalized Linear Models
Generalized Linear Models (GLM) are a wide class of regression and classification models, where the predicted variable is obtained from a linear combination of the input variables. For statistical inference in high dimensions, sparsity inducing regularizations have proven useful while offering statistical guarantees. However, solving the resulting optimization problems can be challenging: even for popular iterative algorithms such as coordinate descent, one needs to loop over a large number of variables. To mitigate this, techniques known as screening rules and working sets diminish the size of the optimization problem at hand, either by progressively removing variables, or by …
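As a rough illustration of the working-set idea mentioned in the abstract, the following sketch (a generic simplification, not the paper's algorithm) solves a Lasso problem on a small subset of features, checks the optimality conditions on the remaining ones, and grows the subset with the worst violators. It relies on scikit-learn's Lasso, which minimizes ‖y − Xw‖² / (2 n_samples) + α‖w‖₁; all other names are illustrative.

```python
# Illustrative working-set sketch for the Lasso (a simplified stand-in for the
# strategies discussed in the paper, not its actual algorithm).
# A feature j satisfies the KKT conditions at w when
# |X[:, j] @ (y - X @ w)| / n_samples <= alpha (scikit-learn's scaling).
import numpy as np
from sklearn.linear_model import Lasso

def working_set_lasso(X, y, alpha, ws_size=10, max_outer=20, tol=1e-8):
    n_samples, n_features = X.shape
    # start from the features most correlated with y
    ws = np.argsort(np.abs(X.T @ y))[-ws_size:]
    w = np.zeros(n_features)
    for _ in range(max_outer):
        # solve the subproblem restricted to the current working set
        sub = Lasso(alpha=alpha, fit_intercept=False, max_iter=10_000).fit(X[:, ws], y)
        w = np.zeros(n_features)
        w[ws] = sub.coef_
        # check KKT violations on all features outside the working set
        residual = y - X @ w
        violations = np.maximum(np.abs(X.T @ residual) / n_samples - alpha, 0.0)
        violations[ws] = 0.0
        if violations.max() <= tol:
            break
        # grow the working set with the worst violators
        new = np.argsort(violations)[-ws_size:]
        ws = np.union1d(ws, new[violations[new] > tol])
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 500))
w_true = np.zeros(500); w_true[:5] = 1.0
y = X @ w_true + 0.1 * rng.standard_normal(100)
print("nonzeros:", np.flatnonzero(working_set_lasso(X, y, alpha=0.1)))
```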
Dual Extrapolation for Sparse Generalized Linear Models
Generalized Linear Models (GLM) form a wide class of regression and classification models, where prediction is a function of a linear combination of the input variables. For statistical inference in high dimension, sparsity inducing regularizations have proven to be useful while offering statistical guarantees. However, solving the resulting optimization problems can be challenging: even for popular iterative algorithms such as coordinate descent, one needs to loop over a large number of variables. To mitigate this, techniques known as screening rules and working sets diminish the size of the optimization problem at hand, either by progressively removing variables, o…
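The screening rules and working sets referred to above are built on dual information. The sketch below shows only the standard ingredient, assuming the Lasso objective 0.5‖y − Xw‖² + λ‖w‖₁: rescaling the residual yields a dual feasible point, and the resulting duality gap bounds the suboptimality. The dual extrapolation studied in the paper constructs sharper dual points than this plain rescaling; the code is purely illustrative.

```python
# Illustrative sketch: duality gap for the Lasso from a rescaled residual.
# Primal: P(w) = 0.5 * ||y - X w||^2 + lam * ||w||_1
# Dual:   D(theta) = 0.5 * ||y||^2 - 0.5 * ||y - theta||^2, s.t. ||X.T @ theta||_inf <= lam
import numpy as np

def lasso_duality_gap(X, y, w, lam):
    residual = y - X @ w
    primal = 0.5 * residual @ residual + lam * np.abs(w).sum()
    # rescale the residual so that the dual constraint ||X.T theta||_inf <= lam holds
    scale = min(1.0, lam / np.max(np.abs(X.T @ residual)))
    theta = scale * residual
    dual = 0.5 * y @ y - 0.5 * np.sum((y - theta) ** 2)
    return primal - dual

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 30))
y = X[:, :3] @ np.array([1.0, -2.0, 1.5]) + 0.1 * rng.standard_normal(60)
print("gap at w = 0:", lasso_duality_gap(X, y, np.zeros(30), lam=1.0))
```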
Implicit differentiation of Lasso-type models for hyperparameter optimization
Setting regularization parameters for Lasso-type estimators is notoriously difficult, though crucial in practice. The most popular hyperparameter optimization approach is grid-search using held-out validation data. Grid-search, however, requires choosing a predefined grid for each parameter, which scales exponentially in the number of parameters. Another approach is to cast hyperparameter optimization as a bi-level optimization problem, which one can solve by gradient descent. The key challenge for these methods is the estimation of the gradient w.r.t. the hyperparameters. Computing this gradient via forward or backward automatic differentiation is possible yet usually s…
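A minimal sketch of the implicit-differentiation idea for a single Lasso regularization parameter follows, assuming the support and signs of the solution are locally constant; it is not the paper's implementation, and the function names are illustrative. Differentiating the optimality conditions on the support gives the Jacobian of the solution, which is then chained through a held-out validation loss.

```python
# Illustrative sketch: hypergradient of a validation loss w.r.t. the Lasso
# regularization parameter via implicit differentiation (assuming a locally
# stable support and sign pattern; not the paper's implementation).
# scikit-learn's Lasso minimizes ||y - Xw||^2 / (2 n) + alpha * ||w||_1, whose
# optimality conditions on the support S read
#   X_S.T @ (X_S w_S - y) / n + alpha * sign(w_S) = 0,
# hence dw_S/dalpha = -n * (X_S.T X_S)^{-1} sign(w_S).
import numpy as np
from sklearn.linear_model import Lasso

def lasso_hypergradient(X_train, y_train, X_val, y_val, alpha):
    n = X_train.shape[0]
    w = Lasso(alpha=alpha, fit_intercept=False, max_iter=10_000).fit(X_train, y_train).coef_
    support = np.flatnonzero(w)
    if support.size == 0:
        return 0.0
    XS = X_train[:, support]
    # Jacobian of the restricted solution w.r.t. alpha (implicit function theorem)
    dw_S = -n * np.linalg.solve(XS.T @ XS, np.sign(w[support]))
    # chain rule through the validation loss 0.5 * ||y_val - X_val w||^2
    val_residual = X_val[:, support] @ w[support] - y_val
    return val_residual @ (X_val[:, support] @ dw_S)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 40))
w_true = np.zeros(40); w_true[:4] = [1.0, -1.0, 0.5, 2.0]
y = X @ w_true + 0.3 * rng.standard_normal(200)
grad = lasso_hypergradient(X[:100], y[:100], X[100:], y[100:], alpha=0.05)
print("d(validation loss)/d(alpha):", grad)
```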
Implicit differentiation for fast hyperparameter selection in non-smooth convex learning
Finding the optimal hyperparameters of a model can be cast as a bilevel optimization problem, typically solved using zero-order techniques. In this work we study first-order methods when the inner optimization problem is convex but non-smooth. We show that forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian. Using implicit differentiation, we show it is possible to leverage the non-smoothness of the inner problem to speed up the computation. Finally, we provide a bound on the error made on the hypergradient when the inner optimization problem is solved approxim…
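For intuition only, here is a sketch of forward-mode differentiation of proximal gradient descent (ISTA) on the Lasso, assuming the objective 0.5‖y − Xw‖² + λ‖w‖₁ and a fixed step size; the Jacobian of the iterates with respect to λ is propagated alongside the iterates by differentiating each step through soft-thresholding. This is not the paper's code, and the names below are illustrative.

```python
# Illustrative sketch: forward-mode differentiation of ISTA for the Lasso,
# assuming the objective 0.5 * ||y - X w||^2 + lam * ||w||_1.
# The Jacobian J_k = dw_k / dlam is updated alongside the iterate w_k.
import numpy as np

def ista_with_jacobian(X, y, lam, n_iter=500):
    n_samples, n_features = X.shape
    step = 1.0 / np.linalg.norm(X, ord=2) ** 2   # 1 / Lipschitz constant of the smooth part
    w = np.zeros(n_features)
    jac = np.zeros(n_features)                   # dw / dlam
    for _ in range(n_iter):
        z = w - step * X.T @ (X @ w - y)         # gradient step
        dz = jac - step * X.T @ (X @ jac)        # its derivative w.r.t. lam
        active = np.abs(z) > step * lam          # coordinates kept by soft-thresholding
        w = np.where(active, z - step * lam * np.sign(z), 0.0)
        jac = np.where(active, dz - step * np.sign(z), 0.0)
    return w, jac

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 25))
w_true = np.zeros(25); w_true[:3] = [1.5, -1.0, 0.5]
y = X @ w_true + 0.1 * rng.standard_normal(80)
w_hat, dw_dlam = ista_with_jacobian(X, y, lam=2.0)
print("support:", np.flatnonzero(w_hat))
print("dw/dlam on the support:", dw_dlam[np.flatnonzero(w_hat)])
```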