
RESEARCH PRODUCT

Implicit differentiation of Lasso-type models for hyperparameter optimization

Quentin Bertrand, Quentin Klopfenstein, Mathieu Blondel, Samuel Vaiter, Alexandre Gramfort, Joseph Salmon

subject

FOS: Computer and information sciences; Computer Science - Machine Learning; Machine Learning (cs.LG); Statistics - Machine Learning; Machine Learning (stat.ML); [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]

description

International audience; Setting regularization parameters for Lasso-type estimators is notoriously difficult, though crucial in practice. The most popular hyperparameter optimization approach is grid-search using held-out validation data. Grid-search, however, requires choosing a predefined grid for each parameter, which scales exponentially with the number of parameters. Another approach is to cast hyperparameter optimization as a bi-level optimization problem, which one can solve by gradient descent. The key challenge for these methods is the estimation of the gradient with respect to the hyperparameters. Computing this gradient via forward or backward automatic differentiation is possible yet usually suffers from high memory consumption. Alternatively, implicit differentiation typically involves solving a linear system, which can be prohibitive and numerically unstable in high dimension. In addition, implicit differentiation usually assumes smooth loss functions, which is not the case for Lasso-type problems. This work introduces an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems. Our approach scales to high-dimensional data by leveraging the sparsity of the solutions. Experiments demonstrate that the proposed method outperforms a large number of standard methods for optimizing the error on held-out data, or the Stein Unbiased Risk Estimator (SURE).
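To make the idea concrete, below is a minimal sketch, not the authors' exact algorithm, of implicit differentiation for a single Lasso regularization parameter: the hypergradient of the held-out validation loss is obtained from a linear system restricted to the support of the sparse solution, so its size is the number of nonzero coefficients rather than the full dimension. The synthetic data, the `val_loss_and_hypergradient` helper, and the use of scikit-learn's Lasso solver are illustrative assumptions.

```python
# Hedged sketch: hypergradient of a validation loss w.r.t. the Lasso parameter alpha,
# via implicit differentiation restricted to the support of the sparse solution.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, n_val, p = 100, 50, 200
X, X_val = rng.standard_normal((n, p)), rng.standard_normal((n_val, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)
y_val = X_val @ beta_true + 0.1 * rng.standard_normal(n_val)

def val_loss_and_hypergradient(alpha):
    # Inner problem: min_w ||y - X w||^2 / (2 n) + alpha * ||w||_1
    w = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    S = np.flatnonzero(w)                      # support of the sparse solution
    # Outer criterion: held-out validation mean squared error
    resid_val = X_val @ w - y_val
    loss = resid_val @ resid_val / (2 * n_val)
    # On the support, optimality gives X_S^T X_S w_S = X_S^T y - n * alpha * sign(w_S),
    # hence d w_S / d alpha = -n * (X_S^T X_S)^{-1} sign(w_S): a system of size |S| << p.
    X_S = X[:, S]
    dw_S = np.linalg.solve(X_S.T @ X_S, -n * np.sign(w[S]))
    # Chain rule: hypergradient = (d loss / d w_S) @ (d w_S / d alpha)
    grad = (X_val[:, S].T @ resid_val / n_val) @ dw_S
    return loss, grad

loss, grad = val_loss_and_hypergradient(alpha=0.1)
print(loss, grad)  # the hypergradient can drive gradient descent on alpha
```

This sketch solves a small linear system on the support for illustration; the paper's contribution is an algorithm that avoids explicit matrix inversion altogether while still exploiting this sparsity.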

https://hal.science/hal-02532683