
RESEARCH PRODUCT

On the sign recovery by LASSO, thresholded LASSO and thresholded Basis Pursuit Denoising

Patrick Tardivel, Malgorzata Bogdan

subject

Statistics::Theory; Statistics::Machine Learning; [STAT.AP] Statistics [stat]/Applications [stat.AP]; Basis Pursuit; Identifiability condition; Multiple regression; Statistics::Methodology; LASSO; Active set estimation; Sign estimation; Sparsity; Irrepresentability condition

description

Basis Pursuit (BP), Basis Pursuit DeNoising (BPDN), and LASSO are popular methods for identifying important predictors in the high-dimensional linear regression model Y = Xβ + ε. By definition, when ε = 0, BP uniquely recovers β if, for every b ≠ β with Xb = Xβ, the L1 norm of β is smaller than the L1 norm of b (the identifiability condition). Furthermore, LASSO can recover the sign of β only under a much stronger irrepresentability condition. Meanwhile, it is known that the model selection properties of LASSO can be improved by hard-thresholding its estimates. This article supports these findings by proving that thresholded LASSO, thresholded BPDN and thresholded BP recover the sign of β in both the noisy and noiseless cases if and only if β is identifiable and large enough. In particular, if X has iid Gaussian entries and the number of predictors grows linearly with the sample size, then these thresholded estimators can recover the sign of β when the signal sparsity is asymptotically below the Donoho-Tanner transition curve. This is in contrast to the regular LASSO, which asymptotically recovers the sign of β only when the signal sparsity tends to 0. Numerical experiments show that the identifiability condition, unlike the irrepresentability condition, does not seem to be affected by the structure of the correlations in the X matrix.
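
The gap between LASSO and thresholded LASSO described above can be illustrated on a toy example. The sketch below is not taken from the article: the 3 x 3 design X, the penalty convention (1/2)‖y − Xb‖² + λ‖b‖₁, the tuning value lam = 0.1, the threshold tau = 0.5, and the helper names (soft, lasso_cd, sign_vec) are all assumptions chosen for illustration. X is invertible, so β = (1, 1, 0) is trivially identifiable, but the third column is correlated with the first two strongly enough that the irrepresentability quantity exceeds 1. In the noiseless case a plain coordinate-descent LASSO then keeps a small spurious third coefficient, while hard-thresholding removes it.

```python
import math

def soft(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate descent for (1/2)*||y - X b||^2 + lam*||b||_1 (illustrative helper)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that ignores column j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            )
            b[j] = soft(rho, lam) / col_sq[j]
    return b

def sign_vec(b, tau=0.0):
    """Sign vector after hard-thresholding at level tau (tau = 0 gives the raw sign)."""
    return [0 if abs(v) <= tau else (1 if v > 0 else -1) for v in b]

# Assumed toy design: columns 1 and 2 are orthonormal, column 3 has unit norm
# and inner product 0.6 with each of them.
X = [
    [1.0, 0.0, 0.6],
    [0.0, 1.0, 0.6],
    [0.0, 0.0, math.sqrt(0.28)],
]
beta = [1.0, 1.0, 0.0]
y = [sum(X[i][j] * beta[j] for j in range(3)) for i in range(3)]  # noiseless: y = X beta

# Irrepresentability quantity |x3' X_S (X_S' X_S)^{-1} sign(beta_S)| with S = {1, 2}.
# Because columns 1 and 2 are orthonormal here, it reduces to x3'x1 + x3'x2.
irrep = sum(X[i][2] * (X[i][0] + X[i][1]) for i in range(3))

lam, tau = 0.1, 0.5  # assumed tuning parameter and threshold
b_hat = lasso_cd(X, y, lam)

print("irrepresentability quantity:", irrep)
print("LASSO estimate:", b_hat)
print("sign of LASSO estimate:", sign_vec(b_hat))
print("sign after thresholding:", sign_vec(b_hat, tau))
```

For this design the optimality conditions can be solved by hand, giving a LASSO estimate of roughly (0.857, 0.857, 0.071) at lam = 0.1: the third coordinate is nonzero, so the raw LASSO sign is wrong, whereas any threshold between about 0.071 and 0.857 recovers sign(β) = (1, 1, 0).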

http://hdl.handle.net/20.500.12278/28418