Sparse model selection

On choosing the right model and the right regularisation parameter in sparse regression. These two choices turn out to be nearly the same problem, and both are closely coupled to fitting the regression itself. There are some wrinkles.
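One concrete instance of that coupling is picking the lasso penalty λ by K-fold cross-validation on held-out squared error. What follows is a minimal sketch under stated assumptions:

- It assumes a linear model with no intercept.
- It uses a hand-rolled coordinate-descent solver.
- The function names (`lasso_cd`, `cv_lasso`), the λ grid and the default of 5 folds are illustrative choices, not anything prescribed here.

A tested library solver would be the sensible choice in practice.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for min_b (1/(2n))*||y - X b||^2 + lam*||b||_1, no intercept."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # x_j^T x_j / n for each column
    r = np.array(y, dtype=float)       # residual y - X b, with b = 0 initially
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = b[j]
            r += X[:, j] * old  # partial residual excluding feature j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]
            max_step = max(max_step, abs(b[j] - old))
        if max_step < tol:  # stop once no coefficient moves appreciably
            break
    return b


def cv_lasso(X, y, lams, k=5, seed=0):
    """Pick lam from the grid `lams` by K-fold CV.

    Returns (best_lam, held-out MSE per lam averaged over folds).
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    mse = np.zeros(len(lams))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        for i, lam in enumerate(lams):
            b = lasso_cd(X[train_idx], y[train_idx], lam)  # refit for every lam, every fold
            resid = y[test_idx] - X[test_idx] @ b
            mse[i] += np.mean(resid ** 2) / k
    return lams[int(np.argmin(mse))], mse
```

Every candidate λ is refitted on every fold, which is one concrete reason tuning and fitting end up so entangled.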

πŸ— Talk about when degrees-of-freedom penalties work, when cross-validation and so on.


The new hotness sweeping the world is FOCI, a sparse model selection procedure (Azadkia and Chatterjee 2019) based on Chatterjee’s ξ statistic used as an independence test (Chatterjee 2020). Looks interesting.
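For orientation, here is a minimal sketch of Chatterjee’s ξ coefficient itself (Chatterjee 2020), restricted to the simple case of no ties in either variable. The steps are:

1. Sort the pairs by x.
2. Let r_i be the rank of the i-th y value in that order.
3. Compute ξ_n = 1 − 3 Σ |r_{i+1} − r_i| / (n² − 1).

Some caveats on scope. The function name `xi_n` is hypothetical. The tie-corrected form of the statistic is not implemented here. Nor is the conditional dependence coefficient that FOCI actually uses (Azadkia and Chatterjee 2019).

```python
import random


def xi_n(x, y):
    """Chatterjee's xi coefficient, assuming no ties among the x values or the y values.

    xi_n = 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1), where r_i is the rank of the
    i-th y after the pairs have been sorted by x.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    order = sorted(range(n), key=lambda i: x[i])  # indices in increasing order of x
    y_sorted = [y[i] for i in order]
    # r_i = #{j : y_j <= y_(i)}, which is just the rank of y_(i) when there are no ties
    rank_of = {v: k + 1 for k, v in enumerate(sorted(y_sorted))}
    r = [rank_of[v] for v in y_sorted]
    total = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


if __name__ == "__main__":
    random.seed(0)
    u = [random.uniform(-1.0, 1.0) for _ in range(2000)]
    noise = [random.random() for _ in range(2000)]
    print("independent:", round(xi_n(u, noise), 3))              # expected to be near 0
    print("y = x^2:   ", round(xi_n(u, [t * t for t in u]), 3))  # expected to be near 1
```

Unlike Pearson correlation, ξ is close to 1 when y is a noiseless function of x even if that function is not monotone, as in the y = x² case. This is what makes it attractive as a screening statistic.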

Stability selection


For now, see the mplot R package (Tarr, Müller, and Welsh 2018) for an introduction.
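Here is a minimal sketch of the basic stability-selection recipe, assuming the lasso as the base selector. It works in three steps:

1. Refit on many random half-subsamples of the data.
2. Record how often each feature is selected.
3. Keep the features whose selection frequency clears a threshold.

The fixed penalty `lam`, the 0.6 threshold, the subsample count and the function names are illustrative assumptions. This is not how mplot is implemented.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for min_b (1/(2n))*||y - X b||^2 + lam*||b||_1, no intercept."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # x_j^T x_j / n for each column
    r = np.array(y, dtype=float)       # residual y - X b, with b = 0 initially
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = b[j]
            r += X[:, j] * old  # partial residual excluding feature j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]
            max_step = max(max_step, abs(b[j] - old))
        if max_step < tol:
            break
    return b


def stability_selection(X, y, lam, n_subsamples=100, threshold=0.6, seed=0):
    """Return (selection frequency per feature, indices with frequency >= threshold)."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)  # a random half of the rows
        b = lasso_cd(X[idx], y[idx], lam)
        counts += b != 0                                 # features kept by this fit
    freq = counts / n_subsamples
    return freq, np.flatnonzero(freq >= threshold)
```

Stability here means that a feature earns its place by being picked consistently across perturbed datasets, rather than by a single fit at a single λ.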

Relaxed Lasso


Dantzig Selector




Degrees-of-freedom penalties

See degrees of freedom.
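One concrete degrees-of-freedom penalty comes from Zou, Hastie, and Tibshirani (2007). They show that the number of nonzero lasso coefficients is an unbiased estimate of the lasso’s degrees of freedom, so it can be plugged into criteria such as BIC. Below is a minimal sketch under these assumptions:

- Gaussian noise with unknown variance, hence the n·log(RSS/n) form of BIC.
- A hand-rolled coordinate-descent solver.
- Illustrative function names and λ grid.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for min_b (1/(2n))*||y - X b||^2 + lam*||b||_1, no intercept."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # x_j^T x_j / n for each column
    r = np.array(y, dtype=float)       # residual y - X b, with b = 0 initially
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = b[j]
            r += X[:, j] * old  # partial residual excluding feature j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]
            max_step = max(max_step, abs(b[j] - old))
        if max_step < tol:
            break
    return b


def bic_lasso(X, y, lams):
    """Choose lam by BIC, estimating df(lam) as the number of nonzero coefficients.

    Returns (best_lam, BIC score per lam).
    """
    n = X.shape[0]
    scores = np.zeros(len(lams))
    for i, lam in enumerate(lams):
        b = lasso_cd(X, y, lam)
        rss = np.sum((y - X @ b) ** 2)
        df = np.count_nonzero(b)  # unbiased df estimate for the lasso (Zou et al. 2007)
        scores[i] = n * np.log(rss / n) + np.log(n) * df
    return lams[int(np.argmin(scores))], scores
```

Compared with cross-validation this needs only one fit per λ. The trade-off is that it leans on the modelling assumptions behind the df estimate and the information criterion.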


Abramovich, Felix, Yoav Benjamini, David L. Donoho, and Iain M. Johnstone. 2006. “Adapting to Unknown Sparsity by Controlling the False Discovery Rate.” The Annals of Statistics 34 (2): 584–653.
Azadkia, Mona, and Sourav Chatterjee. 2019. “A Simple Measure of Conditional Dependence.” arXiv:1910.12327 [Cs, Math, Stat], December.
Banerjee, Onureena, Laurent El Ghaoui, and Alexandre d’Aspremont. 2008. “Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data.” Journal of Machine Learning Research 9 (Mar): 485–516.
Barbier, Jean. 2015. “Statistical Physics and Approximate Message-Passing Algorithms for Sparse Linear Estimation Problems in Signal Processing and Coding Theory.” arXiv:1511.01650 [Cs, Math], November.
Barron, Andrew R., Albert Cohen, Wolfgang Dahmen, and Ronald A. DeVore. 2008. “Approximation and Learning by Greedy Algorithms.” The Annals of Statistics 36 (1): 64–94.
Bayati, M., and A. Montanari. 2012. “The LASSO Risk for Gaussian Matrices.” IEEE Transactions on Information Theory 58 (4): 1997–2017.
Berk, Richard, Lawrence Brown, Andreas Buja, Kai Zhang, and Linda Zhao. 2013. “Valid Post-Selection Inference.” The Annals of Statistics 41 (2): 802–37.
Bertin, K., E. Le Pennec, and V. Rivoirard. 2011. “Adaptive Dantzig Density Estimation.” Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 47 (1): 43–74.
Bertsimas, Dimitris, Angela King, and Rahul Mazumder. 2016. “Best Subset Selection via a Modern Optimization Lens.” The Annals of Statistics 44 (2): 813–52.
Bondell, Howard D., Arun Krishna, and Sujit K. Ghosh. 2010. “Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models.” Biometrics 66 (4): 1069–77.
Breiman, Leo. 1995. “Better Subset Regression Using the Nonnegative Garrote.” Technometrics 37 (4): 373–84.
Bühlmann, Peter, and Sara van de Geer. 2015. “High-Dimensional Inference in Misspecified Linear Models.” Electronic Journal of Statistics 9 (1): 1449–73.
Bunea, Florentina, Alexandre B. Tsybakov, and Marten H. Wegkamp. 2007a. “Sparse Density Estimation with ℓ1 Penalties.” In Learning Theory, edited by Nader H. Bshouty and Claudio Gentile, 530–43. Lecture Notes in Computer Science. Springer Berlin Heidelberg.
Bunea, Florentina, Alexandre Tsybakov, and Marten Wegkamp. 2007b. “Sparsity Oracle Inequalities for the Lasso.” Electronic Journal of Statistics 1: 169–94.
Carmi, Avishy Y. 2014. “Compressive System Identification.” In Compressed Sensing & Sparse Filtering, edited by Avishy Y. Carmi, Lyudmila Mihaylova, and Simon J. Godsill, 281–324. Signals and Communication Technology. Springer Berlin Heidelberg.
Chatterjee, Sourav. 2020. “A New Coefficient of Correlation.” arXiv:1909.10140 [Math, Stat], January.
Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2016. “Double/Debiased Machine Learning for Treatment and Causal Parameters.” arXiv:1608.00060 [Econ, Stat], July.
Chernozhukov, Victor, Christian Hansen, Yuan Liao, and Yinchu Zhu. 2018. “Inference For Heterogeneous Effects Using Low-Rank Estimations.” arXiv:1812.08089 [Math, Stat], December.
Chernozhukov, Victor, Whitney K. Newey, and Rahul Singh. 2018. “Learning L2 Continuous Regression Functionals via Regularized Riesz Representers.” arXiv:1809.05224 [Econ, Math, Stat], September.
Chetverikov, Denis, Zhipeng Liao, and Victor Chernozhukov. 2016. “On Cross-Validated Lasso.” arXiv:1605.02214 [Math, Stat], May.
Chichignoud, Michaël, Johannes Lederer, and Martin Wainwright. 2014. “A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees.” arXiv:1410.0247 [Math, Stat], October.
Descloux, Pascaline, and Sylvain Sardy. 2018. “Model Selection with Lasso-Zero: Adding Straw to the Haystack to Better Find Needles.” arXiv:1805.05133 [Stat], May.
Dossal, Charles, Maher Kachour, Jalal M. Fadili, Gabriel Peyré, and Christophe Chesneau. 2011. “The Degrees of Freedom of the Lasso for General Design Matrix.” arXiv:1111.1162 [Cs, Math, Stat], November.
El Karoui, Noureddine. 2008. “Operator Norm Consistent Estimation of Large Dimensional Sparse Covariance Matrices.” The Annals of Statistics 36 (6): 2717–56.
Ewald, Karl, and Ulrike Schneider. 2015. “Confidence Sets Based on the Lasso Estimator.” arXiv:1507.05315 [Math, Stat], July.
Fan, Jianqing, and Runze Li. 2001. “Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association 96 (456): 1348–60.
Fan, Jianqing, and Jinchi Lv. 2010. “A Selective Overview of Variable Selection in High Dimensional Feature Space.” Statistica Sinica 20 (1): 101–48.
Flynn, Cheryl J., Clifford M. Hurvich, and Jeffrey S. Simonoff. 2013. “Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models.” arXiv:1302.2068 [Stat], February.
Freijeiro-González, Laura, Manuel Febrero-Bande, and Wenceslao González-Manteiga. 2022. “A Critical Review of LASSO and Its Derivatives for Variable Selection Under Dependence Among Covariates.” International Statistical Review 90 (1): 118–45.
Geer, Sara A. van de. 2008. “High-Dimensional Generalized Linear Models and the Lasso.” The Annals of Statistics 36 (2): 614–45.
Geer, Sara A. van de, Peter Bühlmann, and Shuheng Zhou. 2011. “The Adaptive and the Thresholded Lasso for Potentially Misspecified Models (and a Lower Bound for the Lasso).” Electronic Journal of Statistics 5: 688–749.
Geer, Sara van de. 2016. Estimation and Testing Under Sparsity. Vol. 2159. Lecture Notes in Mathematics. Cham: Springer International Publishing.
Hall, Peter, Jiashun Jin, and Hugh Miller. 2014. “Feature Selection When There Are Many Influential Features.” Bernoulli 20 (3): 1647–71.
Hall, Peter, and Jing-Hao Xue. 2014. “On Selecting Interacting Features from High-Dimensional Data.” Computational Statistics & Data Analysis 71 (March): 694–708.
Hansen, Niels Richard, Patricia Reynaud-Bouret, and Vincent Rivoirard. 2015. “Lasso and Probabilistic Inequalities for Multivariate Point Processes.” Bernoulli 21 (1): 83–143.
Hastie, Trevor J., Robert Tibshirani, and Martin J. Wainwright. 2015. Statistical Learning with Sparsity: The Lasso and Generalizations. Boca Raton: Chapman and Hall/CRC.
Hastie, Trevor, Robert Tibshirani, and Ryan J. Tibshirani. 2017. “Extended Comparisons of Best Subset Selection, Forward Stepwise Selection, and the Lasso.” arXiv.
Hirose, Kei, Shohei Tateishi, and Sadanori Konishi. 2011. “Efficient Algorithm to Select Tuning Parameters in Sparse Regression Modeling with Regularization.” arXiv:1109.2411 [Stat], September.
Huang, Cong, G. L. H. Cheang, and Andrew R. Barron. 2008. “Risk of Penalized Least Squares, Greedy Selection and L1 Penalization for Flexible Function Libraries.”
Janková, Jana, and Sara van de Geer. 2016. “Confidence Regions for High-Dimensional Generalized Linear Models Under Sparsity.” arXiv:1610.01353 [Math, Stat], October.
Javanmard, Adel, and Andrea Montanari. 2014. “Confidence Intervals and Hypothesis Testing for High-Dimensional Regression.” Journal of Machine Learning Research 15 (1): 2869–909.
Kato, Kengo. 2009. “On the Degrees of Freedom in Shrinkage Estimation.” Journal of Multivariate Analysis 100 (7): 1338–52.
Kim, Yongdai, Sunghoon Kwon, and Hosik Choi. 2012. “Consistent Model Selection Criteria on High Dimensions.” Journal of Machine Learning Research 13 (Apr): 1037–57.
Koltchinskii, Vladimir. 2011. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Mathematics, École d’Été de Probabilités de Saint-Flour 2033. Heidelberg: Springer.
Lam, Clifford, and Jianqing Fan. 2009. “Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.” The Annals of Statistics 37 (6B): 4254–78.
Lederer, Johannes, and Michael Vogt. 2020. “Estimating the Lasso’s Effective Noise.” arXiv:2004.11554 [Stat], April.
Lee, Jason D., Dennis L. Sun, Yuekai Sun, and Jonathan E. Taylor. 2013. “Exact Post-Selection Inference, with Application to the Lasso.” arXiv:1311.6238 [Math, Stat], November.
Lemhadri, Ismael, Feng Ruan, Louis Abraham, and Robert Tibshirani. 2021. “LassoNet: A Neural Network with Feature Sparsity.” Journal of Machine Learning Research 22 (127): 1–29.
Li, Wei, and Johannes Lederer. 2019. “Tuning Parameter Calibration for ℓ1-Regularized Logistic Regression.” Journal of Statistical Planning and Inference 202 (September): 80–98.
Lim, Néhémy, and Johannes Lederer. 2016. “Efficient Feature Selection With Large and High-Dimensional Data.” arXiv:1609.07195 [Stat], September.
Lockhart, Richard, Jonathan Taylor, Ryan J. Tibshirani, and Robert Tibshirani. 2014. “A Significance Test for the Lasso.” The Annals of Statistics 42 (2): 413–68.
Lundberg, Scott M., and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Advances in Neural Information Processing Systems. Vol. 30. Curran Associates, Inc.
Meinshausen, Nicolai, and Peter Bühlmann. 2006. “High-Dimensional Graphs and Variable Selection with the Lasso.” The Annals of Statistics 34 (3): 1436–62.
Meinshausen, Nicolai, and Bin Yu. 2009. “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data.” The Annals of Statistics 37 (1): 246–70.
Naik, Prasad A., and Chih-Ling Tsai. 2001. “Single-Index Model Selections.” Biometrika 88 (3): 821–32.
Nickl, Richard, and Sara van de Geer. 2013. “Confidence Sets in Sparse Regression.” The Annals of Statistics 41 (6): 2852–76.
Portnoy, Stephen, and Roger Koenker. 1997. “The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators.” Statistical Science 12 (4): 279–300.
Reynaud-Bouret, Patricia. 2003. “Adaptive Estimation of the Intensity of Inhomogeneous Poisson Processes via Concentration Inequalities.” Probability Theory and Related Fields 126 (1).
Reynaud-Bouret, Patricia, and Sophie Schbath. 2010. “Adaptive Estimation for Hawkes Processes; Application to Genome Analysis.” The Annals of Statistics 38 (5): 2781–2822.
Semenova, Lesia, Cynthia Rudin, and Ronald Parr. 2021. “A Study in Rashomon Curves and Volumes: A New Perspective on Generalization and Model Simplicity in Machine Learning.” arXiv:1908.01755 [Cs, Stat], April.
Shen, Xiaotong, and Hsin-Cheng Huang. 2006. “Optimal Model Assessment, Selection, and Combination.” Journal of the American Statistical Association 101 (474): 554–68.
Shen, Xiaotong, Hsin-Cheng Huang, and Jimmy Ye. 2004. “Adaptive Model Selection and Assessment for Exponential Family Distributions.” Technometrics 46 (3): 306–17.
Shen, Xiaotong, and Jianming Ye. 2002. “Adaptive Model Selection.” Journal of the American Statistical Association 97 (457): 210–21.
Tarr, Garth, Samuel Müller, and Alan H. Welsh. 2018. “mplot: An R Package for Graphical Model Stability and Variable Selection Procedures.” Journal of Statistical Software 83 (1): 1–28.
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological) 58 (1): 267–88.
Tibshirani, Ryan J. 2014. “A General Framework for Fast Stagewise Algorithms.” arXiv:1408.5801 [Stat], August.
Wang, Hansheng, Guodong Li, and Guohua Jiang. 2007. “Robust Regression Shrinkage and Consistent Variable Selection Through the LAD-Lasso.” Journal of Business & Economic Statistics 25 (3): 347–55.
Xu, H., C. Caramanis, and S. Mannor. 2012. “Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem.” IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (1): 187–93.
Yoshida, Ryo, and Mike West. 2010. “Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing.” Journal of Machine Learning Research 11 (May): 1771–98.
Yuan, Ming, and Yi Lin. 2006. “Model Selection and Estimation in Regression with Grouped Variables.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (1): 49–67.
Yuan, Ming, and Yi Lin. 2007. “Model Selection and Estimation in the Gaussian Graphical Model.” Biometrika 94 (1): 19–35.
Zhang, Cun-Hui. 2010. “Nearly Unbiased Variable Selection Under Minimax Concave Penalty.” The Annals of Statistics 38 (2): 894–942.
Zhang, Cun-Hui, and Stephanie S. Zhang. 2014. “Confidence Intervals for Low Dimensional Parameters in High Dimensional Linear Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76 (1): 217–42.
Zhang, Yiyun, Runze Li, and Chih-Ling Tsai. 2010. “Regularization Parameter Selections via Generalized Information Criterion.” Journal of the American Statistical Association 105 (489): 312–23.
Zhao, Peng, Guilherme Rocha, and Bin Yu. 2006. “Grouped and Hierarchical Model Selection Through Composite Absolute Penalties.”
Zhao, Peng, Guilherme Rocha, and Bin Yu. 2009. “The Composite Absolute Penalties Family for Grouped and Hierarchical Variable Selection.” The Annals of Statistics 37 (6A): 3468–97.
Zhao, Peng, and Bin Yu. 2006. “On Model Selection Consistency of Lasso.” Journal of Machine Learning Research 7 (Nov): 2541–63.
Zou, Hui. 2006. “The Adaptive Lasso and Its Oracle Properties.” Journal of the American Statistical Association 101 (476): 1418–29.
Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2): 301–20.
Zou, Hui, Trevor Hastie, and Robert Tibshirani. 2007. “On the ‘Degrees of Freedom’ of the Lasso.” The Annals of Statistics 35 (5): 2173–92.
