Sparse model selection


On choosing the right model and the right regularisation parameter in sparse regression, two problems which turn out to be very nearly the same, and closely coupled to performing the regression itself. There are some wrinkles.

What?

🏗 Explain my laborious reasoning that generalised Akaike information criteria do not seem to work when the penalty term is not continuous (e.g. \(L_1\)), and the model-selection issues that therefore arise in such cases.
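One concrete way to see the issue, pending the laborious version: classical AIC penalises fit by the parameter count,
\[
\mathrm{AIC} = -2\log\hat{L} + 2k,
\]
and generalised criteria swap \(k\) for an effective degrees of freedom. For the lasso, the natural plug-in is the unbiased estimate of Zou, Hastie, and Tibshirani (2007),
\[
\widehat{\mathrm{df}}(\lambda) = \#\{j : \hat\beta_j(\lambda) \neq 0\},
\]
which jumps discontinuously as \(\lambda\) varies, so the smoothness assumptions behind the generalised-AIC derivations fail at exactly the points where the selected model changes.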

Present alternatives for choosing the optimal regularisation coefficient, especially ones outside cross-validation, and especially computationally tractable ones. Methods based on statistical learning theory or concentration inequalities win particular gratitude. A rough point of comparison is sketched below.
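As that point of comparison (not one of the concentration-inequality methods I am asking for), scikit-learn ships an information-criterion tuner, LassoLarsIC, which picks \(\lambda\) from a single path fit and so costs a fraction of cross-validation. A minimal sketch on synthetic data, assuming nothing beyond the stock scikit-learn API:

```python
# A minimal sketch: information-criterion-based versus cross-validation-based
# selection of the lasso penalty, on a synthetic sparse problem.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, LassoLarsIC

# 200 observations, 50 features, only 5 truly active.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=1.0, random_state=0)

# BIC-type selection: one lasso path, no resampling, so it is cheap.
ic = LassoLarsIC(criterion="bic").fit(X, y)

# 5-fold cross-validation: the usual, more expensive baseline.
cv = LassoCV(cv=5).fit(X, y)

print(f"BIC-selected alpha: {ic.alpha_:.4f}, "
      f"nonzero coefficients: {np.sum(ic.coef_ != 0)}")
print(f"CV-selected alpha:  {cv.alpha_:.4f}, "
      f"nonzero coefficients: {np.sum(cv.coef_ != 0)}")
```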

Stability selection

🏗

Relaxed Lasso

Dantzig Selector

Garrote

🏗

Degrees-of-freedom penalties

See degrees of freedom.
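To make that cross-reference slightly more self-contained: Zou, Hastie, and Tibshirani (2007) show that the number of nonzero lasso coefficients is an unbiased estimate of the fit's degrees of freedom, which plugs directly into a Mallows-\(C_p\)-style criterion. A minimal sketch along a lasso path, with the noise variance assumed known (the usual idealisation) and no intercept, since the synthetic data is generated with zero bias:

```python
# A minimal sketch of a degrees-of-freedom (Mallows-Cp-style) penalty for
# the lasso: df is estimated by the number of nonzero coefficients at each
# penalty level, following Zou, Hastie, and Tibshirani (2007).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=1.0, random_state=0)
sigma2 = 1.0  # noise variance, assumed known for this sketch

alphas, coefs, _ = lasso_path(X, y, n_alphas=100)

# Residual sum of squares and df estimate at each point on the path.
residuals = y[:, None] - X @ coefs           # shape (n_samples, n_alphas)
rss = np.sum(residuals ** 2, axis=0)
df = np.sum(coefs != 0, axis=0)              # unbiased df estimate

n = len(y)
cp = rss / n + 2 * sigma2 * df / n           # Cp-style risk estimate
best = np.argmin(cp)
print(f"Cp-selected alpha: {alphas[best]:.4f} with df = {df[best]}")
```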

Abramovich, Felix, Yoav Benjamini, David L. Donoho, and Iain M. Johnstone. 2006. “Adapting to Unknown Sparsity by Controlling the False Discovery Rate.” The Annals of Statistics 34 (2): 584–653. https://doi.org/10.1214/009053606000000074.

Azadkia, Mona, and Sourav Chatterjee. 2019. “A Simple Measure of Conditional Dependence,” December. http://arxiv.org/abs/1910.12327.

Banerjee, Onureena, Laurent El Ghaoui, and Alexandre d’Aspremont. 2008. “Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data.” Journal of Machine Learning Research 9 (Mar): 485–516. http://www.jmlr.org/papers/v9/banerjee08a.html.

Barbier, Jean. 2015. “Statistical Physics and Approximate Message-Passing Algorithms for Sparse Linear Estimation Problems in Signal Processing and Coding Theory,” November. http://arxiv.org/abs/1511.01650.

Barron, Andrew R., Albert Cohen, Wolfgang Dahmen, and Ronald A. DeVore. 2008. “Approximation and Learning by Greedy Algorithms.” The Annals of Statistics 36 (1): 64–94. https://doi.org/10.1214/009053607000000631.

Bayati, M., and A. Montanari. 2012. “The LASSO Risk for Gaussian Matrices.” IEEE Transactions on Information Theory 58 (4): 1997–2017. https://doi.org/10.1109/TIT.2011.2174612.

Berk, Richard, Lawrence Brown, Andreas Buja, Kai Zhang, and Linda Zhao. 2013. “Valid Post-Selection Inference.” The Annals of Statistics 41 (2): 802–37. https://doi.org/10.1214/12-AOS1077.

Bertin, K., E. Le Pennec, and V. Rivoirard. 2011. “Adaptive Dantzig Density Estimation.” Annales de L’Institut Henri Poincaré, Probabilités et Statistiques 47 (1): 43–74. https://doi.org/10.1214/09-AIHP351.

Bondell, Howard D., Arun Krishna, and Sujit K. Ghosh. 2010. “Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models.” Biometrics 66 (4): 1069–77. https://doi.org/10.1111/j.1541-0420.2010.01391.x.

Breiman, Leo. 1995. “Better Subset Regression Using the Nonnegative Garrote.” Technometrics 37 (4): 373–84. http://www-personal.umich.edu/~jizhu/jizhu/wuke/Breiman-Technometrics95.pdf.

Bunea, Florentina, Alexandre B. Tsybakov, and Marten H. Wegkamp. 2007a. “Sparse Density Estimation with ℓ1 Penalties.” In Learning Theory, edited by Nader H. Bshouty and Claudio Gentile, 530–43. Lecture Notes in Computer Science. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-72927-3_38.

Bunea, Florentina, Alexandre Tsybakov, and Marten Wegkamp. 2007b. “Sparsity Oracle Inequalities for the Lasso.” Electronic Journal of Statistics 1: 169–94. https://doi.org/10.1214/07-EJS008.

Bühlmann, Peter, and Sara van de Geer. 2015. “High-Dimensional Inference in Misspecified Linear Models.” Electronic Journal of Statistics 9 (1): 1449–73. https://doi.org/10.1214/15-EJS1041.

Carmi, Avishy Y. 2014. “Compressive System Identification.” In Compressed Sensing & Sparse Filtering, edited by Avishy Y. Carmi, Lyudmila Mihaylova, and Simon J. Godsill, 281–324. Signals and Communication Technology. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-38398-4_9.

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2016. “Double/Debiased Machine Learning for Treatment and Causal Parameters,” July. http://arxiv.org/abs/1608.00060.

Chernozhukov, Victor, Christian Hansen, Yuan Liao, and Yinchu Zhu. 2018. “Inference for Heterogeneous Effects Using Low-Rank Estimations,” December. http://arxiv.org/abs/1812.08089.

Chernozhukov, Victor, Whitney K. Newey, and Rahul Singh. 2018. “Learning L2 Continuous Regression Functionals via Regularized Riesz Representers,” September. http://arxiv.org/abs/1809.05224.

Chetverikov, Denis, Zhipeng Liao, and Victor Chernozhukov. 2016. “On Cross-Validated Lasso,” May. http://arxiv.org/abs/1605.02214.

Chichignoud, Michaël, Johannes Lederer, and Martin Wainwright. 2014. “A Practical Scheme and Fast Algorithm to Tune the Lasso with Optimality Guarantees,” October. http://arxiv.org/abs/1410.0247.

Descloux, Pascaline, and Sylvain Sardy. 2018. “Model Selection with Lasso-Zero: Adding Straw to the Haystack to Better Find Needles,” May. http://arxiv.org/abs/1805.05133.

Dossal, Charles, Maher Kachour, Jalal M. Fadili, Gabriel Peyré, and Christophe Chesneau. 2011. “The Degrees of Freedom of the Lasso for General Design Matrix,” November. http://arxiv.org/abs/1111.1162.

El Karoui, Noureddine. 2008. “Operator Norm Consistent Estimation of Large Dimensional Sparse Covariance Matrices.” The Annals of Statistics 36 (6): 2717–56. http://digitalassets.lib.berkeley.edu/sdtr/ucb/text/734.pdf.

Ewald, Karl, and Ulrike Schneider. 2015. “Confidence Sets Based on the Lasso Estimator,” July. http://arxiv.org/abs/1507.05315.

Fan, Jianqing, and Runze Li. 2001. “Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association 96 (456): 1348–60. https://doi.org/10.1198/016214501753382273.

Fan, Jianqing, and Jinchi Lv. 2010. “A Selective Overview of Variable Selection in High Dimensional Feature Space.” Statistica Sinica 20 (1): 101–48. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3092303/.

Flynn, Cheryl J., Clifford M. Hurvich, and Jeffrey S. Simonoff. 2013. “Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models,” February. http://arxiv.org/abs/1302.2068.

Geer, Sara van de. 2016. Estimation and Testing Under Sparsity. Vol. 2159. Lecture Notes in Mathematics. Cham: Springer International Publishing. http://link.springer.com/10.1007/978-3-319-32774-7.

Geer, Sara A. van de. 2008. “High-Dimensional Generalized Linear Models and the Lasso.” The Annals of Statistics 36 (2): 614–45. https://doi.org/10.1214/009053607000000929.

Geer, Sara A. van de, Peter Bühlmann, and Shuheng Zhou. 2011. “The Adaptive and the Thresholded Lasso for Potentially Misspecified Models (and a Lower Bound for the Lasso).” Electronic Journal of Statistics 5: 688–749. https://doi.org/10.1214/11-EJS624.

Hall, Peter, Jiashun Jin, and Hugh Miller. 2014. “Feature Selection When There Are Many Influential Features.” Bernoulli 20 (3): 1647–71. https://doi.org/10.3150/13-BEJ536.

Hall, Peter, and Jing-Hao Xue. 2014. “On Selecting Interacting Features from High-Dimensional Data.” Computational Statistics & Data Analysis 71 (March): 694–708. https://doi.org/10.1016/j.csda.2012.10.010.

Hansen, Niels Richard, Patricia Reynaud-Bouret, and Vincent Rivoirard. 2015. “Lasso and Probabilistic Inequalities for Multivariate Point Processes.” Bernoulli 21 (1): 83–143. https://doi.org/10.3150/13-BEJ562.

Hastie, Trevor J., Rob Tibshirani, and Martin J. Wainwright. 2015. Statistical Learning with Sparsity: The Lasso and Generalizations. Boca Raton: Chapman and Hall/CRC. https://web.stanford.edu/~hastie/StatLearnSparsity/index.html.

Hirose, Kei, Shohei Tateishi, and Sadanori Konishi. 2011. “Efficient Algorithm to Select Tuning Parameters in Sparse Regression Modeling with Regularization,” September. http://arxiv.org/abs/1109.2411.

Huang, Cong, G. L. H. Cheang, and Andrew R. Barron. 2008. “Risk of Penalized Least Squares, Greedy Selection and L1 Penalization for Flexible Function Libraries.” http://www.stat.yale.edu/~arb4/publications_files/RiskGreedySelectionAndL1penalization.pdf.

Janková, Jana, and Sara van de Geer. 2016. “Confidence Regions for High-Dimensional Generalized Linear Models Under Sparsity,” October. http://arxiv.org/abs/1610.01353.

Javanmard, Adel, and Andrea Montanari. 2014. “Confidence Intervals and Hypothesis Testing for High-Dimensional Regression.” Journal of Machine Learning Research 15 (1): 2869–2909. http://jmlr.org/papers/v15/javanmard14a.html.

Kato, Kengo. 2009. “On the Degrees of Freedom in Shrinkage Estimation.” Journal of Multivariate Analysis 100 (7): 1338–52. https://doi.org/10.1016/j.jmva.2008.12.002.

Kim, Yongdai, Sunghoon Kwon, and Hosik Choi. 2012. “Consistent Model Selection Criteria on High Dimensions.” Journal of Machine Learning Research 13 (Apr): 1037–57. http://www.jmlr.org/papers/v13/kim12a.html.

Koltchinskii, Vladimir. 2011. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Mathematics 2033, École d’Été de Probabilités de Saint-Flour. Heidelberg: Springer. https://doi.org/10.1007/978-3-642-22147-7_1.

Lam, Clifford, and Jianqing Fan. 2009. “Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.” Annals of Statistics 37 (6B): 4254–78. https://doi.org/10.1214/09-AOS720.

Lederer, Johannes, and Michael Vogt. 2020. “Estimating the Lasso’s Effective Noise,” April. http://arxiv.org/abs/2004.11554.

Lee, Jason D., Dennis L. Sun, Yuekai Sun, and Jonathan E. Taylor. 2013. “Exact Post-Selection Inference, with Application to the Lasso,” November. http://arxiv.org/abs/1311.6238.

Li, Wei, and Johannes Lederer. 2019. “Tuning Parameter Calibration for ℓ1-Regularized Logistic Regression.” Journal of Statistical Planning and Inference 202 (September): 80–98. https://doi.org/10.1016/j.jspi.2019.01.006.

Lim, Néhémy, and Johannes Lederer. 2016. “Efficient Feature Selection with Large and High-Dimensional Data,” September. http://arxiv.org/abs/1609.07195.

Lockhart, Richard, Jonathan Taylor, Ryan J. Tibshirani, and Robert Tibshirani. 2014. “A Significance Test for the Lasso.” The Annals of Statistics 42 (2): 413–68. https://doi.org/10.1214/13-AOS1175.

Meinshausen, Nicolai, and Peter Bühlmann. 2006. “High-Dimensional Graphs and Variable Selection with the Lasso.” The Annals of Statistics 34 (3): 1436–62. https://doi.org/10.1214/009053606000000281.

Meinshausen, Nicolai, and Bin Yu. 2009. “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data.” The Annals of Statistics 37 (1): 246–70. https://doi.org/10.1214/07-AOS582.

Naik, Prasad A., and Chih-Ling Tsai. 2001. “Single-Index Model Selections.” Biometrika 88 (3): 821–32. https://doi.org/10.1093/biomet/88.3.821.

Nickl, Richard, and Sara van de Geer. 2013. “Confidence Sets in Sparse Regression.” The Annals of Statistics 41 (6): 2852–76. https://doi.org/10.1214/13-AOS1170.

Portnoy, Stephen, and Roger Koenker. 1997. “The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators.” Statistical Science 12 (4): 279–300. https://doi.org/10.1214/ss/1030037960.

Reynaud-Bouret, Patricia. 2003. “Adaptive Estimation of the Intensity of Inhomogeneous Poisson Processes via Concentration Inequalities.” Probability Theory and Related Fields 126 (1). https://doi.org/10.1007/s00440-003-0259-1.

Reynaud-Bouret, Patricia, and Sophie Schbath. 2010. “Adaptive Estimation for Hawkes Processes; Application to Genome Analysis.” The Annals of Statistics 38 (5): 2781–2822. https://doi.org/10.1214/10-AOS806.

Shen, Xiaotong, and Hsin-Cheng Huang. 2006. “Optimal Model Assessment, Selection, and Combination.” Journal of the American Statistical Association 101 (474): 554–68. https://doi.org/10.1198/016214505000001078.

Shen, Xiaotong, Hsin-Cheng Huang, and Jimmy Ye. 2004. “Adaptive Model Selection and Assessment for Exponential Family Distributions.” Technometrics 46 (3): 306–17. https://doi.org/10.1198/004017004000000338.

Shen, Xiaotong, and Jianming Ye. 2002. “Adaptive Model Selection.” Journal of the American Statistical Association 97 (457): 210–21. https://doi.org/10.1198/016214502753479356.

Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological) 58 (1): 267–88. http://statweb.stanford.edu/~tibs/lasso/lasso.pdf.

Tibshirani, Ryan J. 2014. “A General Framework for Fast Stagewise Algorithms,” August. http://arxiv.org/abs/1408.5801.

Wang, Hansheng, Guodong Li, and Guohua Jiang. 2007. “Robust Regression Shrinkage and Consistent Variable Selection Through the LAD-Lasso.” Journal of Business & Economic Statistics 25 (3): 347–55. https://doi.org/10.1198/073500106000000251.

Yoshida, Ryo, and Mike West. 2010. “Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing.” Journal of Machine Learning Research 11 (May): 1771–98. http://www.jmlr.org/papers/v11/yoshida10a.html.

Yuan, Ming, and Yi Lin. 2006. “Model Selection and Estimation in Regression with Grouped Variables.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (1): 49–67. https://doi.org/10.1111/j.1467-9868.2005.00532.x.

———. 2007. “Model Selection and Estimation in the Gaussian Graphical Model.” Biometrika 94 (1): 19–35. https://doi.org/10.1093/biomet/asm018.

Zhang, Cun-Hui. 2010. “Nearly Unbiased Variable Selection Under Minimax Concave Penalty.” The Annals of Statistics 38 (2): 894–942. https://doi.org/10.1214/09-AOS729.

Zhang, Cun-Hui, and Stephanie S. Zhang. 2014. “Confidence Intervals for Low Dimensional Parameters in High Dimensional Linear Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76 (1): 217–42. https://doi.org/10.1111/rssb.12026.

Zhang, Yiyun, Runze Li, and Chih-Ling Tsai. 2010. “Regularization Parameter Selections via Generalized Information Criterion.” Journal of the American Statistical Association 105 (489): 312–23. https://doi.org/10.1198/jasa.2009.tm08013.

Zhao, Peng, Guilherme Rocha, and Bin Yu. 2006. “Grouped and Hierarchical Model Selection Through Composite Absolute Penalties.” http://digitalassets.lib.berkeley.edu/sdtr/ucb/text/703.pdf.

———. 2009. “The Composite Absolute Penalties Family for Grouped and Hierarchical Variable Selection.” The Annals of Statistics 37 (6A): 3468–97. https://doi.org/10.1214/07-AOS584.

Zhao, Peng, and Bin Yu. 2006. “On Model Selection Consistency of Lasso.” Journal of Machine Learning Research 7 (Nov): 2541–63. http://www.jmlr.org/papers/volume7/zhao06a/zhao06a.

Zou, Hui. 2006. “The Adaptive Lasso and Its Oracle Properties.” Journal of the American Statistical Association 101 (476): 1418–29. https://doi.org/10.1198/016214506000000735.

Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2): 301–20. https://doi.org/10.1111/j.1467-9868.2005.00503.x.

Zou, Hui, Trevor Hastie, and Robert Tibshirani. 2007. “On the ‘Degrees of Freedom’ of the Lasso.” The Annals of Statistics 35 (5): 2173–92. https://doi.org/10.1214/009053607000000127.