Sparse model selection

September 5, 2016 — October 2, 2020

estimator distribution
functional analysis
high d
linear algebra
model selection
probability
signal processing
sparser than thou
statistics

On choosing the right model and regularisation parameter in sparse regression, which turn out to be nearly the same problem, and closely coupled to performing the regression itself. There are some wrinkles.

🏗 Talk about when degrees-of-freedom penalties work, when cross-validation does, and so on.

1 FOCI

The new hotness sweeping the world is FOCI, a sparse model selection procedure (Azadkia and Chatterjee 2019) built on Chatterjee’s ξ coefficient of correlation (Chatterjee 2020), used as an independence test. Looks interesting.
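The ξ statistic itself is simple to compute. Below is a minimal numpy sketch of the no-ties version from Chatterjee (2020) (the function name `xi` is mine): sort the pairs by x, rank the corresponding y values, and measure how much consecutive ranks jump.

```python
import numpy as np

def xi(x, y):
    """Chatterjee's xi coefficient (assuming no ties among the y values).

    Near 0 when y is independent of x; near 1 when y is a
    measurable function of x."""
    n = len(x)
    order = np.argsort(x)                    # rearrange pairs so x increases
    r = np.argsort(np.argsort(y[order]))     # ranks of y in that order
    return 1 - 3 * np.abs(np.diff(r)).sum() / (n**2 - 1)
```

FOCI then, roughly, runs forward stepwise selection using a conditional version of this coefficient, stopping as soon as no remaining covariate increases the estimated dependence.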

2 Stability selection

🏗

For now, see the mplot package (Tarr, Müller, and Welsh 2018) for an introduction.
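Until that arrives, a minimal sketch of the basic recipe (my own simplification, not the mplot implementation, and with none of the error-control refinements): refit the lasso on many random half-samples and keep the variables selected in a large fraction of them.

```python
import numpy as np
from sklearn.linear_model import Lasso

def selection_frequencies(X, y, alpha=0.1, n_subsamples=100, frac=0.5, seed=0):
    """Fraction of lasso fits on random sub-samples that select each feature."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        counts += Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_ != 0
    return counts / n_subsamples

# e.g. keep the features selected in at least 80% of sub-samples:
# stable = selection_frequencies(X, y) >= 0.8
```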

3 Relaxed Lasso

🏗
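Pending a proper write-up, the one-line idea: decouple selection from shrinkage by refitting, with a lighter or zero penalty, on the support the lasso selects. Here is a sketch of the zero-penalty extreme, the lasso-then-OLS refit that appears as the γ = 0 endpoint of the relaxed lasso in Hastie, Tibshirani, and Tibshirani (2017); the function name is mine.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def lasso_then_ols(X, y, alpha=0.1):
    """Use the lasso only to choose a support, then refit that support by
    unpenalized least squares, undoing the lasso's shrinkage bias."""
    support = Lasso(alpha=alpha).fit(X, y).coef_ != 0
    coef = np.zeros(X.shape[1])
    if support.any():
        coef[support] = LinearRegression().fit(X[:, support], y).coef_
    return coef
```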

4 Dantzig Selector

🏗
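For now, just the definition. The Dantzig selector of Candès and Tao solves

$$\hat\beta = \operatorname*{arg\,min}_\beta \|\beta\|_1 \quad \text{s.t.} \quad \|X^\top(y - X\beta)\|_\infty \le \lambda,$$

i.e. it seeks the smallest-ℓ1 coefficient vector whose residuals are nearly uncorrelated with every predictor. Objective and constraints are both polyhedral, so the whole thing is a linear program; below is a scipy sketch using the standard variable-splitting trick $|\beta_j| \le u_j$.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Solve min ||b||_1 subject to ||X.T @ (y - X @ b)||_inf <= lam
    as a linear program in the stacked variables (b, u), minimizing
    sum(u) under |b_j| <= u_j."""
    p = X.shape[1]
    G, Xty = X.T @ X, X.T @ y
    I, Z = np.eye(p), np.zeros((p, p))
    c = np.concatenate([np.zeros(p), np.ones(p)])  # objective: sum(u)
    A_ub = np.block([
        [I, -I],   #  b - u <= 0
        [-I, -I],  # -b - u <= 0
        [G, Z],    #  X'X b <= X'y + lam
        [-G, Z],   # -X'X b <= lam - X'y
    ])
    b_ub = np.concatenate([np.zeros(2 * p), Xty + lam, lam - Xty])
    bounds = [(None, None)] * p + [(0, None)] * p  # b free, u >= 0
    return linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).x[:p]
```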

5 Garrote

🏗
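That is, Breiman’s (1995) nonnegative garrote: rather than shrinking coefficients directly, start from the OLS estimate $\hat\beta$ and scale each coefficient by a factor $c_j \ge 0$, chosen to trade fit against $\sum_j c_j$; factors driven to zero delete their variables. A quick sketch of the Lagrangian form (Breiman states it with the constraint $\sum_j c_j \le s$ instead; I use a generic box-constrained optimiser rather than anything clever):

```python
import numpy as np
from scipy.optimize import minimize

def nn_garrote(X, y, lam):
    """Nonnegative garrote: find factors c >= 0 minimizing
    0.5 * ||y - X @ (c * beta_ols)||^2 + lam * sum(c),
    then return the rescaled coefficients c * beta_ols."""
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    Z = X * beta_ols                      # column j of X scaled by beta_ols[j]
    loss = lambda c: 0.5 * np.sum((y - Z @ c) ** 2) + lam * c.sum()
    c = minimize(loss, x0=np.ones(X.shape[1]),
                 bounds=[(0, None)] * X.shape[1]).x
    return c * beta_ols
```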

6 Degrees-of-freedom penalties

See degrees of freedom.
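One result worth recording even in stub form: Zou, Hastie, and Tibshirani (2007) show that the number of nonzero coefficients in a lasso fit is an unbiased estimator of its degrees of freedom, so a Mallows-$C_p$/SURE-style criterion gives a cheap alternative to cross-validation for choosing $\lambda$. A sketch (assuming centred data and a known, or separately estimated, noise variance `sigma2`):

```python
import numpy as np
from sklearn.linear_model import lasso_path

def lasso_cp(X, y, sigma2):
    """Choose lambda along a lasso path by RSS + 2 * sigma2 * df,
    where df is the number of nonzero coefficients (Zou, Hastie,
    and Tibshirani 2007)."""
    alphas, coefs, _ = lasso_path(X, y)        # coefs has shape (p, n_alphas)
    rss = ((y[:, None] - X @ coefs) ** 2).sum(axis=0)
    df = (coefs != 0).sum(axis=0)
    best = np.argmin(rss + 2 * sigma2 * df)    # C_p, up to constants
    return alphas[best], coefs[:, best]
```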

7 References

Abramovich, Benjamini, Donoho, et al. 2006. “Adapting to Unknown Sparsity by Controlling the False Discovery Rate.” The Annals of Statistics.
Azadkia, and Chatterjee. 2019. “A Simple Measure of Conditional Dependence.” arXiv:1910.12327 [Cs, Math, Stat].
Banerjee, Ghaoui, and d’Aspremont. 2008. “Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data.” Journal of Machine Learning Research.
Barbier. 2015. “Statistical Physics and Approximate Message-Passing Algorithms for Sparse Linear Estimation Problems in Signal Processing and Coding Theory.” arXiv:1511.01650 [Cs, Math].
Barron, Cohen, Dahmen, et al. 2008. “Approximation and Learning by Greedy Algorithms.” The Annals of Statistics.
Bayati, and Montanari. 2012. “The LASSO Risk for Gaussian Matrices.” IEEE Transactions on Information Theory.
Berk, Brown, Buja, et al. 2013. “Valid Post-Selection Inference.” The Annals of Statistics.
Bertin, Pennec, and Rivoirard. 2011. “Adaptive Dantzig Density Estimation.” Annales de l’Institut Henri Poincaré, Probabilités Et Statistiques.
Bertsimas, King, and Mazumder. 2016. “Best Subset Selection via a Modern Optimization Lens.” The Annals of Statistics.
Bondell, Krishna, and Ghosh. 2010. “Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models.” Biometrics.
Breiman. 1995. “Better Subset Regression Using the Nonnegative Garrote.” Technometrics.
Bühlmann, and van de Geer. 2015. “High-Dimensional Inference in Misspecified Linear Models.” arXiv:1503.06426 [Stat].
Bunea, Tsybakov, and Wegkamp. 2007a. “Sparsity Oracle Inequalities for the Lasso.” Electronic Journal of Statistics.
Bunea, Tsybakov, and Wegkamp. 2007b. “Sparse Density Estimation with ℓ1 Penalties.” In Learning Theory. Lecture Notes in Computer Science.
Carmi. 2014. “Compressive System Identification.” In Compressed Sensing & Sparse Filtering. Signals and Communication Technology.
Chatterjee. 2020. “A New Coefficient of Correlation.” arXiv:1909.10140 [Math, Stat].
Chernozhukov, Chetverikov, Demirer, et al. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal.
Chernozhukov, Hansen, Liao, et al. 2018. “Inference For Heterogeneous Effects Using Low-Rank Estimations.” arXiv:1812.08089 [Math, Stat].
Chernozhukov, Newey, and Singh. 2018. “Learning L2 Continuous Regression Functionals via Regularized Riesz Representers.” arXiv:1809.05224 [Econ, Math, Stat].
Chetverikov, Liao, and Chernozhukov. 2016. “On Cross-Validated Lasso.” arXiv:1605.02214 [Math, Stat].
Chichignoud, Lederer, and Wainwright. 2014. “A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees.” arXiv:1410.0247 [Math, Stat].
Descloux, and Sardy. 2018. “Model Selection with Lasso-Zero: Adding Straw to the Haystack to Better Find Needles.” arXiv:1805.05133 [Stat].
Dossal, Kachour, Fadili, et al. 2011. “The Degrees of Freedom of the Lasso for General Design Matrix.” arXiv:1111.1162 [Cs, Math, Stat].
El Karoui. 2008. “Operator Norm Consistent Estimation of Large Dimensional Sparse Covariance Matrices.” University of California, Berkeley.
Ewald, and Schneider. 2015. “Confidence Sets Based on the Lasso Estimator.” arXiv:1507.05315 [Math, Stat].
Fan, and Li. 2001. “Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association.
Fan, and Lv. 2010. “A Selective Overview of Variable Selection in High Dimensional Feature Space.” Statistica Sinica.
Flynn, Hurvich, and Simonoff. 2013. “Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models.” arXiv:1302.2068 [Stat].
Freijeiro-González, Febrero-Bande, and González-Manteiga. 2022. “A Critical Review of LASSO and Its Derivatives for Variable Selection Under Dependence Among Covariates.” International Statistical Review.
Hall, Jin, and Miller. 2014. “Feature Selection When There Are Many Influential Features.” Bernoulli.
Hall, and Xue. 2014. “On Selecting Interacting Features from High-Dimensional Data.” Computational Statistics & Data Analysis.
Hansen, Reynaud-Bouret, and Rivoirard. 2015. “Lasso and Probabilistic Inequalities for Multivariate Point Processes.” Bernoulli.
Hastie, Trevor J., Tibshirani, Rob, and Wainwright. 2015. Statistical Learning with Sparsity: The Lasso and Generalizations.
Hastie, Trevor, Tibshirani, and Tibshirani. 2017. “Extended Comparisons of Best Subset Selection, Forward Stepwise Selection, and the Lasso.”
Hirose, Tateishi, and Konishi. 2011. “Efficient Algorithm to Select Tuning Parameters in Sparse Regression Modeling with Regularization.” arXiv:1109.2411 [Stat].
Huang, Cheang, and Barron. 2008. “Risk of Penalized Least Squares, Greedy Selection and L1 Penalization for Flexible Function Libraries.”
Janková, and van de Geer. 2016. “Confidence Regions for High-Dimensional Generalized Linear Models Under Sparsity.” arXiv:1610.01353 [Math, Stat].
Javanmard, and Montanari. 2014. “Confidence Intervals and Hypothesis Testing for High-Dimensional Regression.” Journal of Machine Learning Research.
Kato. 2009. “On the Degrees of Freedom in Shrinkage Estimation.” Journal of Multivariate Analysis.
Kim, Kwon, and Choi. 2012. “Consistent Model Selection Criteria on High Dimensions.” Journal of Machine Learning Research.
Koltchinskii. 2011. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Mathematics École d’Été de Probabilités de Saint-Flour 2033.
Lam, and Fan. 2009. “Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.” Annals of Statistics.
Lederer, and Vogt. 2020. “Estimating the Lasso’s Effective Noise.” arXiv:2004.11554 [Stat].
Lee, Sun, Sun, et al. 2013. “Exact Post-Selection Inference, with Application to the Lasso.” arXiv:1311.6238 [Math, Stat].
Lemhadri, Ruan, Abraham, et al. 2021. “LassoNet: A Neural Network with Feature Sparsity.” Journal of Machine Learning Research.
Li, and Lederer. 2019. “Tuning Parameter Calibration for ℓ1-Regularized Logistic Regression.” Journal of Statistical Planning and Inference.
Lim, and Lederer. 2016. “Efficient Feature Selection With Large and High-Dimensional Data.” arXiv:1609.07195 [Stat].
Lockhart, Taylor, Tibshirani, et al. 2014. “A Significance Test for the Lasso.” The Annals of Statistics.
Lundberg, and Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Advances in Neural Information Processing Systems.
Meinshausen, and Bühlmann. 2006. “High-Dimensional Graphs and Variable Selection with the Lasso.” The Annals of Statistics.
Meinshausen, and Yu. 2009. “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data.” The Annals of Statistics.
Naik, and Tsai. 2001. “Single-index Model Selections.” Biometrika.
Nickl, and van de Geer. 2013. “Confidence Sets in Sparse Regression.” The Annals of Statistics.
Portnoy, and Koenker. 1997. “The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators.” Statistical Science.
Reynaud-Bouret. 2003. “Adaptive Estimation of the Intensity of Inhomogeneous Poisson Processes via Concentration Inequalities.” Probability Theory and Related Fields.
Reynaud-Bouret, and Schbath. 2010. “Adaptive Estimation for Hawkes Processes; Application to Genome Analysis.” The Annals of Statistics.
Semenova, Rudin, and Parr. 2021. “A Study in Rashomon Curves and Volumes: A New Perspective on Generalization and Model Simplicity in Machine Learning.” arXiv:1908.01755 [Cs, Stat].
Shen, and Huang. 2006. “Optimal Model Assessment, Selection, and Combination.” Journal of the American Statistical Association.
Shen, Huang, and Ye. 2004. “Adaptive Model Selection and Assessment for Exponential Family Distributions.” Technometrics.
Shen, and Ye. 2002. “Adaptive Model Selection.” Journal of the American Statistical Association.
Tarr, Müller, and Welsh. 2018. “mplot: An R Package for Graphical Model Stability and Variable Selection Procedures.” Journal of Statistical Software.
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological).
Tibshirani, Ryan J. 2014. “A General Framework for Fast Stagewise Algorithms.” arXiv:1408.5801 [Stat].
Geer, Sara A. van de. 2008. “High-Dimensional Generalized Linear Models and the Lasso.” The Annals of Statistics.
Geer, Sara van de. 2016. Estimation and Testing Under Sparsity. Lecture Notes in Mathematics.
Geer, Sara A. van de, Bühlmann, and Zhou. 2011. “The Adaptive and the Thresholded Lasso for Potentially Misspecified Models (and a Lower Bound for the Lasso).” Electronic Journal of Statistics.
Wang, Li, and Jiang. 2007. “Robust Regression Shrinkage and Consistent Variable Selection Through the LAD-Lasso.” Journal of Business & Economic Statistics.
Wasserman, and Roeder. 2009. “High-Dimensional Variable Selection.” Annals of Statistics.
Xu, Caramanis, and Mannor. 2012. “Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
Yoshida, and West. 2010. “Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing.” Journal of Machine Learning Research.
Yuan, and Lin. 2006. “Model Selection and Estimation in Regression with Grouped Variables.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
———. 2007. “Model Selection and Estimation in the Gaussian Graphical Model.” Biometrika.
Zhang, Cun-Hui. 2010. “Nearly Unbiased Variable Selection Under Minimax Concave Penalty.” The Annals of Statistics.
Zhang, Yiyun, Li, and Tsai. 2010. “Regularization Parameter Selections via Generalized Information Criterion.” Journal of the American Statistical Association.
Zhang, Cun-Hui, and Zhang. 2014. “Confidence Intervals for Low Dimensional Parameters in High Dimensional Linear Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Zhao, Rocha, and Yu. 2006. “Grouped and Hierarchical Model Selection Through Composite Absolute Penalties.”
———. 2009. “The Composite Absolute Penalties Family for Grouped and Hierarchical Variable Selection.” The Annals of Statistics.
Zhao, and Yu. 2006. “On Model Selection Consistency of Lasso.” Journal of Machine Learning Research.
Zou. 2006. “The Adaptive Lasso and Its Oracle Properties.” Journal of the American Statistical Association.
Zou, and Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Zou, Hastie, and Tibshirani. 2007. “On the ‘Degrees of Freedom’ of the Lasso.” The Annals of Statistics.