# Sparse regression

June 23, 2016 – October 24, 2019

Penalised regression where the penalties are sparsifying. The prediction losses could be anything: likelihood, least squares, robust Huberised losses, absolute deviation, etc.

I will play fast and loose with terminology here regarding theoretical and empirical losses, and the statistical models we attempt to fit.

In nonparametric statistics we might estimate many, many parameters simultaneously, constraining them in some clever fashion that usually boils down to something we can interpret as a smoothing parameter, controlling how many factors from the original set we still have to consider.

I will usually discuss our intent to minimise prediction error, but one could try to minimise model selection error instead.

Then we have a simultaneous estimation and model selection procedure, specifically a sparse model selection procedure, and we possibly have to choose a clever optimisation method to do the whole thing fast. This is related to compressed sensing, but here we must also contend with sampling complexity and measurement error.

See also matrix factorisations, optimisation, multiple testing, concentration inequalities, sparse-flavoured ice cream.

π disambiguate the optimisation technologies at play β iteratively reweighted least squares etc.

Now! A set of headings under which I will try to understand some things, mostly the LASSO variants.

## 1 LASSO

Quadratic prediction loss, absolute-value ($$L_1$$) coefficient penalty. We estimate the regression coefficients $$\beta$$ by solving

\begin{aligned} \hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\text{argmin}} \: \frac{1}{2} \| y - {\bf X} \beta \|_2^2 + \lambda \| \beta \|_1. \end{aligned}

The penalty coefficient $$\lambda$$ is left for you to choose, but one of the magical properties of the lasso is that it is easy to test many possible values of $$\lambda$$ at low marginal cost.

Popular because, amongst other reasons, it turns out in practice to be fast and convenient, and amenable to various performance accelerations, e.g. aggressive approximate variable selection via screening rules.
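To make the "many values of $$\lambda$$ at low marginal cost" point concrete, here is a minimal sketch using scikit-learn's `lasso_path`, which fits the whole regularisation path with warm starts. The library choice and the toy data are my assumptions, not the post's.

```python
# Sketch: computing an entire lasso regularisation path cheaply with warm starts.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, p = 100, 30
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [4.0, -2.0, 1.5]          # only 3 nonzero coefficients
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# One call fits 100 values of lambda ("alphas" here, largest first),
# warm-starting each solve from the previous solution, so the whole path
# costs little more than a single fit.
alphas, coefs, _ = lasso_path(X, y, n_alphas=100)

nonzero_at_largest = np.count_nonzero(coefs[:, 0])    # heaviest penalty
nonzero_at_smallest = np.count_nonzero(coefs[:, -1])  # lightest penalty
print(nonzero_at_largest, nonzero_at_smallest)
```

Sparsity increases with the penalty: at the largest `alpha` everything is zeroed, and the model fills in as the penalty relaxes.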

π This is the one with famous oracle properties if you choose $$\lambda$$ correctly. Hsi Zouβs paper on this (Zou 2006) is readable. I am having trouble digesting Sara van de Geerβs paper on the Generalised Lasso, but it seems to offer me guarantees for something very similar to the Adaptive Lasso, but with far more general assumptions on the model and loss functions, and some finite sample guarnatees.

## 2 LARS

A confusing one; LASSO and LARS are not the same thing, but a small modification of the LARS algorithm computes the full LASSO solution path (Efron et al. 2004). I need to work this one through with a pencil and paper.
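A cheap way to convince myself without the pencil: scikit-learn exposes `lars_path` with a `method="lasso"` switch for exactly this modification, and its solution at a knot of the path should match coordinate-descent `Lasso` at the same penalty. The toy data here is mine.

```python
# Check that LARS-with-lasso-modification and coordinate-descent lasso agree.
import numpy as np
from sklearn.linear_model import Lasso, lars_path

rng = np.random.default_rng(2)
X = rng.standard_normal((80, 10))
beta = np.zeros(10)
beta[:3] = [2.0, -1.0, 0.5]
y = X @ beta + 0.05 * rng.standard_normal(80)

# LARS with the lasso modification returns the exact piecewise-linear path.
alphas, _, coefs = lars_path(X, y, method="lasso")

# Coordinate-descent lasso at one of the path's knots should match.
a = alphas[3]
cd = Lasso(alpha=a, fit_intercept=False, tol=1e-10).fit(X, y).coef_
gap = np.max(np.abs(cd - coefs[:, 3]))
print(gap)
```

So LARS is the path algorithm; the lasso is the optimisation problem whose path it (after modification) traces.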

## 3 Graph LASSO

As used in graphical models: an $$L_1$$ penalty on the entries of the precision (inverse covariance) matrix yields a sparse conditional-independence graph. TODO:

## 4 Elastic net

Combination of $$L_1$$ and $$L_2$$ penalties (Zou and Hastie 2005). TODO:
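One motivating case: with strongly correlated predictors the lasso arbitrarily picks one, while the ridge component of the elastic net tends to keep correlated predictors together with shared weight. A sketch in scikit-learn's parameterisation (an assumption of this example: `alpha` is the overall penalty weight and `l1_ratio` interpolates from ridge at 0 to lasso at 1):

```python
# Elastic net keeping a pair of near-duplicate columns together.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(4)
n, p = 100, 40
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)   # near-duplicate column
y = X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(n)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_[:2])   # both correlated columns survive, with similar weights
```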

## 5 Grouped LASSO

AFAICT this is the usual LASSO but with grouped factors: the $$L_2$$ norm of each pre-specified group of coefficients is penalised, so whole groups enter or leave the model together. See Yuan and Lin (2006).
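A numpy-only sketch of one way to solve it: proximal gradient descent (ISTA), where the proximal step is block soft-thresholding of each group. The toy problem and the (untuned) solver are my own illustration, not a production implementation.

```python
# Group lasso by proximal gradient: objective (1/2n)||y - Xb||^2 + lam * sum_g ||b_g||_2.
import numpy as np

def group_soft_threshold(b, thresh):
    """Shrink the whole block b toward zero; kill it if its norm <= thresh."""
    norm = np.linalg.norm(b)
    if norm <= thresh:
        return np.zeros_like(b)
    return (1 - thresh / norm) * b

def group_lasso(X, y, groups, lam, n_iter=1000):
    n, p = X.shape
    beta = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / n        # Lipschitz constant of the gradient
    step = 1.0 / L
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        for g in groups:                      # prox: block soft-threshold each group
            beta[g] = group_soft_threshold(z[g], step * lam)
    return beta

rng = np.random.default_rng(5)
n, p = 200, 12
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[0:4] = [1.0, -1.0, 0.5, 2.0]       # only the first group is active
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = group_lasso(X, y, groups, lam=0.5)
print([round(float(np.linalg.norm(beta_hat[g])), 3) for g in groups])
```

The inactive groups come out exactly zero because the prox sets whole blocks to zero, which is the "whole groups enter or leave together" behaviour.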

## 6 Model selection

Can be fiddly with sparse regression, which couples variable selection tightly with parameter estimation. See sparse model selection.
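One common (if imperfect) approach is to let cross-validation couple the $$\lambda$$ choice to prediction error, e.g. via scikit-learn's `LassoCV`; information criteria (e.g. `LassoLarsIC`) are an alternative. The data here is a hypothetical toy example.

```python
# Choosing lambda by cross-validation over the regularisation path.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(6)
n, p = 150, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = X @ beta_true + 0.5 * rng.standard_normal(n)

fit = LassoCV(cv=5, n_alphas=50).fit(X, y)
print(fit.alpha_, np.count_nonzero(fit.coef_))
```

Caveat: the CV-optimal $$\lambda$$ targets prediction error, and it is known to over-select variables relative to what consistent model selection would demand, which is exactly the fiddliness mentioned above.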

## 7 Debiased LASSO

There exist a few versions; see e.g. Zhang and Zhang (2014), van de Geer, Bühlmann, Ritov, et al. (2014), and Javanmard and Montanari (2014). (TODO: how do the versions relate?)
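The shared skeleton of these versions is a one-step correction: take a lasso estimate $$\hat\beta$$ and add $$M X^\top (y - X\hat\beta)/n$$, where $$M$$ approximates the inverse of the sample covariance (in high dimensions, typically via nodewise lasso regressions). A numpy sanity check of the algebra: when $$p < n$$ and $$M$$ is the exact inverse, the debiased estimate is exactly OLS.

```python
# Debiasing step sketch: with the exact sample precision, debiased lasso == OLS.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
n, p = 300, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.0, -1.0]
y = X @ beta_true + 0.2 * rng.standard_normal(n)

lasso = Lasso(alpha=0.2, fit_intercept=False).fit(X, y)
Sigma_hat = X.T @ X / n
M = np.linalg.inv(Sigma_hat)     # high-dimensional versions approximate this instead
beta_debiased = lasso.coef_ + M @ X.T @ (y - X @ lasso.coef_) / n

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
gap = np.max(np.abs(beta_debiased - beta_ols))
print(gap)
```

The point of the construction is that, unlike the lasso itself, the corrected estimator is asymptotically Gaussian, so confidence intervals become available.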

## 8 Sparse basis expansions

Wavelets etc; mostly handled under sparse dictionary bases.

## 9 Sparse neural nets

That is, sparse regressions as the layers in a neural network? Sure thing. See e.g. the LassoNet of Lemhadri et al. (2021).

## 10 Other coefficient penalties

Put a weird penalty on the coefficients! E.g. "Smoothly Clipped Absolute Deviation" (SCAD; Fan and Li 2001). TODO:
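For concreteness, the SCAD penalty of Fan and Li (2001), sketched in numpy: it matches the $$L_1$$ penalty near zero but flattens out to a constant, so large coefficients are not shrunk (which is how it avoids the lasso's bias on big coefficients). $$a = 3.7$$ is the value Fan and Li suggest.

```python
# The SCAD penalty function, piecewise in |beta|.
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    b = np.abs(beta)
    small = b <= lam
    mid = (b > lam) & (b <= a * lam)
    out = np.empty_like(b, dtype=float)
    out[small] = lam * b[small]                                   # l1 region
    out[mid] = (2 * a * lam * b[mid] - b[mid] ** 2 - lam ** 2) / (2 * (a - 1))
    out[~(small | mid)] = lam ** 2 * (a + 1) / 2                  # flat region
    return out

vals = scad_penalty(np.array([0.0, 1.0, 10.0]), lam=1.0)
print(vals)   # 0 at zero, lam^2 at the first knot, constant lam^2 (a+1)/2 beyond a*lam
```

The price of the flat tail is non-convexity, hence the specialised optimisation literature around it.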

## 11 Other prediction losses

Put a weird loss on the errors! Median-absolute-deviation prediction loss with a lasso coefficient penalty, etc.

Some implementations use e.g. maximum absolute prediction error.

## 12 Implementations

Hastie, Friedman et al.'s glmnet for R is fast and well-regarded, and has a MATLAB version. It can also be coaxed into fitting the adaptive lasso by passing per-coefficient weights via its penalty.factor argument. Kenneth Tay has implemented the elastic-net penalty for any GLM family in glmnet.

SPAMS (C++, with MATLAB, R, and Python interfaces) by Mairal looks interesting. It's an optimisation library for many, many sparse problems.

liblinear also includes lasso-type solvers, as well as support-vector regression.

## 13 Tidbits

Sparse regression as a universal classifier explainer? Local Interpretable Model-agnostic Explanations (LIME) uses LASSO for local model interpretation (Ribeiro, Singh, and Guestrin 2016). (See the blog post, or the source.)

## 14 References

Abramovich, Benjamini, Donoho, et al. 2006. The Annals of Statistics.
Aghasi, Nguyen, and Romberg. 2016. arXiv:1611.05162 [Cs, Stat].
Aragam, Amini, and Zhou. 2015. arXiv:1511.08963 [Cs, Math, Stat].
Azadkia, and Chatterjee. 2019. arXiv:1910.12327 [Cs, Math, Stat].
Azizyan, Krishnamurthy, and Singh. 2015. arXiv:1506.00898 [Cs, Math, Stat].
Bach. 2009. arXiv:0901.3202 [Cs, Stat].
Bach, Jenatton, and Mairal. 2011. Optimization With Sparsity-Inducing Penalties. Foundations and Trends® in Machine Learning 1.0.
Bahmani, and Romberg. 2014. arXiv:1501.00046 [Cs, Math, Stat].
Banerjee, Arindam, Chen, Fazayeli, et al. 2014. In Advances in Neural Information Processing Systems 27.
Banerjee, Onureena, Ghaoui, and d'Aspremont. 2008. Journal of Machine Learning Research.
Barber, and Candès. 2015. The Annals of Statistics.
Barbier. 2015. arXiv:1511.01650 [Cs, Math].
Baron, Sarvotham, and Baraniuk. 2010. IEEE Transactions on Signal Processing.
Barron, Cohen, Dahmen, et al. 2008. The Annals of Statistics.
Barron, Huang, Li, et al. 2008. In Information Theory Workshop, 2008. ITW '08. IEEE.
Battiti. 1992. Neural Computation.
Bayati, and Montanari. 2012. IEEE Transactions on Information Theory.
Bellec, and Tsybakov. 2016. arXiv:1609.06675 [Math, Stat].
Belloni, Chernozhukov, and Wang. 2011. Biometrika.
Berk, Brown, Buja, et al. 2013. The Annals of Statistics.
Bertin, Pennec, and Rivoirard. 2011. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques.
Bian, Chen, and Ye. 2014. Mathematical Programming.
Bien, Gaynanova, Lederer, et al. 2018. Journal of Computational and Graphical Statistics.
Bloniarz, Liu, Zhang, et al. 2015. arXiv:1507.03652 [Math, Stat].
Bondell, Krishna, and Ghosh. 2010. Biometrics.
Borgs, Chayes, Cohn, et al. 2014. arXiv:1401.2906 [Math].
Bottou, Curtis, and Nocedal. 2016. arXiv:1606.04838 [Cs, Math, Stat].
Breiman. 1995. Technometrics.
Bruckstein, Elad, and Zibulevsky. 2008. IEEE Transactions on Information Theory.
Brunton, Proctor, and Kutz. 2016. Proceedings of the National Academy of Sciences.
Bühlmann, and van de Geer. 2011. In Statistics for High-Dimensional Data. Springer Series in Statistics.
———. 2015. arXiv:1503.06426 [Stat].
Bu, and Lederer. 2017. arXiv:1704.02739 [Stat].
Bunea, Tsybakov, and Wegkamp. 2007a. Electronic Journal of Statistics.
Bunea, Tsybakov, and Wegkamp. 2007b. In Learning Theory. Lecture Notes in Computer Science.
Candès, and Davenport. 2011. arXiv:1104.5246 [Cs, Math, Stat].
Candès, Fan, Janson, et al. 2016. arXiv Preprint arXiv:1610.02351.
Candès, and Fernandez-Granda. 2013. Journal of Fourier Analysis and Applications.
Candès, and Plan. 2010. "Matrix Completion With Noise." Proceedings of the IEEE.
Candès, Romberg, and Tao. 2006. Communications on Pure and Applied Mathematics.
Candès, Wakin, and Boyd. 2008. Journal of Fourier Analysis and Applications.
Carmi. 2013. Digital Signal Processing.
βββ. 2014. In Compressed Sensing & Sparse Filtering. Signals and Communication Technology.
Cevher, Duarte, Hegde, et al. 2009. In Advances in Neural Information Processing Systems.
Chartrand, and Yin. 2008. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008.
Chatterjee. 2020. arXiv:1909.10140 [Math, Stat].
Chen, Xiaojun. 2012. Mathematical Programming.
Chen, Y., and Hero. 2012. IEEE Transactions on Signal Processing.
Chen, Minhua, Silva, Paisley, et al. 2010. IEEE Transactions on Signal Processing.
Chen, Yen-Chi, and Wang. n.d.
Chernozhukov, Chetverikov, Demirer, et al. 2016. arXiv:1608.00060 [Econ, Stat].
Chernozhukov, Hansen, Liao, et al. 2018. arXiv:1812.08089 [Math, Stat].
Chernozhukov, Newey, and Singh. 2018. arXiv:1809.05224 [Econ, Math, Stat].
Chetverikov, Liao, and Chernozhukov. 2016. "On Cross-Validated Lasso." arXiv:1605.02214 [Math, Stat].
Chichignoud, Lederer, and Wainwright. 2014. arXiv:1410.0247 [Math, Stat].
Dai, and Barber. 2016. arXiv Preprint arXiv:1602.03589.
Daneshmand, Gomez-Rodriguez, Song, et al. 2014. In ICML.
Descloux, and Sardy. 2018. arXiv:1805.05133 [Stat].
Diaconis, and Freedman. 1984. The Annals of Statistics.
Dossal, Kachour, Fadili, et al. 2011. arXiv:1111.1162 [Cs, Math, Stat].
Efron, Hastie, Johnstone, et al. 2004. "Least Angle Regression." The Annals of Statistics.
El Karoui. 2008. University of California, Berkeley.
Elhamifar, and Vidal. 2013. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Engebretsen, and Bohlin. 2019. Clinical Epigenetics.
Ewald, and Schneider. 2015. arXiv:1507.05315 [Math, Stat].
Fan, Rong-En, Chang, Hsieh, et al. 2008. "LIBLINEAR: A Library for Large Linear Classification." Journal of Machine Learning Research.
Fan, Jianqing, and Li. 2001. Journal of the American Statistical Association.
Fan, Jianqing, and Lv. 2010. Statistica Sinica.
Flynn, Hurvich, and Simonoff. 2013. arXiv:1302.2068 [Stat].
Foygel, and Srebro. 2011. arXiv:1108.0373 [Math, Stat].
Friedman, Hastie, Höfling, et al. 2007. The Annals of Applied Statistics.
Friedman, Hastie, and Tibshirani. 2008. Biostatistics.
Fu, and Zhou. 2013. Journal of the American Statistical Association.
Gasso, Rakotomamonjy, and Canu. 2009. IEEE Transactions on Signal Processing.
Ghadimi, and Lan. 2013a. SIAM Journal on Optimization.
βββ. 2013b. arXiv:1310.3787 [Math].
Girolami. 2001. Neural Computation.
Giryes, Sapiro, and Bronstein. 2014. arXiv:1412.5896 [Cs, Math, Stat].
Greenhill, Isaev, Kwan, et al. 2016. arXiv:1606.01586 [Math].
Gu, Fu, and Zhou. 2014. arXiv:1403.2310 [Stat].
Gui, and Li. 2005. Bioinformatics.
Gupta, and Pensky. 2016. arXiv:1605.07913 [Math, Stat].
Hallac, Leskovec, and Boyd. 2015. arXiv:1507.00280 [Cs, Math, Stat].
Hall, Jin, and Miller. 2014. Bernoulli.
Hall, and Xue. 2014. Computational Statistics & Data Analysis.
Hansen, Reynaud-Bouret, and Rivoirard. 2015. Bernoulli.
Hastie, Tibshirani, Rob, and Wainwright. 2015. Statistical Learning with Sparsity: The Lasso and Generalizations.
Hawe, Kleinsteuber, and Diepold. 2013. IEEE Transactions on Image Processing.
Hebiri, and van de Geer. 2011. Electronic Journal of Statistics.
Hegde, and Baraniuk. 2012. IEEE Transactions on Information Theory.
Hegde, Indyk, and Schmidt. 2015. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15).
He, Rish, and Parida. 2014. "Transductive HSIC Lasso." In Proceedings of the 2014 SIAM International Conference on Data Mining. Proceedings.
Hesterberg, Choi, Meier, et al. 2008. Statistics Surveys.
Hirose, Tateishi, and Konishi. 2011. arXiv:1109.2411 [Stat].
Hormati, Roy, Lu, et al. 2010. IEEE Transactions on Signal Processing.
Hsieh, Sustik, Dhillon, et al. 2014. Journal of Machine Learning Research.
Huang, Cheang, and Barron. 2008.
Hu, Pehlevan, and Chklovskii. 2014. In 2014 48th Asilomar Conference on Signals, Systems and Computers.
Ishwaran, and Rao. 2005. The Annals of Statistics.
Janková, and van de Geer. 2016. arXiv:1610.01353 [Math, Stat].
Janson, Fithian, and Hastie. 2015. Biometrika.
Javanmard, and Montanari. 2014. Journal of Machine Learning Research.
Jung. 2013. In Advances in Neural Information Processing Systems 29.
Kabán. 2014. In Journal of Machine Learning Research.
Kato. 2009. Journal of Multivariate Analysis.
Kim, Kwon, and Choi. 2012. Journal of Machine Learning Research.
Koltchinskii. 2011. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Mathematics École d'Été de Probabilités de Saint-Flour 2033.
Koppel, Warnell, Stump, et al. 2016. arXiv:1612.04111 [Cs, Stat].
Kowalski, and Torrésani. 2009. In SPARS'09 - Signal Processing with Adaptive Sparse Structured Representations.
Krämer, Schäfer, and Boulesteix. 2009. BMC Bioinformatics.
Lambert-Lacroix, and Zwald. 2011. Electronic Journal of Statistics.
Lam, and Fan. 2009. Annals of Statistics.
Langford, Li, and Zhang. 2009. In Advances in Neural Information Processing Systems 21.
Lederer, and Vogt. 2020. arXiv:2004.11554 [Stat].
Lee, Sun, Sun, et al. 2013. arXiv:1311.6238 [Math, Stat].
Lemhadri, Ruan, Abraham, et al. 2021. Journal of Machine Learning Research.
Li, and Lederer. 2019. Journal of Statistical Planning and Inference.
Lim, and Lederer. 2016. arXiv:1609.07195 [Stat].
Lockhart, Taylor, Tibshirani, et al. 2014. The Annals of Statistics.
Lu, Goldberg, and Fine. 2012. Biometrika.
Lundberg, and Lee. 2017. In Advances in Neural Information Processing Systems.
Mahoney. 2016. arXiv Preprint arXiv:1608.04845.
Mairal. 2015. SIAM Journal on Optimization.
Mazumder, Friedman, and Hastie. 2009.
Meier, van de Geer, and Bühlmann. 2008. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Meinshausen, and Bühlmann. 2006. The Annals of Statistics.
Meinshausen, and Yu. 2009. The Annals of Statistics.
Molchanov, Ashukha, and Vetrov. 2017. In Proceedings of ICML.
Montanari. 2012. Compressed Sensing: Theory and Applications.
Mousavi, and Baraniuk. 2017. In ICASSP.
Müller, and van de Geer. 2015. TEST.
Naik, and Tsai. 2001. Biometrika.
Nam, and Gribonval. 2012. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Needell, and Tropp. 2008. arXiv:0803.2392 [Cs, Math].
Nesterov. 2012. Mathematical Programming.
Neville, Ormerod, and Wand. 2014. Electronic Journal of Statistics.
Ngiam, Chen, Bhaskar, et al. 2011. "Sparse Filtering." In Advances in Neural Information Processing Systems 24.
Nickl, and van de Geer. 2013. The Annals of Statistics.
Oymak, Jalali, Fazel, et al. 2013. In 2013 IEEE 52nd Annual Conference on Decision and Control (CDC).
Peleg, Eldar, and Elad. 2010. IEEE Transactions on Signal Processing.
Portnoy, and Koenker. 1997. Statistical Science.
Pouget-Abadie, and Horel. 2015. In Proceedings of The 32nd International Conference on Machine Learning.
Qian, and Yang. 2012. Annals of the Institute of Statistical Mathematics.
Qin, Scheinberg, and Goldfarb. 2013. Mathematical Programming Computation.
Rahimi, and Recht. 2009. In Advances in Neural Information Processing Systems.
Ravikumar, Wainwright, Raskutti, et al. 2011. Electronic Journal of Statistics.
Ravishankar, Saiprasad, and Bresler. 2015. arXiv:1501.02923 [Cs, Stat].
Ravishankar, S., and Bresler. 2015. IEEE Transactions on Signal Processing.
Reynaud-Bouret. 2003. Probability Theory and Related Fields.
Reynaud-Bouret, and Schbath. 2010. The Annals of Statistics.
Ribeiro, Singh, and Guestrin. 2016. In Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '16.
Rish, and Grabarnik. 2014. In Compressed Sensing & Sparse Filtering. Signals and Communication Technology.
Rish, and Grabarnik. 2015. Sparse Modeling: Theory, Algorithms, and Applications. Chapman & Hall/CRC Machine Learning & Pattern Recognition Series.
RoΔkovΓ‘, and George. 2018. βThe Spike-and-Slab LASSO.β Journal of the American Statistical Association.
Sashank J. Reddi, Suvrit Sra, BarnabΓ‘s PΓ³czΓ³s, et al. 1995.
Schelldorfer, BΓΌhlmann, and van de Geer. 2011. Scandinavian Journal of Statistics.
Semenova, Rudin, and Parr. 2021. arXiv:1908.01755 [Cs, Stat].
Shen, and Huang. 2006. Journal of the American Statistical Association.
Shen, Huang, and Ye. 2004. Technometrics.
Shen, and Ye. 2002. "Adaptive Model Selection." Journal of the American Statistical Association.
She, and Owen. 2010.
Simon, Friedman, Hastie, et al. 2011. Journal of Statistical Software.
Smith, Forte, Jordan, et al. 2015. arXiv:1512.04011 [Cs].
Soh, and Chandrasekaran. 2017. arXiv:1701.01207 [Cs, Math, Stat].
Soltani, and Hegde. 2016. Statistics.
Starck, Elad, and Donoho. 2005. IEEE Transactions on Image Processing.
Stine. 2004. The Annals of Statistics.
Su, Bogdan, and Candès. 2015. arXiv:1511.01957 [Cs, Math, Stat].
Tarr, Müller, and Welsh. 2018. Journal of Statistical Software.
Thisted. 1997. Statistical Science.
Thrampoulidis, Abbasi, and Hassibi. 2015. In Advances in Neural Information Processing Systems 28.
Tibshirani, Robert. 1996. Journal of the Royal Statistical Society. Series B (Methodological).
βββ. 2011. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Tibshirani, Ryan J. 2014. arXiv:1408.5801 [Stat].
Trofimov, and Genkin. 2015. In Analysis of Images, Social Networks and Texts. Communications in Computer and Information Science 542.
βββ. 2016. arXiv:1611.02101 [Cs, Stat].
Tropp, and Wright. 2010. Proceedings of the IEEE.
Tschannen, and Bölcskei. 2016. arXiv:1612.03450 [Cs, Math, Stat].
Uematsu. 2015. arXiv:1504.06706 [Math, Stat].
Unser, Michael A., and Tafti. 2014. An Introduction to Sparse Stochastic Processes.
Unser, M., Tafti, Amini, et al. 2014. IEEE Transactions on Information Theory.
Unser, M., Tafti, and Sun. 2014. IEEE Transactions on Information Theory.
Geer, Sara van de. 2007. "The Deterministic Lasso."
Geer, Sara A. van de. 2008. The Annals of Statistics.
Geer, Sara van de. 2014a. Scandinavian Journal of Statistics.
βββ. 2014b. In arXiv:1403.7023 [Math, Stat].
βββ. 2014c. arXiv:1409.8557 [Math, Stat].
βββ. 2016. Estimation and Testing Under Sparsity. Lecture Notes in Mathematics.
Geer, Sara van de, BΓΌhlmann, Ritov, et al. 2014. The Annals of Statistics.
Geer, Sara A. van de, BΓΌhlmann, and Zhou. 2011. Electronic Journal of Statistics.
Veitch, and Roy. 2015. arXiv:1512.03099 [Cs, Math, Stat].
Wahba. 1990. Spline Models for Observational Data.
Wang, Zhangyang, Chang, Ling, et al. 2016. In.
Wang, L., Gordon, and Zhu. 2006. In Sixth International Conference on Data Mining (ICDM'06).
Wang, Hansheng, Li, and Jiang. 2007. Journal of Business & Economic Statistics.
Wisdom, Powers, Pitton, et al. 2016. In Advances in Neural Information Processing Systems 29.
Woodworth, and Chartrand. 2015. arXiv:1504.02923 [Cs, Math].
Wright, Nowak, and Figueiredo. 2009. IEEE Transactions on Signal Processing.
Wu, and Lange. 2008. The Annals of Applied Statistics.
Xu, Caramanis, and Mannor. 2010. "Robust Regression and Lasso." IEEE Transactions on Information Theory.
———. 2012. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Yaghoobi, Nam, Gribonval, et al. 2012. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Yang, and Xu. 2013. In ICML (3).
Yoshida, and West. 2010. Journal of Machine Learning Research.
Yuan, and Lin. 2006. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
βββ. 2007. Biometrika.
Yun, and Toh. 2009. Computational Optimization and Applications.
Zhang, Cun-Hui. 2010. The Annals of Statistics.
Zhang, Yiyun, Li, and Tsai. 2010. Journal of the American Statistical Association.
Zhang, Lijun, Yang, Jin, et al. 2015. arXiv:1511.03766 [Cs].
Zhang, Cun-Hui, and Zhang. 2014. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Zhao, Tuo, Liu, and Zhang. 2018. The Annals of Statistics.
Zhao, Peng, Rocha, and Yu. 2006.
βββ. 2009. The Annals of Statistics.
Zhao, Peng, and Yu. 2006. Journal of Machine Learning Research.
Zhou, Tao, and Wu. 2011. Data Mining and Knowledge Discovery.
Zou. 2006. Journal of the American Statistical Association.
Zou, and Hastie. 2005. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Zou, Hastie, and Tibshirani. 2007. The Annals of Statistics.
Zou, and Li. 2008. The Annals of Statistics.