Sparse regression

Penalised regression where the penalties are sparsity-inducing. The prediction loss could be anything: likelihood, least squares, robust Huberised losses, absolute deviation, etc.

I will play fast and loose with terminology here regarding theoretical and empirical losses, and the statistical models we attempt to fit.

In nonparametric statistics we might simultaneously estimate what look like many, many parameters, constrained in some clever fashion. The constraint usually boils down to something we can interpret as a smoothing parameter, which controls how many factors from the original set we still have to consider.

I will usually frame the goal as minimising prediction error, but one could also try to minimise model selection error.

We then have a simultaneous estimation and model selection procedure, probably a specific sparse model selection procedure, and we may need a clever optimisation method to do the whole thing fast. This is related to compressed sensing, but here we consider sampling complexity and measurement error.

See also matrix factorisations, optimisation, multiple testing, concentration inequalities, sparse flavoured icecream.

🏗 disambiguate the optimisation technologies at play – iteratively reweighted least squares etc.

Now! A set of headings under which I will try to understand some things, mostly the LASSO variants.


LASSO

Quadratic prediction loss, absolute-value coefficient penalty. We estimate the regression coefficients \(\beta\) by solving

\[ \hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\text{argmin}} \: \frac{1}{2} \| y - \mathbf{X} \beta \|_2^2 + \lambda \| \beta \|_1. \]

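The penalty coefficient \(\lambda\) is left for you to choose, but one of the magical properties of the lasso is that it is very easy to test many possible values of \(\lambda\) at low marginal cost: the solution is piecewise linear in \(\lambda\), so the whole path can be traced exactly, or approximated on a grid of \(\lambda\) values with each fit warm-started from the previous one.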

It is popular because, amongst other reasons, it turns out to be very fast and convenient in practice, and because of various nifty hacks that speed it up further, e.g. aggressive approximate variable selection.
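To make the low-marginal-cost point concrete, here is a minimal NumPy sketch of the grid-plus-warm-start approach. It assumes the objective exactly as written above, with no intercept, no \(1/n\) scaling, and no all-zero columns. It is plain cyclic coordinate descent, not glmnet's implementation, and the names `soft_threshold`, `lasso_cd` and `lasso_path` are our own. The grid starts at \(\lambda_{\max} = \|\mathbf{X}^\top y\|_\infty\), the smallest \(\lambda\) at which \(\hat\beta = 0\). Because each fit starts from the previous solution, it typically needs only a few sweeps.

```python
import numpy as np


def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0), the lasso's scalar shrinkage."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, beta0=None, tol=1e-10, max_sweeps=10_000):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1."""
    p = X.shape[1]
    beta = np.zeros(p) if beta0 is None else beta0.astype(float).copy()
    col_sq = np.sum(X ** 2, axis=0)            # ||X_j||^2, assumed nonzero
    resid = y - X @ beta
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # X_j' times the partial residual that leaves out coordinate j
            rho = X[:, j] @ resid + col_sq[j] * old
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            if beta[j] != old:
                resid -= X[:, j] * (beta[j] - old)
                max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:                   # a full sweep barely moved
            break
    return beta


def lasso_path(X, y, n_lambdas=50, ratio=1e-3):
    """Decreasing log-spaced lambda grid, each fit warm-started from the last."""
    lam_max = np.max(np.abs(X.T @ y))          # smallest lam with beta = 0
    lambdas = lam_max * np.logspace(0, np.log10(ratio), n_lambdas)
    beta = np.zeros(X.shape[1])
    path = []
    for lam in lambdas:
        beta = lasso_cd(X, y, lam, beta0=beta)  # warm start
        path.append(beta.copy())
    return lambdas, np.array(path)
```

For example, `lambdas, path = lasso_path(X, y)` returns the grid and one coefficient vector per row. Rows further down the path typically have more nonzero coefficients.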

Adaptive LASSO

🏗 This is the one with famous oracle properties if you choose \(\lambda\) correctly. Hui Zou’s paper on this (Zou 2006) is very readable. I am having trouble digesting Sara van de Geer’s paper (van de Geer 2008) on the lasso for high-dimensional generalised linear models, but it seems to cover something very similar to the adaptive lasso, under far more general assumptions on the model and loss function, and with some finite-sample guarantees.
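Below is a minimal sketch of the two-stage adaptive lasso in the spirit of (Zou 2006). It assumes \(n > p\), so that ordinary least squares can serve as the initial estimator, with weights \(w_j = 1/|\hat\beta^{\text{init}}_j|^\gamma\). The weighted penalty \(\lambda \sum_j w_j |\beta_j|\) reduces to a plain lasso on the rescaled columns \(X_j / w_j\), followed by undoing the rescaling. The coordinate-descent solver and the names `lasso_cd` and `adaptive_lasso` are our own. Here \(\lambda\) and \(\gamma\) are fixed by hand, and choosing them is the hard part.

```python
import numpy as np


def lasso_cd(X, y, lam, tol=1e-10, max_sweeps=10_000):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    resid = y.astype(float).copy()
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            rho = X[:, j] @ resid + col_sq[j] * old
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if beta[j] != old:
                resid -= X[:, j] * (beta[j] - old)
                max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive lasso with OLS initial estimates (assumes n > p and no
    initial coefficient exactly zero)."""
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]
    w = 1.0 / np.abs(beta_init) ** gamma
    # Penalising w_j*|beta_j| is a plain lasso on columns X_j / w_j;
    # the fitted theta_j is then mapped back via beta_j = theta_j / w_j.
    theta = lasso_cd(X / w, y, lam)
    return theta / w
```

The rescaling trick means any ordinary lasso solver can be reused unchanged. Variables with small initial estimates get large weights and are pushed to zero much harder than strong ones.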


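LARS

A confusing one: LASSO and LARS are not the same thing, but they are closely related. A modified LARS, which drops a variable from the active set whenever its coefficient crosses zero, computes the entire LASSO solution path (Efron et al. 2004). I need to work this one through with a pencil and paper.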
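For the pencil-and-paper exercise, here is a sketch of the LARS-with-dropping homotopy that traces the LASSO path for the objective \(\frac{1}{2}\|y - \mathbf{X}\beta\|_2^2 + \lambda\|\beta\|_1\). It is written from the standard homotopy argument rather than taken from any library. It assumes \(n > p\), \(\mathbf{X}\) in general position (no ties) and no intercept, and `lars_lasso_path` is a hypothetical name.

The path starts at \(\lambda_{\max}\) with \(\hat\beta = 0\). The active coefficients then move along \(d\), the solution of \((\mathbf{X}_A^\top \mathbf{X}_A)\, d = s_A\), where \(s_A\) are the signs of the active correlations, while \(\lambda\) decreases. Each step stops at the next event:

- an inactive variable's correlation reaches \(\pm\lambda\), and it joins the active set;
- an active coefficient hits zero, and it drops out. This is the LASSO modification; plain LARS never drops.

```python
import numpy as np


def lars_lasso_path(X, y, max_steps=1000):
    """Knots (lam, beta) of the LASSO path via the LARS-lasso homotopy.
    Assumes n > p, X in general position, no intercept."""
    p = X.shape[1]
    beta = np.zeros(p)
    c = X.T @ y                              # correlations X'(y - X beta)
    lam = float(np.max(np.abs(c)))
    eps = 1e-10 * lam                        # numerical tolerance for "zero"
    active = [int(np.argmax(np.abs(c)))]
    knots = [(lam, beta.copy())]
    for _ in range(max_steps):
        if lam <= eps or not active:
            break
        A = np.array(active)
        XA = X[:, A]
        d = np.linalg.solve(XA.T @ XA, np.sign(c[A]))  # direction of beta_A
        a = X.T @ (XA @ d)                   # correlations move as c - t*a
        step, event, who = lam, None, None   # default: run down to lam = 0
        for j in range(p):                   # join when |c_j - t*a_j| = lam - t
            if j in active:
                continue
            for num, den in ((lam - c[j], 1.0 - a[j]), (lam + c[j], 1.0 + a[j])):
                if num > eps and den > 1e-12 and num / den < step:
                    step, event, who = num / den, "join", j
        for k, j in enumerate(active):       # drop when beta_j crosses zero
            if d[k] != 0.0:
                t = -beta[j] / d[k]
                if eps < t < step:
                    step, event, who = t, "drop", j
        beta[A] += step * d                  # move linearly to the next knot
        c -= step * a
        lam -= step
        if event == "join":
            active.append(who)
        elif event == "drop":
            beta[who] = 0.0
            active.remove(who)
        knots.append((lam, beta.copy()))
    return knots
```

Between consecutive knots the solution is linear in \(\lambda\), so the returned knots determine the whole path. At the last knot \(\lambda = 0\), and under these assumptions the solution is ordinary least squares.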


Graphical LASSO

As used in graphical models. 🏗
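For reference, the graphical lasso (Friedman, Hastie, and Tibshirani 2008) estimates a sparse precision (inverse covariance) matrix \(\Theta\) from the sample covariance matrix \(S\) of (assumed Gaussian) data by maximising an \(L_1\)-penalised log-likelihood,

\[ \hat{\Theta} = \underset{\Theta \succ 0}{\text{argmax}} \: \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \|\Theta\|_1, \]

where \(\|\Theta\|_1 = \sum_{j,k} |\Theta_{jk}|\) (some variants leave the diagonal unpenalised). A zero entry \(\hat{\Theta}_{jk} = 0\) says that variables \(j\) and \(k\) are estimated to be conditionally independent given all the others, i.e. there is no edge between them in the graph.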

Elastic net

Combination of \(L_1\) and \(L_2\) penalties. 🏗
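A minimal NumPy sketch of elastic-net coordinate descent follows. It uses the parametrisation \(\frac{1}{2}\|y - \mathbf{X}\beta\|_2^2 + \lambda_1\|\beta\|_1 + \frac{\lambda_2}{2}\|\beta\|_2^2\), which is an assumption on our part. glmnet, for example, instead uses a single \(\lambda\) multiplying a mixture \(\alpha\|\beta\|_1 + \frac{1-\alpha}{2}\|\beta\|_2^2\). The only change from the lasso coordinate update is that the ridge term inflates the denominator. The function name `elastic_net_cd` is hypothetical.

```python
import numpy as np


def elastic_net_cd(X, y, lam1, lam2, tol=1e-10, max_sweeps=100_000):
    """Coordinate descent for
    0.5*||y - X b||^2 + lam1*||b||_1 + 0.5*lam2*||b||_2^2."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    resid = y.astype(float).copy()
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            rho = X[:, j] @ resid + col_sq[j] * old   # X_j' (partial residual)
            # Soft-threshold as in the lasso; the ridge term adds lam2 below.
            beta[j] = np.sign(rho) * max(abs(rho) - lam1, 0.0) / (col_sq[j] + lam2)
            if beta[j] != old:
                resid -= X[:, j] * (beta[j] - old)
                max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return beta
```

One reason to bother with the \(L_2\) part is the grouping effect. With two identical columns the lasso solution is not unique. For \(\lambda_2 > 0\), however, the objective is strictly convex, and the two copies receive equal coefficients.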

Grouped LASSO

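AFAICT this is the usual LASSO, but with the coefficients partitioned into predefined groups (e.g. the dummy variables encoding one categorical factor) and the \(L_1\) penalty applied to the groups’ Euclidean norms, so that whole groups enter or leave the model together. See (Yuan and Lin 2006).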
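A minimal sketch of the group lasso objective \(\frac{1}{2}\|y - \mathbf{X}\beta\|_2^2 + \lambda \sum_g \sqrt{p_g}\,\|\beta_g\|_2\) follows, where \(p_g\) is the size of group \(g\). The \(\sqrt{p_g}\) weighting follows Yuan and Lin. The solver is proximal gradient descent (ISTA), which is just one of several possible algorithms. The proximal step shrinks each group's block as a whole, so a group is either zeroed entirely or kept. The function names are our own.

```python
import numpy as np


def group_soft_threshold(v, t):
    """Proximal operator of t*||v||_2: shrink the whole block towards 0."""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v


def group_lasso_ista(X, y, groups, lam, n_iter=5000):
    """Proximal gradient for 0.5*||y - X b||^2 + lam*sum_g sqrt(p_g)*||b_g||_2,
    where `groups` is a list of index arrays partitioning the columns."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz const. of gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        v = beta - step * (X.T @ (X @ beta - y))   # gradient step on the loss
        for g in groups:                           # blockwise proximal step
            beta[g] = group_soft_threshold(v[g], step * lam * np.sqrt(len(g)))
    return beta
```

A fixed iteration count keeps the sketch short. A real solver would stop on a convergence criterion and would usually add acceleration.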

Model selection

Can be fiddly with sparse regression, which couples variable selection tightly with parameter estimation. See sparse model selection.

Debiased LASSO

There are a few versions, but the one I have needed is (van de Geer 2008), section 2.1. See also (S. van de Geer 2014b). (🏗 relation to (van de Geer 2008)?)
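Roughly, debiased (or desparsified) lasso estimators add a one-step correction \(\hat{b} = \hat{\beta} + \hat{\Theta} \mathbf{X}^\top (y - \mathbf{X}\hat{\beta}) / n\), where \(\hat{\Theta}\) approximates the inverse of the Gram matrix \(\mathbf{X}^\top\mathbf{X}/n\). In high dimensions that inverse does not exist, and papers differ in how they build \(\hat\Theta\): nodewise lasso regressions in (van de Geer et al. 2014), a convex program in (Javanmard and Montanari 2014).

The sketch below sidesteps that entirely by assuming \(n > p\) and using the exact inverse. The correction then collapses to ordinary least squares. It is a sanity check of the formula, not a usable high-dimensional method, and the function names are our own.

```python
import numpy as np


def lasso_cd(X, y, lam, tol=1e-10, max_sweeps=10_000):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    resid = y.astype(float).copy()
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            rho = X[:, j] @ resid + col_sq[j] * old
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if beta[j] != old:
                resid -= X[:, j] * (beta[j] - old)
                max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return beta


def debiased_lasso_lowdim(X, y, lam):
    """Lasso fit plus the one-step correction with Theta = (X'X/n)^{-1}.
    Only valid when n > p; high-dimensional versions need an approximate
    inverse instead (not implemented here)."""
    n = X.shape[0]
    beta = lasso_cd(X, y, lam)
    theta = np.linalg.inv(X.T @ X / n)
    return beta, beta + theta @ X.T @ (y - X @ beta) / n
```

Algebraically, \(\hat\beta + (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top(y - \mathbf{X}\hat\beta) = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top y\). The correction therefore adds back exactly the shrinkage bias that the penalty introduced.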

Sparse basis expansions

Wavelets etc.; mostly handled under sparse dictionary bases.

Sparse neural nets

That is, sparse regressions as the layers in a neural network? Sure thing. (Wisdom et al. 2016)

Other coefficient penalties

Put a weird penalty on the coefficients! E.g. “Smoothly Clipped Absolute Deviation” (SCAD). 🏗
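A small sketch of SCAD as defined by (Fan and Li 2001). For \(\theta \ge 0\), \(p_\lambda(\theta) = \lambda\theta\) if \(\theta \le \lambda\); \(-(\theta^2 - 2a\lambda\theta + \lambda^2)/(2(a-1))\) if \(\lambda < \theta \le a\lambda\); and \((a+1)\lambda^2/2\) if \(\theta > a\lambda\). Here \(a > 2\), and Fan and Li suggest \(a = 3.7\).

Unlike the lasso penalty, SCAD stops growing, so large coefficients are left unshrunk. The second function gives the closed-form minimiser of \(\frac{1}{2}(z - \theta)^2 + p_\lambda(|\theta|)\), the SCAD analogue of soft thresholding for an orthonormal design. The function names are our own.

```python
import numpy as np


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|), elementwise; requires a > 2."""
    t = np.abs(theta)
    return np.where(
        t <= lam,
        lam * t,                                            # lasso-like near 0
        np.where(
            t <= a * lam,
            -(t ** 2 - 2 * a * lam * t + lam ** 2) / (2 * (a - 1)),  # taper
            (a + 1) * lam ** 2 / 2,                         # constant tail
        ),
    )


def scad_threshold(z, lam, a=3.7):
    """argmin_theta 0.5*(z - theta)^2 + scad_penalty(theta, lam, a)."""
    z = np.asarray(z, dtype=float)
    az = np.abs(z)
    soft = np.sign(z) * np.maximum(az - lam, 0.0)           # |z| <= 2*lam
    middle = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)  # 2*lam < |z| <= a*lam
    return np.where(az <= 2 * lam, soft, np.where(az <= a * lam, middle, z))
```

For \(a > 2\), the penalty's negative curvature in the middle region is smaller in magnitude than the quadratic's curvature. The one-dimensional objective is therefore strictly convex, which is why a single closed-form rule suffices.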

Other prediction losses

Put a weird penalty on the error! E.g. an absolute-deviation prediction loss combined with a lasso coefficient penalty, etc.

See (Wang, Li, and Jiang 2007; Portnoy and Koenker 1997) for some implementations using e.g. a least absolute deviation (LAD) prediction loss.
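A minimal sketch of the LAD-lasso, \(\sum_i |y_i - x_i^\top \beta| + \lambda \|\beta\|_1\), written as a linear program and handed to SciPy's `scipy.optimize.linprog` with the HiGHS solver. This assumes a reasonably recent SciPy. Splitting \(\beta = \beta^+ - \beta^-\) and each residual into positive and negative parts makes every term linear. `lad_lasso` is a hypothetical name, and the LP route is one convenient choice rather than the algorithm of either cited paper.

```python
import numpy as np
from scipy.optimize import linprog


def lad_lasso(X, y, lam):
    """Minimise sum_i |y_i - x_i' b| + lam*||b||_1 as a linear program."""
    n, p = X.shape
    # Variables: [b_plus (p), b_minus (p), u (n), v (n)], all >= 0,
    # with b = b_plus - b_minus and residual y - X b = u - v.
    c = np.concatenate([lam * np.ones(2 * p), np.ones(2 * n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])   # X b + u - v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:2 * p]
```

At the optimum at most one of \(u_i, v_i\) is positive, so \(u_i + v_i = |y_i - x_i^\top\beta|\); the same holds for \(\beta^+_j, \beta^-_j\). The absolute-value loss makes the fit insensitive to a few wild residuals, which is the point of swapping out least squares.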

Bayesian Lasso

See Bayesian sparsity.


Implementations

Hastie, Friedman et al.’s glmnet for R is fast and well-regarded, and has a MATLAB version. Here’s how to use it for adaptive lasso. Kenneth Tay has implemented the elastic net penalty for any GLM family in glmnet.

SPAMS (C++, MATLAB, R, Python), by Mairal, looks interesting. It’s an optimisation library for many, many sparse problems.

liblinear also includes lasso-type solvers, as well as support-vector regression.


Sparse regression as a universal classifier explainer? Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro, Singh, and Guestrin 2016) uses the LASSO for model interpretation. (See the blog post, or the source.)
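Here is a toy sketch of that idea, loosely following the paper's recipe: sample perturbations around the instance, weight them by proximity, and fit a sparse linear model to the black box's outputs. This is not the LIME package's code. The Gaussian perturbations, the exponential kernel and its width, the free intercept, and the fixed \(\lambda\) (instead of the paper's choice of \(K\) features) are all our own assumptions. `explain_locally` is a hypothetical name.

```python
import numpy as np


def lasso_cd(X, y, lam, tol=1e-10, max_sweeps=10_000):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    resid = y.astype(float).copy()
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            rho = X[:, j] @ resid + col_sq[j] * old
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if beta[j] != old:
                resid -= X[:, j] * (beta[j] - old)
                max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return beta


def explain_locally(f, x0, lam, n_samples=500, scale=0.5, width=1.0, seed=0):
    """LIME-flavoured local explanation (illustrative sketch): weighted lasso
    of black-box outputs on perturbations around x0, intercept unpenalised."""
    rng = np.random.default_rng(seed)
    D = scale * rng.standard_normal((n_samples, x0.size))  # offsets from x0
    fz = np.array([f(x0 + d) for d in D])                   # black-box queries
    w = np.exp(-np.sum(D ** 2, axis=1) / width ** 2)         # proximity kernel
    # Profile out the free intercept by weighted centring, then absorb sqrt(w)
    # so the weighted problem becomes an ordinary lasso.
    D_c = D - w @ D / w.sum()
    f_c = fz - w @ fz / w.sum()
    sw = np.sqrt(w)
    return lasso_cd(D_c * sw[:, None], f_c * sw, lam)
```

The returned vector plays the role of the explanation. Features with nonzero coefficients are the ones the black box appears to depend on near `x0`.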

Abramovich, Felix, Yoav Benjamini, David L. Donoho, and Iain M. Johnstone. 2006. “Adapting to Unknown Sparsity by Controlling the False Discovery Rate.” The Annals of Statistics 34 (2): 584–653.

Aghasi, Alireza, Nam Nguyen, and Justin Romberg. 2016. “Net-Trim: A Layer-Wise Convex Pruning of Deep Neural Networks,” November.

Aragam, Bryon, Arash A. Amini, and Qing Zhou. 2015. “Learning Directed Acyclic Graphs with Penalized Neighbourhood Regression,” November.

Azizyan, Martin, Akshay Krishnamurthy, and Aarti Singh. 2015. “Extreme Compressive Sampling for Covariance Estimation,” June.

Bach, Francis. 2009. “Model-Consistent Sparse Estimation Through the Bootstrap.” arXiv:0901.3202 [Cs, Stat].

Bach, Francis, Rodolphe Jenatton, Julien Mairal, and Guillaume Obozinski. 2012. “Optimization with Sparsity-Inducing Penalties.” Foundations and Trends® in Machine Learning 4 (1): 1–106.

Bahmani, Sohail, and Justin Romberg. 2014. “Lifting for Blind Deconvolution in Random Mask Imaging: Identifiability and Convex Relaxation,” December.

Banerjee, Arindam, Sheng Chen, Farideh Fazayeli, and Vidyashankar Sivakumar. 2014. “Estimation with Norm Regularization.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 1556–64. Curran Associates, Inc.

Banerjee, Onureena, Laurent El Ghaoui, and Alexandre d’Aspremont. 2008. “Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data.” Journal of Machine Learning Research 9 (Mar): 485–516.

Barber, Rina Foygel, and Emmanuel J. Candès. 2015. “Controlling the False Discovery Rate via Knockoffs.” The Annals of Statistics 43 (5): 2055–85.

Barbier, Jean. 2015. “Statistical Physics and Approximate Message-Passing Algorithms for Sparse Linear Estimation Problems in Signal Processing and Coding Theory,” November.

Baron, Dror, Shriram Sarvotham, and Richard G. Baraniuk. 2010. “Bayesian Compressive Sensing via Belief Propagation.” IEEE Transactions on Signal Processing 58 (1): 269–80.

Barron, Andrew R., Albert Cohen, Wolfgang Dahmen, and Ronald A. DeVore. 2008. “Approximation and Learning by Greedy Algorithms.” The Annals of Statistics 36 (1): 64–94.

Barron, Andrew R., Cong Huang, Jonathan Q. Li, and Xi Luo. 2008. “MDL, Penalized Likelihood, and Statistical Risk.” In Information Theory Workshop, 2008. ITW’08. IEEE, 247–57. IEEE.

Battiti, Roberto. 1992. “First-and Second-Order Methods for Learning: Between Steepest Descent and Newton’s Method.” Neural Computation 4 (2): 141–66.

Bayati, M., and A. Montanari. 2012. “The LASSO Risk for Gaussian Matrices.” IEEE Transactions on Information Theory 58 (4): 1997–2017.

Bellec, Pierre C., and Alexandre B. Tsybakov. 2016. “Bounds on the Prediction Error of Penalized Least Squares Estimators with Convex Penalty,” September.

Belloni, Alexandre, Victor Chernozhukov, and Lie Wang. 2011. “Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming.” Biometrika 98 (4): 791–806.

Bian, Wei, Xiaojun Chen, and Yinyu Ye. 2014. “Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization.” Mathematical Programming 149 (1-2): 301–27.

Bien, Jacob, Irina Gaynanova, Johannes Lederer, and Christian Müller. 2016. “Non-Convex Global Minimization and False Discovery Rate Control for the TREX,” April.

Bien, Jacob, Irina Gaynanova, Johannes Lederer, and Christian L. Müller. 2018. “Non-Convex Global Minimization and False Discovery Rate Control for the TREX.” Journal of Computational and Graphical Statistics 27 (1): 23–33.

Bloniarz, Adam, Hanzhong Liu, Cun-Hui Zhang, Jasjeet Sekhon, and Bin Yu. 2015. “Lasso Adjustments of Treatment Effect Estimates in Randomized Experiments,” July.

Bondell, Howard D., Arun Krishna, and Sujit K. Ghosh. 2010. “Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models.” Biometrics 66 (4): 1069–77.

Borgs, Christian, Jennifer T. Chayes, Henry Cohn, and Yufei Zhao. 2014. “An \(L^p\) Theory of Sparse Graph Convergence I: Limits, Sparse Random Graph Models, and Power Law Distributions,” January.

Bottou, Léon, Frank E. Curtis, and Jorge Nocedal. 2016. “Optimization Methods for Large-Scale Machine Learning,” June.

Breiman, Leo. 1995. “Better Subset Regression Using the Nonnegative Garrote.” Technometrics 37 (4): 373–84.

Bruckstein, A. M., Michael Elad, and M. Zibulevsky. 2008. “On the Uniqueness of Nonnegative Sparse Solutions to Underdetermined Systems of Equations.” IEEE Transactions on Information Theory 54 (11): 4813–20.

Brunton, Steven L., Joshua L. Proctor, and J. Nathan Kutz. 2016. “Discovering Governing Equations from Data by Sparse Identification of Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences 113 (15): 3932–7.

Bu, Yunqi, and Johannes Lederer. 2017. “Integrating Additional Knowledge into Estimation of Graphical Models,” April.

Bühlmann, Peter, and Sara van de Geer. 2011. “Additive Models and Many Smooth Univariate Functions.” In Statistics for High-Dimensional Data, 77–97. Springer Series in Statistics. Springer Berlin Heidelberg.

———. 2015. “High-Dimensional Inference in Misspecified Linear Models” 9 (1): 1449–73.

Candès, Emmanuel J., and Mark A. Davenport. 2011. “How Well Can We Estimate a Sparse Vector?” April.

Candès, Emmanuel J., Yingying Fan, Lucas Janson, and Jinchi Lv. 2016. “Panning for Gold: Model-Free Knockoffs for High-Dimensional Controlled Variable Selection.” arXiv Preprint arXiv:1610.02351.

Candès, Emmanuel J., and Carlos Fernandez-Granda. 2013. “Super-Resolution from Noisy Data.” Journal of Fourier Analysis and Applications 19 (6): 1229–54.

Candès, Emmanuel J., and Y. Plan. 2010. “Matrix Completion with Noise.” Proceedings of the IEEE 98 (6): 925–36.

Candès, Emmanuel J., Justin K. Romberg, and Terence Tao. 2006. “Stable Signal Recovery from Incomplete and Inaccurate Measurements.” Communications on Pure and Applied Mathematics 59 (8): 1207–23.

Candès, Emmanuel J., Michael B. Wakin, and Stephen P. Boyd. 2008. “Enhancing Sparsity by Reweighted ℓ1 Minimization.” Journal of Fourier Analysis and Applications 14 (5-6): 877–905.

Carmi, Avishy Y. 2013. “Compressive System Identification: Sequential Methods and Entropy Bounds.” Digital Signal Processing 23 (3): 751–70.

———. 2014. “Compressive System Identification.” In Compressed Sensing & Sparse Filtering, edited by Avishy Y. Carmi, Lyudmila Mihaylova, and Simon J. Godsill, 281–324. Signals and Communication Technology. Springer Berlin Heidelberg.

Cevher, Volkan, Marco F. Duarte, Chinmay Hegde, and Richard Baraniuk. 2009. “Sparse Signal Recovery Using Markov Random Fields.” In Advances in Neural Information Processing Systems, 257–64. Curran Associates, Inc.

Chartrand, R., and Wotao Yin. 2008. “Iteratively Reweighted Algorithms for Compressive Sensing.” In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008, 3869–72.

Chen, Minhua, J. Silva, J. Paisley, Chunping Wang, D. Dunson, and L. Carin. 2010. “Compressive Sensing on Manifolds Using a Nonparametric Mixture of Factor Analyzers: Algorithm and Performance Bounds.” IEEE Transactions on Signal Processing 58 (12): 6140–55.

Chen, Xiaojun. 2012. “Smoothing Methods for Nonsmooth, Nonconvex Minimization.” Mathematical Programming 134 (1): 71–99.

Chen, Yen-Chi, and Yu-Xiang Wang. n.d. “Discussion on ‘Confidence Intervals and Hypothesis Testing for High-Dimensional Regression’.” Accessed July 12, 2015.

Chen, Y., and A. O. Hero. 2012. “Recursive ℓ1,∞ Group Lasso.” IEEE Transactions on Signal Processing 60 (8): 3978–87.

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2016. “Double/Debiased Machine Learning for Treatment and Causal Parameters,” July.

Chernozhukov, Victor, Christian Hansen, Yuan Liao, and Yinchu Zhu. 2018. “Inference for Heterogeneous Effects Using Low-Rank Estimations,” December.

Chernozhukov, Victor, Whitney K. Newey, and Rahul Singh. 2018. “Learning L2 Continuous Regression Functionals via Regularized Riesz Representers,” September.

Chetverikov, Denis, Zhipeng Liao, and Victor Chernozhukov. 2016. “On Cross-Validated Lasso,” May.

Chichignoud, Michaël, Johannes Lederer, and Martin Wainwright. 2014. “A Practical Scheme and Fast Algorithm to Tune the Lasso with Optimality Guarantees,” October.

Dai, Ran, and Rina Foygel Barber. 2016. “The Knockoff Filter for FDR Control in Group-Sparse and Multitask Regression.” arXiv Preprint arXiv:1602.03589.

Daneshmand, Hadi, Manuel Gomez-Rodriguez, Le Song, and Bernhard Schölkopf. 2014. “Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-Thresholding Algorithm.” In ICML.

Descloux, Pascaline, and Sylvain Sardy. 2018. “Model Selection with Lasso-Zero: Adding Straw to the Haystack to Better Find Needles,” May.

Diaconis, Persi, and David Freedman. 1984. “Asymptotics of Graphical Projection Pursuit.” The Annals of Statistics 12 (3): 793–815.

Efron, Bradley, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. 2004. “Least Angle Regression.” The Annals of Statistics 32 (2): 407–99.

Elhamifar, E., and R. Vidal. 2013. “Sparse Subspace Clustering: Algorithm, Theory, and Applications.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (11): 2765–81.

Ewald, Karl, and Ulrike Schneider. 2015. “Confidence Sets Based on the Lasso Estimator,” July.

Fan, Jianqing, and Runze Li. 2001. “Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association 96 (456): 1348–60.

Fan, Rong-En, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. “LIBLINEAR: A Library for Large Linear Classification.” Journal of Machine Learning Research 9: 1871–4.

Flynn, Cheryl J., Clifford M. Hurvich, and Jeffrey S. Simonoff. 2013. “Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models,” February.

Foygel, Rina, and Nathan Srebro. 2011. “Fast-Rate and Optimistic-Rate Error Bounds for L1-Regularized Regression,” August.

Friedman, Jerome, Trevor Hastie, Holger Höfling, and Robert Tibshirani. 2007. “Pathwise Coordinate Optimization.” The Annals of Applied Statistics 1 (2): 302–32.

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2008. “Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics 9 (3): 432–41.

Fu, Fei, and Qing Zhou. 2013. “Learning Sparse Causal Gaussian Networks with Experimental Intervention: Regularization and Coordinate Descent.” Journal of the American Statistical Association 108 (501): 288–300.

Gasso, G., A. Rakotomamonjy, and S. Canu. 2009. “Recovering Sparse Signals with a Certain Family of Nonconvex Penalties and DC Programming.” IEEE Transactions on Signal Processing 57 (12): 4686–98.

Geer, Sara van de. 2007. “The Deterministic Lasso.”

———. 2014a. “Weakly Decomposable Regularization Penalties and Structured Sparsity.” Scandinavian Journal of Statistics 41 (1): 72–86.

———. 2014b. “Worst Possible Sub-Directions in High-Dimensional Models.” In. Vol. 131.

———. 2014c. “Statistical Theory for High-Dimensional Models,” September.

———. 2016. Estimation and Testing Under Sparsity. Vol. 2159. Lecture Notes in Mathematics. Cham: Springer International Publishing.

Geer, Sara A. van de. 2008. “High-Dimensional Generalized Linear Models and the Lasso.” The Annals of Statistics 36 (2): 614–45.

Geer, Sara A. van de, Peter Bühlmann, and Shuheng Zhou. 2011. “The Adaptive and the Thresholded Lasso for Potentially Misspecified Models (and a Lower Bound for the Lasso).” Electronic Journal of Statistics 5: 688–749.

Geer, Sara van de, Peter Bühlmann, Ya’acov Ritov, and Ruben Dezeure. 2014. “On Asymptotically Optimal Confidence Regions and Tests for High-Dimensional Models.” The Annals of Statistics 42 (3): 1166–1202.

Ghadimi, Saeed, and Guanghui Lan. 2013a. “Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming.” SIAM Journal on Optimization 23 (4): 2341–68.

———. 2013b. “Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming,” October.

Girolami, Mark. 2001. “A Variational Method for Learning Sparse and Overcomplete Representations.” Neural Computation 13 (11): 2517–32.

Giryes, Raja, Guillermo Sapiro, and Alex M. Bronstein. 2014. “On the Stability of Deep Networks,” December.

Greenhill, Catherine, Mikhail Isaev, Matthew Kwan, and Brendan D. McKay. 2016. “The Average Number of Spanning Trees in Sparse Graphs with Given Degrees,” June.

Gu, Jiaying, Fei Fu, and Qing Zhou. 2014. “Adaptive Penalized Estimation of Directed Acyclic Graphs from Categorical Data,” March.

Gui, Jiang, and Hongzhe Li. 2005. “Penalized Cox Regression Analysis in the High-Dimensional and Low-Sample Size Settings, with Applications to Microarray Gene Expression Data.” Bioinformatics 21 (13): 3001–8.

Gupta, Pawan, and Marianna Pensky. 2016. “Solution of Linear Ill-Posed Problems Using Random Dictionaries,” May.

Hallac, David, Jure Leskovec, and Stephen Boyd. 2015. “Network Lasso: Clustering and Optimization in Large Graphs,” July.

Hansen, Niels Richard, Patricia Reynaud-Bouret, and Vincent Rivoirard. 2015. “Lasso and Probabilistic Inequalities for Multivariate Point Processes.” Bernoulli 21 (1): 83–143.

Hastie, Trevor J., Rob Tibshirani, and Martin J. Wainwright. 2015. Statistical Learning with Sparsity: The Lasso and Generalizations. Boca Raton: Chapman and Hall/CRC.

Hawe, S., M. Kleinsteuber, and K. Diepold. 2013. “Analysis Operator Learning and Its Application to Image Reconstruction.” IEEE Transactions on Image Processing 22 (6): 2138–50.

He, Dan, Irina Rish, and Laxmi Parida. 2014. “Transductive HSIC Lasso.” In Proceedings of the 2014 SIAM International Conference on Data Mining, edited by Mohammed Zaki, Zoran Obradovic, Pang Ning Tan, Arindam Banerjee, Chandrika Kamath, and Srinivasan Parthasarathy, 154–62. Proceedings. Philadelphia, PA: Society for Industrial and Applied Mathematics.

Hebiri, Mohamed, and Sara A. van de Geer. 2011. “The Smooth-Lasso and Other ℓ1+ℓ2-Penalized Methods.” Electronic Journal of Statistics 5: 1184–1226.

Hegde, Chinmay, and Richard G. Baraniuk. 2012. “Signal Recovery on Incoherent Manifolds.” IEEE Transactions on Information Theory 58 (12): 7204–14.

Hegde, Chinmay, Piotr Indyk, and Ludwig Schmidt. 2015. “A Nearly-Linear Time Framework for Graph-Structured Sparsity.” In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 928–37.

Hesterberg, Tim, Nam Hee Choi, Lukas Meier, and Chris Fraley. 2008. “Least Angle and ℓ1 Penalized Regression: A Review.” Statistics Surveys 2: 61–93.

Hormati, A., O. Roy, Y.M. Lu, and M. Vetterli. 2010. “Distributed Sampling of Signals Linked by Sparse Filtering: Theory and Applications.” IEEE Transactions on Signal Processing 58 (3): 1095–1109.

Hsieh, Cho-Jui, Mátyás A. Sustik, Inderjit S. Dhillon, and Pradeep D. Ravikumar. 2014. “QUIC: Quadratic Approximation for Sparse Inverse Covariance Estimation.” Journal of Machine Learning Research 15 (1): 2911–47.

Hu, Tao, Cengiz Pehlevan, and Dmitri B. Chklovskii. 2014. “A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization.” In 2014 48th Asilomar Conference on Signals, Systems and Computers.

Huang, Cong, G. L. H. Cheang, and Andrew R. Barron. 2008. “Risk of Penalized Least Squares, Greedy Selection and L1 Penalization for Flexible Function Libraries.”

Ishwaran, Hemant, and J. Sunil Rao. 2005. “Spike and Slab Variable Selection: Frequentist and Bayesian Strategies.” The Annals of Statistics 33 (2): 730–73.

Janson, Lucas, William Fithian, and Trevor J. Hastie. 2015. “Effective Degrees of Freedom: A Flawed Metaphor.” Biometrika 102 (2): 479–85.

Javanmard, Adel, and Andrea Montanari. 2014. “Confidence Intervals and Hypothesis Testing for High-Dimensional Regression.” Journal of Machine Learning Research 15 (1): 2869–2909.

Jung, Alexander. 2013. “An RKHS Approach to Estimation with Sparsity Constraints.” In Advances in Neural Information Processing Systems 29.

Kabán, Ata. 2014. “New Bounds on Compressive Linear Least Squares Regression.” In Journal of Machine Learning Research, 448–56.

Koppel, Alec, Garrett Warnell, Ethan Stump, and Alejandro Ribeiro. 2016. “Parsimonious Online Learning with Kernels via Sparse Projections in Function Space,” December.

Kowalski, Matthieu, and Bruno Torrésani. 2009. “Structured Sparsity: From Mixed Norms to Structured Shrinkage.” In SPARS’09-Signal Processing with Adaptive Sparse Structured Representations.

Krämer, Nicole, Juliane Schäfer, and Anne-Laure Boulesteix. 2009. “Regularized Estimation of Large-Scale Gene Association Networks Using Graphical Gaussian Models.” BMC Bioinformatics 10 (1): 384.

Lam, Clifford, and Jianqing Fan. 2009. “Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.” Annals of Statistics 37 (6B): 4254–78.

Lambert-Lacroix, Sophie, and Laurent Zwald. 2011. “Robust Regression Through the Huber’s Criterion and Adaptive Lasso Penalty.” Electronic Journal of Statistics 5: 1015–53.

Langford, John, Lihong Li, and Tong Zhang. 2009. “Sparse Online Learning via Truncated Gradient.” In Advances in Neural Information Processing Systems 21, edited by D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, 905–12. Curran Associates, Inc.

Lederer, Johannes, and Michael Vogt. 2020. “Estimating the Lasso’s Effective Noise,” April.

Lee, Jason D., Dennis L. Sun, Yuekai Sun, and Jonathan E. Taylor. 2013. “Exact Post-Selection Inference, with Application to the Lasso,” November.

Lim, Néhémy, and Johannes Lederer. 2016. “Efficient Feature Selection with Large and High-Dimensional Data,” September.

Lockhart, Richard, Jonathan Taylor, Ryan J. Tibshirani, and Robert Tibshirani. 2014. “A Significance Test for the Lasso.” The Annals of Statistics 42 (2): 413–68.

Lu, W., Y. Goldberg, and J. P. Fine. 2012. “On the Robustness of the Adaptive Lasso to Model Misspecification.” Biometrika 99 (3): 717–31.

Mahoney, Michael W. 2016. “Lecture Notes on Spectral Graph Methods.” arXiv Preprint arXiv:1608.04845.

Mairal, J. 2015. “Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning.” SIAM Journal on Optimization 25 (2): 829–55.

Mazumder, Rahul, Jerome H Friedman, and Trevor J. Hastie. 2009. “SparseNet: Coordinate Descent with Non-Convex Penalties.” Stanford University.

Meier, Lukas, Sara van de Geer, and Peter Bühlmann. 2008. “The Group Lasso for Logistic Regression.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70 (1): 53–71.

Meinshausen, Nicolai, and Peter Bühlmann. 2006. “High-Dimensional Graphs and Variable Selection with the Lasso.” The Annals of Statistics 34 (3): 1436–62.

Meinshausen, Nicolai, and Bin Yu. 2009. “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data.” The Annals of Statistics 37 (1): 246–70.

Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. 2017. “Variational Dropout Sparsifies Deep Neural Networks.” In Proceedings of ICML.

Montanari, Andrea. 2012. “Graphical Models Concepts in Compressed Sensing.” Compressed Sensing: Theory and Applications, 394–438.

Mousavi, Ali, and Richard G. Baraniuk. 2017. “Learning to Invert: Signal Recovery via Deep Convolutional Networks.” In ICASSP.

Müller, Patric, and Sara van de Geer. 2015. “Censored Linear Model in High Dimensions: Penalised Linear Regression on High-Dimensional Data with Left-Censored Response Variable.” TEST, April.

Nam, Sangnam, and R. Gribonval. 2012. “Physics-Driven Structured Cosparse Modeling for Source Localization.” In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5397–5400.

Needell, D., and J. A. Tropp. 2008. “CoSaMP: Iterative Signal Recovery from Incomplete and Inaccurate Samples,” March.

Nesterov, Yu. 2012. “Gradient Methods for Minimizing Composite Functions.” Mathematical Programming 140 (1): 125–61.

Neville, Sarah E., John T. Ormerod, and M. P. Wand. 2014. “Mean Field Variational Bayes for Continuous Sparse Signal Shrinkage: Pitfalls and Remedies.” Electronic Journal of Statistics 8 (1): 1113–51.

Ngiam, Jiquan, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, and Andrew Y. Ng. 2011. “Sparse Filtering.” In Advances in Neural Information Processing Systems 24, edited by J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, 1125–33. Curran Associates, Inc.

Nickl, Richard, and Sara van de Geer. 2013. “Confidence Sets in Sparse Regression.” The Annals of Statistics 41 (6): 2852–76.

Oymak, S., A. Jalali, M. Fazel, and B. Hassibi. 2013. “Noisy Estimation of Simultaneously Structured Models: Limitations of Convex Relaxation.” In 2013 IEEE 52nd Annual Conference on Decision and Control (CDC), 6019–24.

Peleg, Tomer, Yonina C. Eldar, and Michael Elad. 2010. “Exploiting Statistical Dependencies in Sparse Representations for Signal Recovery.” IEEE Transactions on Signal Processing 60 (5): 2286–2303.

Portnoy, Stephen, and Roger Koenker. 1997. “The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators.” Statistical Science 12 (4): 279–300.

Pouget-Abadie, Jean, and Thibaut Horel. 2015. “Inferring Graphs from Cascades: A Sparse Recovery Framework.” In Proceedings of the 32nd International Conference on Machine Learning.

Pourahmadi, Mohsen. 2011. “Covariance Estimation: The GLM and Regularization Perspectives.” Statistical Science 26 (3): 369–87.

Qian, Wei, and Yuhong Yang. 2012. “Model Selection via Standard Error Adjusted Adaptive Lasso.” Annals of the Institute of Statistical Mathematics 65 (2): 295–318.

Qin, Zhiwei, Katya Scheinberg, and Donald Goldfarb. 2013. “Efficient Block-Coordinate Descent Algorithms for the Group Lasso.” Mathematical Programming Computation 5 (2): 143–69.

Rahimi, Ali, and Benjamin Recht. 2009. “Weighted Sums of Random Kitchen Sinks: Replacing Minimization with Randomization in Learning.” In Advances in Neural Information Processing Systems, 1313–20. Curran Associates, Inc.

Ravikumar, Pradeep, Martin J. Wainwright, Garvesh Raskutti, and Bin Yu. 2011. “High-Dimensional Covariance Estimation by Minimizing ℓ1-Penalized Log-Determinant Divergence.” Electronic Journal of Statistics 5: 935–80.

Ravishankar, Saiprasad, and Yoram Bresler. 2015. “Efficient Blind Compressed Sensing Using Sparsifying Transforms with Convergence Guarantees and Application to MRI,” January.

Ravishankar, S., and Y. Bresler. 2015. “Sparsifying Transform Learning with Efficient Optimal Updates and Convergence Guarantees.” IEEE Transactions on Signal Processing 63 (9): 2389–2404.

Reynaud-Bouret, Patricia. 2003. “Adaptive Estimation of the Intensity of Inhomogeneous Poisson Processes via Concentration Inequalities.” Probability Theory and Related Fields 126 (1).

Reynaud-Bouret, Patricia, and Sophie Schbath. 2010. “Adaptive Estimation for Hawkes Processes; Application to Genome Analysis.” The Annals of Statistics 38 (5): 2781–2822.

Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44. KDD ’16. New York, NY, USA: ACM.

Rish, Irina, and Genady Grabarnik. 2014. “Sparse Signal Recovery with Exponential-Family Noise.” In Compressed Sensing & Sparse Filtering, edited by Avishy Y. Carmi, Lyudmila Mihaylova, and Simon J. Godsill, 77–93. Signals and Communication Technology. Springer Berlin Heidelberg.

Rish, Irina, and Genady Ya Grabarnik. 2015. Sparse Modeling: Theory, Algorithms, and Applications. Chapman & Hall/CRC Machine Learning & Pattern Recognition Series. Boca Raton, FL: CRC Press, Taylor & Francis Group.

Ročková, Veronika, and Edward I. George. 2018. “The Spike-and-Slab LASSO.” Journal of the American Statistical Association 113 (521): 431–44.

Reddi, Sashank J., Suvrit Sra, Barnabás Póczós, and Alex Smola. 2016. “Stochastic Frank-Wolfe Methods for Nonconvex Optimization.”

Schelldorfer, Jürg, Peter Bühlmann, and Sara van de Geer. 2011. “Estimation for High-Dimensional Linear Mixed-Effects Models Using ℓ1-Penalization.” Scandinavian Journal of Statistics 38 (2): 197–214.

She, Yiyuan, and Art B. Owen. 2010. “Outlier Detection Using Nonconvex Penalized Regression.”

Simon, Noah, Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2011. “Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent.” Journal of Statistical Software 39 (5).

Smith, Virginia, Simone Forte, Michael I. Jordan, and Martin Jaggi. 2015. “L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework,” December.

Soh, Yong Sheng, and Venkat Chandrasekaran. 2017. “A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers,” January.

Soltani, Mohammadreza, and Chinmay Hegde. 2016. “Demixing Sparse Signals from Nonlinear Observations.” Statistics 7: 9.

Starck, J. L., Michael Elad, and David L. Donoho. 2005. “Image Decomposition via the Combination of Sparse Representations and a Variational Approach.” IEEE Transactions on Image Processing 14 (10): 1570–82.

Stine, Robert A. 2004. “Discussion of ‘Least Angle Regression’ by Efron et al.” The Annals of Statistics 32 (2): 407–99.

Su, Weijie, Malgorzata Bogdan, and Emmanuel J. Candès. 2015. “False Discoveries Occur Early on the Lasso Path,” November.

Taddy, Matt. 2013. “One-Step Estimator Paths for Concave Regularization,” August.

Thisted, Ronald A. 1997. “[The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators]: Comment.” Statistical Science 12 (4): 296–98.

Thrampoulidis, Christos, Ehsan Abbasi, and Babak Hassibi. 2015. “LASSO with Non-Linear Measurements Is Equivalent to One with Linear Measurements.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 3402–10. Curran Associates, Inc.

Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological) 58 (1): 267–88.

———. 2011. “Regression Shrinkage and Selection via the Lasso: A Retrospective.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (3): 273–82.

Tibshirani, Ryan J. 2014. “A General Framework for Fast Stagewise Algorithms,” August.

Trofimov, Ilya, and Alexander Genkin. 2015. “Distributed Coordinate Descent for L1-Regularized Logistic Regression.” In Analysis of Images, Social Networks and Texts, edited by Mikhail Yu Khachay, Natalia Konstantinova, Alexander Panchenko, Dmitry I. Ignatov, and Valeri G. Labunets, 243–54. Communications in Computer and Information Science 542. Springer International Publishing.

———. 2016. “Distributed Coordinate Descent for Generalized Linear Models with Regularization,” November.

Tropp, J. A., and S. J. Wright. 2010. “Computational Methods for Sparse Solution of Linear Inverse Problems.” Proceedings of the IEEE 98 (6): 948–58.

Tschannen, Michael, and Helmut Bölcskei. 2016. “Noisy Subspace Clustering via Matching Pursuits,” December.

Uematsu, Yoshimasa. 2015. “Penalized Likelihood Estimation in High-Dimensional Time Series Models and Its Application,” April.

Unser, Michael A., and Pouya Tafti. 2014. An Introduction to Sparse Stochastic Processes. New York: Cambridge University Press.

Unser, M., P. D. Tafti, A. Amini, and H. Kirshner. 2014. “A Unified Formulation of Gaussian Vs Sparse Stochastic Processes - Part II: Discrete-Domain Theory.” IEEE Transactions on Information Theory 60 (5): 3036–51.

Unser, M., P. D. Tafti, and Q. Sun. 2014. “A Unified Formulation of Gaussian Vs Sparse Stochastic Processes—Part I: Continuous-Domain Theory.” IEEE Transactions on Information Theory 60 (3): 1945–62.

Veitch, Victor, and Daniel M. Roy. 2015. “The Class of Random Graphs Arising from Exchangeable Random Measures,” December.

Wahba, Grace. 1990. Spline Models for Observational Data. SIAM.

Wang, Hansheng, Guodong Li, and Guohua Jiang. 2007. “Robust Regression Shrinkage and Consistent Variable Selection Through the LAD-Lasso.” Journal of Business & Economic Statistics 25 (3): 347–55.

Wang, L., M. D. Gordon, and J. Zhu. 2006. “Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning.” In Sixth International Conference on Data Mining (ICDM’06), 690–700.

Wang, Zhangyang, Shiyu Chang, Qing Ling, Shuai Huang, Xia Hu, Honghui Shi, and Thomas S. Huang. 2016. “Stacked Approximated Regression Machine: A Simple Deep Learning Approach.”

Wisdom, Scott, Thomas Powers, James Pitton, and Les Atlas. 2016. “Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery.” In Advances in Neural Information Processing Systems 29.

Woodworth, Joseph, and Rick Chartrand. 2015. “Compressed Sensing Recovery via Nonconvex Shrinkage Penalties,” April.

Wright, S. J., R. D. Nowak, and M. A. T. Figueiredo. 2009. “Sparse Reconstruction by Separable Approximation.” IEEE Transactions on Signal Processing 57 (7): 2479–93.

Wu, Tong Tong, and Kenneth Lange. 2008. “Coordinate Descent Algorithms for Lasso Penalized Regression.” The Annals of Applied Statistics 2 (1): 224–44.

Xu, H., C. Caramanis, and S. Mannor. 2010. “Robust Regression and Lasso.” IEEE Transactions on Information Theory 56 (7): 3561–74.

Yaghoobi, M., Sangnam Nam, R. Gribonval, and M. E. Davies. 2012. “Noise Aware Analysis Operator Learning for Approximately Cosparse Signals.” In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5409–12.

Yang, Wenzhuo, and Huan Xu. 2013. “A Unified Robust Regression Model for Lasso-Like Algorithms.” In ICML (3), 585–93.

Yoshida, Ryo, and Mike West. 2010. “Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing.” Journal of Machine Learning Research 11 (May): 1771–98.

Yuan, Ming, and Yi Lin. 2006. “Model Selection and Estimation in Regression with Grouped Variables.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (1): 49–67.

———. 2007. “Model Selection and Estimation in the Gaussian Graphical Model.” Biometrika 94 (1): 19–35.

Yun, Sangwoon, and Kim-Chuan Toh. 2009. “A Coordinate Gradient Descent Method for ℓ1-Regularized Convex Minimization.” Computational Optimization and Applications 48 (2): 273–307.

Zhang, Cun-Hui. 2010. “Nearly Unbiased Variable Selection Under Minimax Concave Penalty.” The Annals of Statistics 38 (2): 894–942.

Zhang, Cun-Hui, and Stephanie S. Zhang. 2014. “Confidence Intervals for Low Dimensional Parameters in High Dimensional Linear Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76 (1): 217–42.

Zhang, Lijun, Tianbao Yang, Rong Jin, and Zhi-Hua Zhou. 2015. “Sparse Learning for Large-Scale and High-Dimensional Data: A Randomized Convex-Concave Optimization Approach,” November.

Zhao, Peng, Guilherme Rocha, and Bin Yu. 2009. “The Composite Absolute Penalties Family for Grouped and Hierarchical Variable Selection.” The Annals of Statistics 37 (6A): 3468–97.

Zhao, Tuo, Han Liu, and Tong Zhang. 2018. “Pathwise Coordinate Optimization for Sparse Learning: Algorithm and Theory.” The Annals of Statistics 46 (1): 180–218.

Zhou, Tianyi, Dacheng Tao, and Xindong Wu. 2011. “Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction.” Data Mining and Knowledge Discovery 22 (3): 340–71.

Zou, Hui. 2006. “The Adaptive Lasso and Its Oracle Properties.” Journal of the American Statistical Association 101 (476): 1418–29.

Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2): 301–20.

Zou, Hui, Trevor Hastie, and Robert Tibshirani. 2007. “On the ‘Degrees of Freedom’ of the Lasso.” The Annals of Statistics 35 (5): 2173–92.

Zou, Hui, and Runze Li. 2008. “One-Step Sparse Estimates in Nonconcave Penalized Likelihood Models.” The Annals of Statistics 36 (4): 1509–33.