Sparse regression



Penalised regression where the penalties are sparsifying. The prediction losses could be anything: likelihood, least-squares, robust Huberised losses, absolute deviation, etc.

I will play fast and loose with terminology here regarding theoretical and empirical losses, and the statistical models we attempt to fit.

In nonparametric statistics we might simultaneously estimate what look like many, many parameters, which we constrain in some clever fashion. This usually boils down to something we can interpret as a smoothing parameter, controlling how many factors we still have to consider, from a subset of the original.

I will usually discuss our intent to minimise prediction error, but one could also try to minimise model selection error.

Then we have a simultaneous estimation and model selection procedure, probably a specific sparse model selection procedure, and we possibly have to choose a clever optimisation method to do the whole thing fast. Related to compressed sensing, but here we consider sampling complexity and measurement error.

See also matrix factorisations, optimisation, multiple testing, concentration inequalities, sparse flavoured icecream.

🏗 disambiguate the optimisation technologies at play: iteratively reweighted least squares etc.

Now! A set of headings under which I will try to understand some things, mostly the LASSO variants.

LASSO

Quadratic prediction loss, absolute coefficient penalty. Given a response vector \(y\) and a design matrix \({\bf X}\), we estimate the regression coefficients \(\beta\) by solving

\[\begin{aligned} \hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\text{argmin}} \: \frac{1}{2} \| y - {\bf X} \beta \|_2^2 + \lambda \| \beta \|_1. \end{aligned}\]

The penalty coefficient \(\lambda\) is left for you to choose, but one of the magical properties of the lasso is that it is easy to test many possible values of \(\lambda\) at low marginal cost.
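One way to see why many values of \(\lambda\) come cheap is warm starting: solve for a large \(\lambda\) first, then reuse that solution as the starting point for the next, slightly smaller one. The following is a minimal sketch of that idea using plain cyclic coordinate descent in NumPy. The function names (`soft_threshold`, `lasso_cd`, `lasso_path`), the grid of penalties and the simulated data are all my own assumptions for illustration, not anybody's reference implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z towards zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, beta0=None, n_iter=200):
    """Minimise 0.5 * ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent."""
    p = X.shape[1]
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    col_sq = (X ** 2).sum(axis=0)      # squared column norms
    resid = y - X @ beta               # current residual
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # inner product of column j with the partial residual (coordinate j removed)
            rho = X[:, j] @ resid + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def lasso_path(X, y, n_lambdas=20, eps=1e-3):
    """Solve on a decreasing grid of penalties, warm-starting each solve from the last."""
    lam_max = np.max(np.abs(X.T @ y))  # smallest penalty for which beta = 0 is optimal
    lams = lam_max * np.logspace(0, np.log10(eps), n_lambdas)
    beta = np.zeros(X.shape[1])
    path = []
    for lam in lams:
        beta = lasso_cd(X, y, lam, beta0=beta)
        path.append(beta.copy())
    return lams, np.array(path)

# Simulated data (an assumption for illustration): 3 active coefficients out of 10.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
beta_true = np.zeros(10)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(100)

lams, path = lasso_path(X, y)
for lam, b in zip(lams[::5], path[::5]):
    print(f"lambda={lam:9.3f}  nonzero coefficients={np.count_nonzero(b)}")
```

At the largest penalty the solution is essentially zero, and coefficients enter as \(\lambda\) shrinks. Production solvers add refinements such as active-set tricks and screening rules that this sketch omits.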

Popular because, amongst other reasons, in practice it turns out to be fast and convenient, and amenable to various performance accelerations, e.g. aggressive approximate variable selection.

Adaptive LASSO

🏗 This is the one with famous oracle properties if you choose \(\lambda\) correctly. Hui Zou’s paper on this (Zou 2006) is readable. I am having trouble digesting Sara van de Geer’s paper (S. A. van de Geer 2008) on the Generalised Lasso, but it seems to offer guarantees for something very similar to the Adaptive Lasso, with far more general assumptions on the model and loss functions, and some finite-sample guarantees.
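As I understand the adaptive lasso, it is a two-stage procedure: get an initial estimate, then solve a lasso whose penalty on each coefficient is weighted by the inverse of that initial estimate, so that coefficients that looked small at first are penalised harder. A minimal sketch follows, assuming \(n > p\) so that ordinary least squares is available as the initial estimator. The names `weighted_lasso_cd` and `adaptive_lasso`, the exponent `gamma`, the penalty value and the simulated data are my own choices for illustration.

```python
import numpy as np

def weighted_lasso_cd(X, y, lam, weights, n_iter=500):
    """Minimise 0.5 * ||y - X b||_2^2 + lam * sum_j weights[j] * |b_j|
    by cyclic coordinate descent."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ resid + col_sq[j] * beta[j]
            # coordinate-wise soft threshold, with a per-coefficient threshold
            new = np.sign(rho) * max(abs(rho) - lam * weights[j], 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def adaptive_lasso(X, y, lam, gamma=1.0):
    """Two-stage adaptive lasso sketch: OLS initial fit, then a weighted lasso."""
    beta_init, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Guard against division by zero; an (almost) zero initial estimate
    # gets a huge weight and is effectively excluded.
    weights = 1.0 / np.maximum(np.abs(beta_init), 1e-12) ** gamma
    return weighted_lasso_cd(X, y, lam, weights)

# Simulated data (an assumption for illustration): 2 active coefficients out of 8.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(200)

beta_hat = adaptive_lasso(X, y, lam=5.0)
print(np.round(beta_hat, 3))
```

In high dimensions, where OLS is unavailable, one would presumably swap in some other initial estimator, such as a ridge or a plain lasso fit. That choice is not settled by this sketch.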

LARS

A confusing one. LASSO is an estimator and LARS (least angle regression; Efron et al. 2004) is an algorithm; they are not the same thing, but a modified LARS can compute the LASSO solution path. I need to work this one through with a pencil and paper.

Graph LASSO

As used in graphical models. 🏗
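For orientation, here is the objective as I read it from (Friedman, Hastie, and Tibshirani 2008), written as a minimisation to match the LASSO above. The notation is mine: \(S\) is the sample covariance matrix, \(\Theta\) the precision (inverse covariance) matrix being estimated, and \(\|\Theta\|_1\) the elementwise absolute sum.

\[\begin{aligned} \hat{\Theta} = \underset{\Theta \succ 0}{\text{argmin}} \: \operatorname{tr}(S \Theta) - \log \det \Theta + \lambda \| \Theta \|_1. \end{aligned}\]

Zeros in \(\hat{\Theta}\) correspond to missing edges in the Gaussian graphical model. Whether the diagonal is penalised varies between implementations.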

Elastic net

Combination of \(L_1\) and \(L_2\) penalties. 🏗
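In the notation of the LASSO section, one common parameterisation (an assumption on my part; packages differ in how they scale and mix the two penalties) is

\[\begin{aligned} \hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\text{argmin}} \: \frac{1}{2} \| y - {\bf X} \beta \|_2^2 + \lambda_1 \| \beta \|_1 + \lambda_2 \| \beta \|_2^2. \end{aligned}\]

Setting \(\lambda_2 = 0\) recovers the LASSO, and setting \(\lambda_1 = 0\) recovers ridge regression.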

Grouped LASSO

AFAICT this is the usual LASSO but with grouped factors. See (Yuan and Lin 2006).
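To make "grouped factors" concrete: partition the coefficients into groups \(\beta_1, \dots, \beta_G\), with matching column blocks \({\bf X}_g\) and group sizes \(p_g\) (my notation), and penalise the unsquared Euclidean norm of each group. A sketch of the objective, as I understand it, is

\[\begin{aligned} \hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\text{argmin}} \: \frac{1}{2} \Big\| y - \sum_{g=1}^{G} {\bf X}_g \beta_g \Big\|_2^2 + \lambda \sum_{g=1}^{G} \sqrt{p_g} \, \| \beta_g \|_2. \end{aligned}\]

Because the norm is not squared, a whole group is set to zero or kept together. With every group of size one this reduces to the usual LASSO.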

Model selection

Can be fiddly with sparse regression, which couples variable selection tightly with parameter estimation. See sparse model selection.

Debiased LASSO

There exist a few versions, but the one I have needed is (S. A. van de Geer 2008), section 2.1. See also (S. van de Geer 2014b). (🏗 relation to (S. A. van de Geer 2008)?)
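For reference, the form I most often see, e.g. in (Sara van de Geer et al. 2014) and (Javanmard and Montanari 2014), corrects the lasso estimate \(\hat{\beta}\) with a one-step update. In my notation, \(n\) is the sample size and \(\hat{\Theta}\) is some approximate inverse of the Gram matrix \({\bf X}^\top {\bf X} / n\); how \(\hat{\Theta}\) is constructed is exactly where the versions differ.

\[\begin{aligned} \hat{b} = \hat{\beta} + \frac{1}{n} \hat{\Theta} {\bf X}^\top ( y - {\bf X} \hat{\beta} ). \end{aligned}\]

The corrected estimate \(\hat{b}\) is no longer sparse, but under suitable conditions its coordinates are approximately Gaussian, which is what makes confidence intervals possible.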

Sparse basis expansions

Wavelets etc; mostly handled under sparse dictionary bases.

Sparse neural nets

That is, sparse regressions as the layers in a neural network? Sure thing. (Wisdom et al. 2016)

Other coefficient penalties

Put a weird penalty on the coefficients! E.g. “Smoothly Clipped Absolute Deviation” (SCAD) (Fan and Li 2001). 🏗
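To show what such a penalty looks like, here is a small sketch of the SCAD penalty function as I read it from (Fan and Li 2001): it matches the \(L_1\) penalty near zero, bends over quadratically, then goes flat, so that large coefficients are not shrunk further. The function name `scad_penalty` is my own, and the default `a=3.7` is the value those authors suggest.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """Evaluate the SCAD penalty elementwise (requires a > 2, lam > 0).

    |b| <= lam         : lam * |b|                      (same as the lasso)
    lam < |b| <= a*lam : quadratic blend
    |b| > a*lam        : constant lam^2 * (a + 1) / 2   (no further shrinkage)
    """
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b
    quadratic = (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1))
    constant = lam ** 2 * (a + 1) / 2
    return np.where(b <= lam, linear, np.where(b <= a * lam, quadratic, constant))

# Evaluate on a small grid to see the three regimes.
grid = np.array([0.0, 0.5, 1.0, 2.0, 3.7, 10.0])
print(scad_penalty(grid, lam=1.0))
```

The penalty is continuous at both breakpoints but nonconvex, so the nice convex optimisation guarantees of the LASSO no longer apply directly.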

Other prediction losses

Put a weird penalty on the error! E.g. a MAD prediction loss combined with a lasso coefficient penalty, etc.

See (H. Wang, Li, and Jiang 2007; Portnoy and Koenker 1997) for some implementations using e.g. absolute prediction error (least absolute deviations).
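As a concrete instance, swapping the quadratic loss of the LASSO for an absolute-error loss while keeping the \(L_1\) coefficient penalty gives, in its simplest unweighted form (my simplification; published versions may use per-coefficient weights),

\[\begin{aligned} \hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\text{argmin}} \: \| y - {\bf X} \beta \|_1 + \lambda \| \beta \|_1. \end{aligned}\]

Both terms are piecewise linear, so this can be posed as a linear program.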

Bayesian Lasso

See Bayesian sparsity.

Implementations

Hastie, Friedman et al.’s glmnet for R is fast and well-regarded, and has a MATLAB version. Here’s how to use it for adaptive lasso. Kenneth Tay has implemented the elastic net penalty for any GLM in glmnet.

SPAMS (C++, MATLAB, R, python) by Mairal looks interesting. It’s an optimisation library for many, many sparse problems.

liblinear also includes lasso-type solvers, as well as support-vector regression.

Tidbits

Sparse regression as a universal classifier explainer? Local Interpretable Model-agnostic Explanations (Ribeiro, Singh, and Guestrin 2016) uses LASSO for this kind of model interpretation. (See the blog post, or the source.)

References

Abramovich, Felix, Yoav Benjamini, David L. Donoho, and Iain M. Johnstone. 2006. “Adapting to Unknown Sparsity by Controlling the False Discovery Rate.” The Annals of Statistics 34 (2): 584–653.
Aghasi, Alireza, Nam Nguyen, and Justin Romberg. 2016. “Net-Trim: A Layer-Wise Convex Pruning of Deep Neural Networks.” arXiv:1611.05162 [Cs, Stat], November.
Aragam, Bryon, Arash A. Amini, and Qing Zhou. 2015. “Learning Directed Acyclic Graphs with Penalized Neighbourhood Regression.” arXiv:1511.08963 [Cs, Math, Stat], November.
Azizyan, Martin, Akshay Krishnamurthy, and Aarti Singh. 2015. “Extreme Compressive Sampling for Covariance Estimation.” arXiv:1506.00898 [Cs, Math, Stat], June.
Bach, Francis. 2009. “Model-Consistent Sparse Estimation Through the Bootstrap.” arXiv:0901.3202 [Cs, Stat].
Bach, Francis, Rodolphe Jenatton, and Julien Mairal. 2011. Optimization With Sparsity-Inducing Penalties. Foundations and Trends® in Machine Learning 1.0. Now Publishers Inc.
Bahmani, Sohail, and Justin Romberg. 2014. “Lifting for Blind Deconvolution in Random Mask Imaging: Identifiability and Convex Relaxation.” arXiv:1501.00046 [Cs, Math, Stat], December.
Banerjee, Arindam, Sheng Chen, Farideh Fazayeli, and Vidyashankar Sivakumar. 2014. “Estimation with Norm Regularization.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 1556–64. Curran Associates, Inc.
Banerjee, Onureena, Laurent El Ghaoui, and Alexandre d’Aspremont. 2008. “Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data.” Journal of Machine Learning Research 9 (Mar): 485–516.
Barber, Rina Foygel, and Emmanuel J. Candès. 2015. “Controlling the False Discovery Rate via Knockoffs.” The Annals of Statistics 43 (5): 2055–85.
Barbier, Jean. 2015. “Statistical Physics and Approximate Message-Passing Algorithms for Sparse Linear Estimation Problems in Signal Processing and Coding Theory.” arXiv:1511.01650 [Cs, Math], November.
Baron, Dror, Shriram Sarvotham, and Richard G. Baraniuk. 2010. “Bayesian Compressive Sensing via Belief Propagation.” IEEE Transactions on Signal Processing 58 (1): 269–80.
Barron, Andrew R., Albert Cohen, Wolfgang Dahmen, and Ronald A. DeVore. 2008. “Approximation and Learning by Greedy Algorithms.” The Annals of Statistics 36 (1): 64–94.
Barron, Andrew R., Cong Huang, Jonathan Q. Li, and Xi Luo. 2008. “MDL, Penalized Likelihood, and Statistical Risk.” In Information Theory Workshop, 2008. ITW’08. IEEE, 247–57. IEEE.
Battiti, Roberto. 1992. “First-and Second-Order Methods for Learning: Between Steepest Descent and Newton’s Method.” Neural Computation 4 (2): 141–66.
Bayati, M., and A. Montanari. 2012. “The LASSO Risk for Gaussian Matrices.” IEEE Transactions on Information Theory 58 (4): 1997–2017.
Bellec, Pierre C., and Alexandre B. Tsybakov. 2016. “Bounds on the Prediction Error of Penalized Least Squares Estimators with Convex Penalty.” arXiv:1609.06675 [Math, Stat], September.
Belloni, Alexandre, Victor Chernozhukov, and Lie Wang. 2011. “Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming.” Biometrika 98 (4): 791–806.
Bian, Wei, Xiaojun Chen, and Yinyu Ye. 2014. “Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization.” Mathematical Programming 149 (1-2): 301–27.
Bien, Jacob, Irina Gaynanova, Johannes Lederer, and Christian L. Müller. 2018. “Non-Convex Global Minimization and False Discovery Rate Control for the TREX.” Journal of Computational and Graphical Statistics 27 (1): 23–33.
Bloniarz, Adam, Hanzhong Liu, Cun-Hui Zhang, Jasjeet Sekhon, and Bin Yu. 2015. “Lasso Adjustments of Treatment Effect Estimates in Randomized Experiments.” arXiv:1507.03652 [Math, Stat], July.
Bondell, Howard D., Arun Krishna, and Sujit K. Ghosh. 2010. “Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models.” Biometrics 66 (4): 1069–77.
Borgs, Christian, Jennifer T. Chayes, Henry Cohn, and Yufei Zhao. 2014. “An \(L^p\) Theory of Sparse Graph Convergence I: Limits, Sparse Random Graph Models, and Power Law Distributions.” arXiv:1401.2906 [Math], January.
Bottou, Léon, Frank E. Curtis, and Jorge Nocedal. 2016. “Optimization Methods for Large-Scale Machine Learning.” arXiv:1606.04838 [Cs, Math, Stat], June.
Breiman, Leo. 1995. “Better Subset Regression Using the Nonnegative Garrote.” Technometrics 37 (4): 373–84.
Bruckstein, A. M., Michael Elad, and M. Zibulevsky. 2008. “On the Uniqueness of Nonnegative Sparse Solutions to Underdetermined Systems of Equations.” IEEE Transactions on Information Theory 54 (11): 4813–20.
Brunton, Steven L., Joshua L. Proctor, and J. Nathan Kutz. 2016. “Discovering Governing Equations from Data by Sparse Identification of Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences 113 (15): 3932–37.
Bu, Yunqi, and Johannes Lederer. 2017. “Integrating Additional Knowledge Into Estimation of Graphical Models.” arXiv:1704.02739 [Stat], April.
Bühlmann, Peter, and Sara van de Geer. 2011. “Additive Models and Many Smooth Univariate Functions.” In Statistics for High-Dimensional Data, 77–97. Springer Series in Statistics. Springer Berlin Heidelberg.
Bühlmann, Peter, and Sara van de Geer. 2015. “High-Dimensional Inference in Misspecified Linear Models.” arXiv:1503.06426 [Stat] 9 (1): 1449–73.
Candès, Emmanuel J., and Mark A. Davenport. 2011. “How Well Can We Estimate a Sparse Vector?” arXiv:1104.5246 [Cs, Math, Stat], April.
Candès, Emmanuel J., Yingying Fan, Lucas Janson, and Jinchi Lv. 2016. “Panning for Gold: Model-Free Knockoffs for High-Dimensional Controlled Variable Selection.” arXiv Preprint arXiv:1610.02351.
Candès, Emmanuel J., and Carlos Fernandez-Granda. 2013. “Super-Resolution from Noisy Data.” Journal of Fourier Analysis and Applications 19 (6): 1229–54.
Candès, Emmanuel J., and Y. Plan. 2010. “Matrix Completion With Noise.” Proceedings of the IEEE 98 (6): 925–36.
Candès, Emmanuel J., Justin K. Romberg, and Terence Tao. 2006. “Stable Signal Recovery from Incomplete and Inaccurate Measurements.” Communications on Pure and Applied Mathematics 59 (8): 1207–23.
Candès, Emmanuel J., Michael B. Wakin, and Stephen P. Boyd. 2008. “Enhancing Sparsity by Reweighted ℓ1 Minimization.” Journal of Fourier Analysis and Applications 14 (5-6): 877–905.
Carmi, Avishy Y. 2013. “Compressive System Identification: Sequential Methods and Entropy Bounds.” Digital Signal Processing 23 (3): 751–70.
Carmi, Avishy Y. 2014. “Compressive System Identification.” In Compressed Sensing & Sparse Filtering, edited by Avishy Y. Carmi, Lyudmila Mihaylova, and Simon J. Godsill, 281–324. Signals and Communication Technology. Springer Berlin Heidelberg.
Cevher, Volkan, Marco F. Duarte, Chinmay Hegde, and Richard Baraniuk. 2009. “Sparse Signal Recovery Using Markov Random Fields.” In Advances in Neural Information Processing Systems, 257–64. Curran Associates, Inc.
Chartrand, R., and Wotao Yin. 2008. “Iteratively Reweighted Algorithms for Compressive Sensing.” In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008, 3869–72.
Chen, Minhua, J. Silva, J. Paisley, Chunping Wang, D. Dunson, and L. Carin. 2010. “Compressive Sensing on Manifolds Using a Nonparametric Mixture of Factor Analyzers: Algorithm and Performance Bounds.” IEEE Transactions on Signal Processing 58 (12): 6140–55.
Chen, Xiaojun. 2012. “Smoothing Methods for Nonsmooth, Nonconvex Minimization.” Mathematical Programming 134 (1): 71–99.
Chen, Yen-Chi, and Yu-Xiang Wang. n.d. “Discussion on ‘Confidence Intervals and Hypothesis Testing for High-Dimensional Regression’.”
Chen, Y., and A. O. Hero. 2012. “Recursive ℓ1,∞ Group Lasso.” IEEE Transactions on Signal Processing 60 (8): 3978–87.
Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2016. “Double/Debiased Machine Learning for Treatment and Causal Parameters.” arXiv:1608.00060 [Econ, Stat], July.
Chernozhukov, Victor, Christian Hansen, Yuan Liao, and Yinchu Zhu. 2018. “Inference For Heterogeneous Effects Using Low-Rank Estimations.” arXiv:1812.08089 [Math, Stat], December.
Chernozhukov, Victor, Whitney K. Newey, and Rahul Singh. 2018. “Learning L2 Continuous Regression Functionals via Regularized Riesz Representers.” arXiv:1809.05224 [Econ, Math, Stat], September.
Chetverikov, Denis, Zhipeng Liao, and Victor Chernozhukov. 2016. “On Cross-Validated Lasso.” arXiv:1605.02214 [Math, Stat], May.
Chichignoud, Michaël, Johannes Lederer, and Martin Wainwright. 2014. “A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees.” arXiv:1410.0247 [Math, Stat], October.
Dai, Ran, and Rina Foygel Barber. 2016. “The Knockoff Filter for FDR Control in Group-Sparse and Multitask Regression.” arXiv Preprint arXiv:1602.03589.
Daneshmand, Hadi, Manuel Gomez-Rodriguez, Le Song, and Bernhard Schölkopf. 2014. “Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-Thresholding Algorithm.” In ICML.
Descloux, Pascaline, and Sylvain Sardy. 2018. “Model Selection with Lasso-Zero: Adding Straw to the Haystack to Better Find Needles.” arXiv:1805.05133 [Stat], May.
Diaconis, Persi, and David Freedman. 1984. “Asymptotics of Graphical Projection Pursuit.” The Annals of Statistics 12 (3): 793–815.
Efron, Bradley, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. 2004. “Least Angle Regression.” The Annals of Statistics 32 (2): 407–99.
Elhamifar, E., and R. Vidal. 2013. “Sparse Subspace Clustering: Algorithm, Theory, and Applications.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (11): 2765–81.
Engebretsen, Solveig, and Jon Bohlin. 2019. “Statistical Predictions with Glmnet.” Clinical Epigenetics 11 (1): 123.
Ewald, Karl, and Ulrike Schneider. 2015. “Confidence Sets Based on the Lasso Estimator.” arXiv:1507.05315 [Math, Stat], July.
Fan, Jianqing, and Runze Li. 2001. “Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association 96 (456): 1348–60.
Fan, Rong-En, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. “LIBLINEAR: A Library for Large Linear Classification.” Journal of Machine Learning Research 9: 1871–74.
Flynn, Cheryl J., Clifford M. Hurvich, and Jeffrey S. Simonoff. 2013. “Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models.” arXiv:1302.2068 [Stat], February.
Foygel, Rina, and Nathan Srebro. 2011. “Fast-Rate and Optimistic-Rate Error Bounds for L1-Regularized Regression.” arXiv:1108.0373 [Math, Stat], August.
Friedman, Jerome, Trevor Hastie, Holger Höfling, and Robert Tibshirani. 2007. “Pathwise Coordinate Optimization.” The Annals of Applied Statistics 1 (2): 302–32.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2008. “Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics 9 (3): 432–41.
Fu, Fei, and Qing Zhou. 2013. “Learning Sparse Causal Gaussian Networks With Experimental Intervention: Regularization and Coordinate Descent.” Journal of the American Statistical Association 108 (501): 288–300.
Gasso, G., A. Rakotomamonjy, and S. Canu. 2009. “Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming.” IEEE Transactions on Signal Processing 57 (12): 4686–98.
Geer, Sara A. van de. 2008. “High-Dimensional Generalized Linear Models and the Lasso.” The Annals of Statistics 36 (2): 614–45.
Geer, Sara A. van de, Peter Bühlmann, and Shuheng Zhou. 2011. “The Adaptive and the Thresholded Lasso for Potentially Misspecified Models (and a Lower Bound for the Lasso).” Electronic Journal of Statistics 5: 688–749.
Geer, Sara van de. 2007. “The Deterministic Lasso.”
Geer, Sara van de. 2014a. “Weakly Decomposable Regularization Penalties and Structured Sparsity.” Scandinavian Journal of Statistics 41 (1): 72–86.
Geer, Sara van de. 2014b. “Worst Possible Sub-Directions in High-Dimensional Models.” In arXiv:1403.7023 [Math, Stat]. Vol. 131.
Geer, Sara van de. 2014c. “Statistical Theory for High-Dimensional Models.” arXiv:1409.8557 [Math, Stat], September.
Geer, Sara van de. 2016. Estimation and Testing Under Sparsity. Vol. 2159. Lecture Notes in Mathematics. Cham: Springer International Publishing.
Geer, Sara van de, Peter Bühlmann, Ya’acov Ritov, and Ruben Dezeure. 2014. “On Asymptotically Optimal Confidence Regions and Tests for High-Dimensional Models.” The Annals of Statistics 42 (3): 1166–1202.
Ghadimi, Saeed, and Guanghui Lan. 2013a. “Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming.” SIAM Journal on Optimization 23 (4): 2341–68.
Ghadimi, Saeed, and Guanghui Lan. 2013b. “Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming.” arXiv:1310.3787 [Math], October.
Girolami, Mark. 2001. “A Variational Method for Learning Sparse and Overcomplete Representations.” Neural Computation 13 (11): 2517–32.
Giryes, Raja, Guillermo Sapiro, and Alex M. Bronstein. 2014. “On the Stability of Deep Networks.” arXiv:1412.5896 [Cs, Math, Stat], December.
Greenhill, Catherine, Mikhail Isaev, Matthew Kwan, and Brendan D. McKay. 2016. “The Average Number of Spanning Trees in Sparse Graphs with Given Degrees.” arXiv:1606.01586 [Math], June.
Gu, Jiaying, Fei Fu, and Qing Zhou. 2014. “Adaptive Penalized Estimation of Directed Acyclic Graphs From Categorical Data.” arXiv:1403.2310 [Stat], March.
Gui, Jiang, and Hongzhe Li. 2005. “Penalized Cox Regression Analysis in the High-Dimensional and Low-Sample Size Settings, with Applications to Microarray Gene Expression Data.” Bioinformatics 21 (13): 3001–8.
Gupta, Pawan, and Marianna Pensky. 2016. “Solution of Linear Ill-Posed Problems Using Random Dictionaries.” arXiv:1605.07913 [Math, Stat], May.
Hallac, David, Jure Leskovec, and Stephen Boyd. 2015. “Network Lasso: Clustering and Optimization in Large Graphs.” arXiv:1507.00280 [Cs, Math, Stat], July.
Hansen, Niels Richard, Patricia Reynaud-Bouret, and Vincent Rivoirard. 2015. “Lasso and Probabilistic Inequalities for Multivariate Point Processes.” Bernoulli 21 (1): 83–143.
Hastie, Trevor J., Rob Tibshirani, and Martin J. Wainwright. 2015. Statistical Learning with Sparsity: The Lasso and Generalizations. Boca Raton: Chapman and Hall/CRC.
Hawe, S., M. Kleinsteuber, and K. Diepold. 2013. “Analysis Operator Learning and Its Application to Image Reconstruction.” IEEE Transactions on Image Processing 22 (6): 2138–50.
He, Dan, Irina Rish, and Laxmi Parida. 2014. “Transductive HSIC Lasso.” In Proceedings of the 2014 SIAM International Conference on Data Mining, edited by Mohammed Zaki, Zoran Obradovic, Pang Ning Tan, Arindam Banerjee, Chandrika Kamath, and Srinivasan Parthasarathy, 154–62. Proceedings. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Hebiri, Mohamed, and Sara A. van de Geer. 2011. “The Smooth-Lasso and Other ℓ1+ℓ2-Penalized Methods.” Electronic Journal of Statistics 5: 1184–1226.
Hegde, Chinmay, and Richard G. Baraniuk. 2012. “Signal Recovery on Incoherent Manifolds.” IEEE Transactions on Information Theory 58 (12): 7204–14.
Hegde, Chinmay, Piotr Indyk, and Ludwig Schmidt. 2015. “A Nearly-Linear Time Framework for Graph-Structured Sparsity.” In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 928–37.
Hesterberg, Tim, Nam Hee Choi, Lukas Meier, and Chris Fraley. 2008. “Least Angle and ℓ1 Penalized Regression: A Review.” Statistics Surveys 2: 61–93.
Hormati, A., O. Roy, Y.M. Lu, and M. Vetterli. 2010. “Distributed Sampling of Signals Linked by Sparse Filtering: Theory and Applications.” IEEE Transactions on Signal Processing 58 (3): 1095–1109.
Hsieh, Cho-Jui, Mátyás A. Sustik, Inderjit S. Dhillon, and Pradeep D. Ravikumar. 2014. “QUIC: Quadratic Approximation for Sparse Inverse Covariance Estimation.” Journal of Machine Learning Research 15 (1): 2911–47.
Hu, Tao, Cengiz Pehlevan, and Dmitri B. Chklovskii. 2014. “A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization.” In 2014 48th Asilomar Conference on Signals, Systems and Computers.
Huang, Cong, G. L. H. Cheang, and Andrew R. Barron. 2008. “Risk of Penalized Least Squares, Greedy Selection and L1 Penalization for Flexible Function Libraries.”
Ishwaran, Hemant, and J. Sunil Rao. 2005. “Spike and Slab Variable Selection: Frequentist and Bayesian Strategies.” The Annals of Statistics 33 (2): 730–73.
Janson, Lucas, William Fithian, and Trevor J. Hastie. 2015. “Effective Degrees of Freedom: A Flawed Metaphor.” Biometrika 102 (2): 479–85.
Javanmard, Adel, and Andrea Montanari. 2014. “Confidence Intervals and Hypothesis Testing for High-Dimensional Regression.” Journal of Machine Learning Research 15 (1): 2869–909.
Jung, Alexander. 2013. “An RKHS Approach to Estimation with Sparsity Constraints.” In Advances in Neural Information Processing Systems 29.
Kabán, Ata. 2014. “New Bounds on Compressive Linear Least Squares Regression.” In Journal of Machine Learning Research, 448–56.
Koppel, Alec, Garrett Warnell, Ethan Stump, and Alejandro Ribeiro. 2016. “Parsimonious Online Learning with Kernels via Sparse Projections in Function Space.” arXiv:1612.04111 [Cs, Stat], December.
Kowalski, Matthieu, and Bruno Torrésani. 2009. “Structured Sparsity: From Mixed Norms to Structured Shrinkage.” In SPARS’09-Signal Processing with Adaptive Sparse Structured Representations.
Krämer, Nicole, Juliane Schäfer, and Anne-Laure Boulesteix. 2009. “Regularized Estimation of Large-Scale Gene Association Networks Using Graphical Gaussian Models.” BMC Bioinformatics 10 (1): 384.
Lam, Clifford, and Jianqing Fan. 2009. “Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.” Annals of Statistics 37 (6B): 4254–78.
Lambert-Lacroix, Sophie, and Laurent Zwald. 2011. “Robust Regression Through the Huber’s Criterion and Adaptive Lasso Penalty.” Electronic Journal of Statistics 5: 1015–53.
Langford, John, Lihong Li, and Tong Zhang. 2009. “Sparse Online Learning via Truncated Gradient.” In Advances in Neural Information Processing Systems 21, edited by D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, 905–12. Curran Associates, Inc.
Lederer, Johannes, and Michael Vogt. 2020. “Estimating the Lasso’s Effective Noise.” arXiv:2004.11554 [Stat], April.
Lee, Jason D., Dennis L. Sun, Yuekai Sun, and Jonathan E. Taylor. 2013. “Exact Post-Selection Inference, with Application to the Lasso.” arXiv:1311.6238 [Math, Stat], November.
Lemhadri, Ismael, Feng Ruan, Louis Abraham, and Robert Tibshirani. 2021. “LassoNet: A Neural Network with Feature Sparsity.” Journal of Machine Learning Research 22 (127): 1–29.
Lim, Néhémy, and Johannes Lederer. 2016. “Efficient Feature Selection With Large and High-Dimensional Data.” arXiv:1609.07195 [Stat], September.
Lockhart, Richard, Jonathan Taylor, Ryan J. Tibshirani, and Robert Tibshirani. 2014. “A Significance Test for the Lasso.” The Annals of Statistics 42 (2): 413–68.
Lu, W., Y. Goldberg, and J. P. Fine. 2012. “On the Robustness of the Adaptive Lasso to Model Misspecification.” Biometrika 99 (3): 717–31.
Mahoney, Michael W. 2016. “Lecture Notes on Spectral Graph Methods.” arXiv Preprint arXiv:1608.04845.
Mairal, J. 2015. “Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning.” SIAM Journal on Optimization 25 (2): 829–55.
Mazumder, Rahul, Jerome H Friedman, and Trevor J. Hastie. 2009. “SparseNet: Coordinate Descent with Non-Convex Penalties.” Stanford University.
Meier, Lukas, Sara van de Geer, and Peter Bühlmann. 2008. “The Group Lasso for Logistic Regression.” Group 70 (Part 1): 53–71.
Meinshausen, Nicolai, and Peter Bühlmann. 2006. “High-Dimensional Graphs and Variable Selection with the Lasso.” The Annals of Statistics 34 (3): 1436–62.
Meinshausen, Nicolai, and Bin Yu. 2009. “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data.” The Annals of Statistics 37 (1): 246–70.
Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. 2017. “Variational Dropout Sparsifies Deep Neural Networks.” In Proceedings of ICML.
Montanari, Andrea. 2012. “Graphical Models Concepts in Compressed Sensing.” Compressed Sensing: Theory and Applications, 394–438.
Mousavi, Ali, and Richard G. Baraniuk. 2017. “Learning to Invert: Signal Recovery via Deep Convolutional Networks.” In ICASSP.
Müller, Patric, and Sara van de Geer. 2015. “Censored Linear Model in High Dimensions: Penalised Linear Regression on High-Dimensional Data with Left-Censored Response Variable.” TEST, April.
Nam, Sangnam, and R. Gribonval. 2012. “Physics-Driven Structured Cosparse Modeling for Source Localization.” In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5397–5400.
Needell, D., and J. A. Tropp. 2008. “CoSaMP: Iterative Signal Recovery from Incomplete and Inaccurate Samples.” arXiv:0803.2392 [Cs, Math], March.
Nesterov, Yu. 2012. “Gradient Methods for Minimizing Composite Functions.” Mathematical Programming 140 (1): 125–61.
Neville, Sarah E., John T. Ormerod, and M. P. Wand. 2014. “Mean Field Variational Bayes for Continuous Sparse Signal Shrinkage: Pitfalls and Remedies.” Electronic Journal of Statistics 8 (1): 1113–51.
Ngiam, Jiquan, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, and Andrew Y. Ng. 2011. “Sparse Filtering.” In Advances in Neural Information Processing Systems 24, edited by J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, 1125–33. Curran Associates, Inc.
Nickl, Richard, and Sara van de Geer. 2013. “Confidence Sets in Sparse Regression.” The Annals of Statistics 41 (6): 2852–76.
Oymak, S., A. Jalali, M. Fazel, and B. Hassibi. 2013. “Noisy Estimation of Simultaneously Structured Models: Limitations of Convex Relaxation.” In 2013 IEEE 52nd Annual Conference on Decision and Control (CDC), 6019–24.
Peleg, Tomer, Yonina C. Eldar, and Michael Elad. 2010. “Exploiting Statistical Dependencies in Sparse Representations for Signal Recovery.” IEEE Transactions on Signal Processing 60 (5): 2286–2303.
Portnoy, Stephen, and Roger Koenker. 1997. “The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators.” Statistical Science 12 (4): 279–300.
Pouget-Abadie, Jean, and Thibaut Horel. 2015. “Inferring Graphs from Cascades: A Sparse Recovery Framework.” In Proceedings of The 32nd International Conference on Machine Learning.
Pourahmadi, Mohsen. 2011. “Covariance Estimation: The GLM and Regularization Perspectives.” Statistical Science 26 (3): 369–87.
Qian, Wei, and Yuhong Yang. 2012. “Model Selection via Standard Error Adjusted Adaptive Lasso.” Annals of the Institute of Statistical Mathematics 65 (2): 295–318.
Qin, Zhiwei, Katya Scheinberg, and Donald Goldfarb. 2013. “Efficient Block-Coordinate Descent Algorithms for the Group Lasso.” Mathematical Programming Computation 5 (2): 143–69.
Rahimi, Ali, and Benjamin Recht. 2009. “Weighted Sums of Random Kitchen Sinks: Replacing Minimization with Randomization in Learning.” In Advances in Neural Information Processing Systems, 1313–20. Curran Associates, Inc.
Ravikumar, Pradeep, Martin J. Wainwright, Garvesh Raskutti, and Bin Yu. 2011. “High-Dimensional Covariance Estimation by Minimizing ℓ1-Penalized Log-Determinant Divergence.” Electronic Journal of Statistics 5: 935–80.
Ravishankar, Saiprasad, and Yoram Bresler. 2015. “Efficient Blind Compressed Sensing Using Sparsifying Transforms with Convergence Guarantees and Application to MRI.” arXiv:1501.02923 [Cs, Stat], January.
Ravishankar, S., and Y. Bresler. 2015. “Sparsifying Transform Learning With Efficient Optimal Updates and Convergence Guarantees.” IEEE Transactions on Signal Processing 63 (9): 2389–2404.
Reynaud-Bouret, Patricia. 2003. “Adaptive Estimation of the Intensity of Inhomogeneous Poisson Processes via Concentration Inequalities.” Probability Theory and Related Fields 126 (1).
Reynaud-Bouret, Patricia, and Sophie Schbath. 2010. “Adaptive Estimation for Hawkes Processes; Application to Genome Analysis.” The Annals of Statistics 38 (5): 2781–2822.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44. KDD ’16. New York, NY, USA: ACM.
Rish, Irina, and Genady Grabarnik. 2014. “Sparse Signal Recovery with Exponential-Family Noise.” In Compressed Sensing & Sparse Filtering, edited by Avishy Y. Carmi, Lyudmila Mihaylova, and Simon J. Godsill, 77–93. Signals and Communication Technology. Springer Berlin Heidelberg.
Rish, Irina, and Genady Ya Grabarnik. 2015. Sparse Modeling: Theory, Algorithms, and Applications. Chapman & Hall/CRC Machine Learning & Pattern Recognition Series. Boca Raton, FL: CRC Press, Taylor & Francis Group.
Ročková, Veronika, and Edward I. George. 2018. “The Spike-and-Slab LASSO.” Journal of the American Statistical Association 113 (521): 431–44.
Sashank J. Reddi, Suvrit Sra, Barnabás Póczós, and Alex Smola. 1995. “Stochastic Frank-Wolfe Methods for Nonconvex Optimization.”
Schelldorfer, Jürg, Peter Bühlmann, and Sara Van De Geer. 2011. “Estimation for High-Dimensional Linear Mixed-Effects Models Using ℓ1-Penalization.” Scandinavian Journal of Statistics 38 (2): 197–214.
She, Yiyuan, and Art B. Owen. 2010. “Outlier Detection Using Nonconvex Penalized Regression.”
Simon, Noah, Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2011. “Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent.” Journal of Statistical Software 39 (5).
Smith, Virginia, Simone Forte, Michael I. Jordan, and Martin Jaggi. 2015. “L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework.” arXiv:1512.04011 [Cs], December.
Soh, Yong Sheng, and Venkat Chandrasekaran. 2017. “A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers.” arXiv:1701.01207 [Cs, Math, Stat], January.
Soltani, Mohammadreza, and Chinmay Hegde. 2016. “Demixing Sparse Signals from Nonlinear Observations.” Statistics 7: 9.
Starck, J. L., Michael Elad, and David L. Donoho. 2005. “Image Decomposition via the Combination of Sparse Representations and a Variational Approach.” IEEE Transactions on Image Processing 14 (10): 1570–82.
Stine, Robert A. 2004. “Discussion of ‘Least Angle Regression’ by Efron et al.” The Annals of Statistics 32 (2): 407–99.
Su, Weijie, Malgorzata Bogdan, and Emmanuel J. Candès. 2015. “False Discoveries Occur Early on the Lasso Path.” arXiv:1511.01957 [Cs, Math, Stat], November.
Taddy, Matt. 2013. “One-Step Estimator Paths for Concave Regularization.” arXiv:1308.5623 [Stat], August.
Thisted, Ronald A. 1997. “[The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators]: Comment.” Statistical Science 12 (4): 296–98.
Thrampoulidis, Chrtistos, Ehsan Abbasi, and Babak Hassibi. 2015. β€œLASSO with Non-Linear Measurements Is Equivalent to One With Linear Measurements.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, R. Garnett, and R. Garnett, 3402–10. Curran Associates, Inc.
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological) 58 (1): 267–88.
———. 2011. “Regression Shrinkage and Selection via the Lasso: A Retrospective.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (3): 273–82.
Tibshirani, Ryan J. 2014. “A General Framework for Fast Stagewise Algorithms.” arXiv:1408.5801 [Stat], August.
Trofimov, Ilya, and Alexander Genkin. 2015. “Distributed Coordinate Descent for L1-Regularized Logistic Regression.” In Analysis of Images, Social Networks and Texts, edited by Mikhail Yu Khachay, Natalia Konstantinova, Alexander Panchenko, Dmitry I. Ignatov, and Valeri G. Labunets, 243–54. Communications in Computer and Information Science 542. Springer International Publishing.
———. 2016. “Distributed Coordinate Descent for Generalized Linear Models with Regularization.” arXiv:1611.02101 [Cs, Stat], November.
Tropp, J. A., and S. J. Wright. 2010. “Computational Methods for Sparse Solution of Linear Inverse Problems.” Proceedings of the IEEE 98 (6): 948–58.
Tschannen, Michael, and Helmut Bölcskei. 2016. “Noisy Subspace Clustering via Matching Pursuits.” arXiv:1612.03450 [Cs, Math, Stat], December.
Uematsu, Yoshimasa. 2015. “Penalized Likelihood Estimation in High-Dimensional Time Series Models and Its Application.” arXiv:1504.06706 [Math, Stat], April.
Unser, Michael A., and Pouya Tafti. 2014. An Introduction to Sparse Stochastic Processes. New York: Cambridge University Press.
Unser, M., P. D. Tafti, A. Amini, and H. Kirshner. 2014. “A Unified Formulation of Gaussian Vs Sparse Stochastic Processes - Part II: Discrete-Domain Theory.” IEEE Transactions on Information Theory 60 (5): 3036–51.
Unser, M., P. D. Tafti, and Q. Sun. 2014. “A Unified Formulation of Gaussian Vs Sparse Stochastic Processes—Part I: Continuous-Domain Theory.” IEEE Transactions on Information Theory 60 (3): 1945–62.
Veitch, Victor, and Daniel M. Roy. 2015. “The Class of Random Graphs Arising from Exchangeable Random Measures.” arXiv:1512.03099 [Cs, Math, Stat], December.
Wahba, Grace. 1990. Spline Models for Observational Data. SIAM.
Wang, Hansheng, Guodong Li, and Guohua Jiang. 2007. “Robust Regression Shrinkage and Consistent Variable Selection Through the LAD-Lasso.” Journal of Business & Economic Statistics 25 (3): 347–55.
Wang, L., M. D. Gordon, and J. Zhu. 2006. “Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning.” In Sixth International Conference on Data Mining (ICDM’06), 690–700.
Wang, Zhangyang, Shiyu Chang, Qing Ling, Shuai Huang, Xia Hu, Honghui Shi, and Thomas S. Huang. 2016. “Stacked Approximated Regression Machine: A Simple Deep Learning Approach.” In.
Wisdom, Scott, Thomas Powers, James Pitton, and Les Atlas. 2016. “Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery.” In Advances in Neural Information Processing Systems 29.
Woodworth, Joseph, and Rick Chartrand. 2015. “Compressed Sensing Recovery via Nonconvex Shrinkage Penalties.” arXiv:1504.02923 [Cs, Math], April.
Wright, S. J., R. D. Nowak, and M. A. T. Figueiredo. 2009. “Sparse Reconstruction by Separable Approximation.” IEEE Transactions on Signal Processing 57 (7): 2479–93.
Wu, Tong Tong, and Kenneth Lange. 2008. “Coordinate Descent Algorithms for Lasso Penalized Regression.” The Annals of Applied Statistics 2 (1): 224–44.
Xu, H., C. Caramanis, and S. Mannor. 2010. “Robust Regression and Lasso.” IEEE Transactions on Information Theory 56 (7): 3561–74.
———. 2012. “Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem.” IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (1): 187–93.
Yaghoobi, M., Sangnam Nam, R. Gribonval, and M.E. Davies. 2012. “Noise Aware Analysis Operator Learning for Approximately Cosparse Signals.” In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5409–12.
Yang, Wenzhuo, and Huan Xu. 2013. “A Unified Robust Regression Model for Lasso-Like Algorithms.” In ICML (3), 585–93.
Yoshida, Ryo, and Mike West. 2010. “Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing.” Journal of Machine Learning Research 11 (May): 1771–98.
Yuan, Ming, and Yi Lin. 2006. “Model Selection and Estimation in Regression with Grouped Variables.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (1): 49–67.
———. 2007. “Model Selection and Estimation in the Gaussian Graphical Model.” Biometrika 94 (1): 19–35.
Yun, Sangwoon, and Kim-Chuan Toh. 2009. “A Coordinate Gradient Descent Method for ℓ1-Regularized Convex Minimization.” Computational Optimization and Applications 48 (2): 273–307.
Zhang, Cun-Hui. 2010. “Nearly Unbiased Variable Selection Under Minimax Concave Penalty.” The Annals of Statistics 38 (2): 894–942.
Zhang, Cun-Hui, and Stephanie S. Zhang. 2014. “Confidence Intervals for Low Dimensional Parameters in High Dimensional Linear Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76 (1): 217–42.
Zhang, Lijun, Tianbao Yang, Rong Jin, and Zhi-Hua Zhou. 2015. “Sparse Learning for Large-Scale and High-Dimensional Data: A Randomized Convex-Concave Optimization Approach.” arXiv:1511.03766 [Cs], November.
Zhao, Peng, Guilherme Rocha, and Bin Yu. 2009. “The Composite Absolute Penalties Family for Grouped and Hierarchical Variable Selection.” The Annals of Statistics 37 (6A): 3468–97.
Zhao, Tuo, Han Liu, and Tong Zhang. 2018. “Pathwise Coordinate Optimization for Sparse Learning: Algorithm and Theory.” The Annals of Statistics 46 (1): 180–218.
Zhou, Tianyi, Dacheng Tao, and Xindong Wu. 2011. “Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction.” Data Mining and Knowledge Discovery 22 (3): 340–71.
Zou, Hui. 2006. “The Adaptive Lasso and Its Oracle Properties.” Journal of the American Statistical Association 101 (476): 1418–29.
Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2): 301–20.
Zou, Hui, Trevor Hastie, and Robert Tibshirani. 2007. “On the ‘Degrees of Freedom’ of the Lasso.” The Annals of Statistics 35 (5): 2173–92.
Zou, Hui, and Runze Li. 2008. “One-Step Sparse Estimates in Nonconcave Penalized Likelihood Models.” The Annals of Statistics 36 (4): 1509–33.
