Sparse regression

June 23, 2016 – October 24, 2019

Penalised regression where the penalties are sparsifying. The prediction losses could be anything: likelihood, least squares, robust Huberised losses, absolute deviation, etc.

I will play fast and loose with terminology here regarding theoretical and empirical losses, and the statistical models we attempt to fit.

In nonparametric statistics we might simultaneously estimate what look like many, many parameters, constrained in some clever fashion that usually boils down to something we can interpret as a smoothing parameter, controlling how many of the original factors we still have to consider.

I will usually discuss minimising prediction error, but one could instead aim to minimise model selection error.

Then we have a simultaneous estimation and model selection procedure, probably a specific sparse model selection procedure, and we possibly have to choose a clever optimisation method to do the whole thing fast. Related to compressed sensing, but here we consider sampling complexity and measurement error.

See also matrix factorisations, optimisation, multiple testing, concentration inequalities, sparse-flavoured ice cream.

πŸ— disambiguate the optimisation technologies at play β€” iteratively reweighted least squares etc.

Now! A set of headings under which I will try to understand some things, mostly the LASSO variants.

1 LASSO

Quadratic prediction loss, absolute-value coefficient penalty. We estimate the regression coefficients \(\beta\) by solving

\[\begin{aligned} \hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\text{argmin}} \: \frac{1}{2} \| y - {\bf X} \beta \|_2^2 + \lambda \| \beta \|_1. \end{aligned}\]

The penalty coefficient \(\lambda\) is left for you to choose, but one of the magical properties of the lasso is that it is easy to test many possible values of \(\lambda\) at low marginal cost.

Popular because, amongst other reasons, it turns out to be fast and convenient in practice, and amenable to various performance accelerations, e.g. aggressive approximate variable selection.
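
To make the cheap-path property concrete, here is a minimal sketch using scikit-learn (my choice of library here; the implementations section below talks about glmnet for R): the whole regularisation path and a cross-validated \(\lambda\) come from a couple of calls.

```python
# Minimal sketch of the cheap-path property, using scikit-learn
# (my choice of library here; the implementations below discuss glmnet for R).
import numpy as np
from sklearn.linear_model import LassoCV, lasso_path

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]          # sparse ground truth
y = X @ beta_true + rng.standard_normal(n)

# One call returns coefficients along a whole grid of penalties,
# which is the "low marginal cost" of trying many lambdas.
alphas, coefs, _ = lasso_path(X, y, n_alphas=100)

# Cross-validated choice of the penalty (sklearn calls lambda "alpha").
fit = LassoCV(cv=5).fit(X, y)
print("chosen alpha:", fit.alpha_)
print("selected coefficients:", np.flatnonzero(fit.coef_))
```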

2 Adaptive LASSO

πŸ— This is the one with famous oracle properties if you choose \(\lambda\) correctly. Hsi Zou’s paper on this (Zou 2006) is readable. I am having trouble digesting Sara van de Geer’s paper (S. A. van de Geer 2008) on the Generalised Lasso, but it seems to offer me guarantees for something very similar to the Adaptive Lasso, but with far more general assumptions on the model and loss functions, and some finite sample guarnatees.

3 LARS

A confusing one; LASSO and LARS are not the same thing but you can use one to calculate the other? Something like that? I need to work this one through with a pencil and paper.
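
For the record (still owing myself that pencil-and-paper pass): as I understand it, plain least-angle regression only ever adds variables to the active set, while a small modification that drops a variable whose coefficient hits zero makes the same machinery trace out the exact lasso path (Efron et al. 2004). scikit-learn exposes both, which at least lets me compare the two paths:

```python
# LARS and the lasso path via the LARS machinery, in scikit-learn.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))
y = X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(100)

# Plain least-angle regression: variables only ever enter the active set.
alphas_lar, _, coefs_lar = lars_path(X, y, method="lar")

# Lasso path from the modified LARS: a coefficient crossing zero gets
# dropped from the active set, which is where the two paths can differ.
alphas_lasso, _, coefs_lasso = lars_path(X, y, method="lasso")
```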

4 Graph LASSO

As used in graphical models. 🏗
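
A hedged sketch of the usual workflow, via scikit-learn’s implementation of the (Friedman, Hastie, and Tibshirani 2008) estimator: fit an \(\ell_1\)-penalised Gaussian likelihood for the precision matrix, whose zeros are read as conditional independences.

```python
# Sparse precision-matrix estimation (graphical lasso), via scikit-learn.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 8))      # stand-in for real multivariate data

model = GraphicalLassoCV().fit(X)
precision = model.precision_           # zeros encode conditional independences
print(np.round(precision, 2))
```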

5 Elastic net

Combination of \(L_1\) and \(L_2\) penalties. 🏗
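
In the notation of the lasso objective above, and modulo the different ways the two penalties get parameterised (Zou and Hastie 2005 use a mixing weight), it is something like

\[\begin{aligned} \hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\text{argmin}} \: \frac{1}{2} \| y - {\bf X} \beta \|_2^2 + \lambda_1 \| \beta \|_1 + \frac{\lambda_2}{2} \| \beta \|_2^2, \end{aligned}\]

which keeps the sparsifying kink of the \(\ell_1\) term while the ridge term stabilises things when predictors are strongly correlated.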

6 Grouped LASSO

AFAICT this is the usual LASSO but with grouped factors. See (Yuan and Lin 2006).
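
Concretely, with the coefficients partitioned into pre-specified groups \(g = 1, \dots, G\) of sizes \(p_g\), the (Yuan and Lin 2006) objective, in its most common form, swaps the \(\ell_1\) norm for a sum of unsquared group norms,

\[\begin{aligned} \hat{\beta} = \underset{\beta}{\text{argmin}} \: \frac{1}{2} \| y - {\bf X} \beta \|_2^2 + \lambda \sum_{g=1}^{G} \sqrt{p_g} \, \| \beta_g \|_2, \end{aligned}\]

so entire groups of coefficients are zeroed or retained together; with all groups of size one it reduces to the ordinary lasso.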

7 Model selection

Can be fiddly with sparse regression, which couples variable selection tightly with parameter estimation. See sparse model selection.

8 Debiased LASSO

There exist a few versions, but the one I have needed is (S. A. van de Geer 2008), section 2.1. See also (S. van de Geer 2014b). (🏗 relation to (S. A. van de Geer 2008)?)
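
As I understand the desparsified/debiased construction (in the form of, e.g., Zhang and Zhang 2014; van de Geer et al. 2014, hedging on normalisations), one corrects the lasso estimate \(\hat{\beta}\) with a one-step update built from an approximate inverse \(\hat{\Theta}\) of the Gram matrix \({\bf X}^\top {\bf X} / n\), typically obtained from nodewise lasso regressions:

\[\begin{aligned} \hat{b} = \hat{\beta} + \frac{1}{n} \hat{\Theta} {\bf X}^\top ( y - {\bf X} \hat{\beta} ), \end{aligned}\]

whose coordinates are then asymptotically Gaussian under sparsity and design conditions, which is what buys the confidence intervals.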

9 Sparse basis expansions

Wavelets etc; mostly handled under sparse dictionary bases.

10 Sparse neural nets

That is, sparse regressions as the layers in a neural network? Sure thing. (Wisdom et al. 2016)

11 Other coefficient penalties

Put a weird penalty on the coefficients! E.g. "Smoothly Clipped Absolute Deviation" (SCAD). 🏗
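
For reference (and to save me looking it up again), the SCAD penalty of (Fan and Li 2001) is, for a shape parameter \(a > 2\) (often \(a = 3.7\)), a quadratic spline that starts out like the \(\ell_1\) penalty near zero and then flattens, so large coefficients are barely shrunk:

\[\begin{aligned} p_\lambda(|\beta|) = \begin{cases} \lambda |\beta|, & |\beta| \le \lambda, \\ \dfrac{2 a \lambda |\beta| - \beta^2 - \lambda^2}{2(a - 1)}, & \lambda < |\beta| \le a \lambda, \\ \dfrac{(a + 1) \lambda^2}{2}, & |\beta| > a \lambda. \end{cases} \end{aligned}\]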

12 Other prediction losses

Put a weird penalty on the error! MAD prediction penalty, lasso-coefficient penalty, etc.

See (H. Wang, Li, and Jiang 2007; Portnoy and Koenker 1997) for some implementations using, e.g., least-absolute-deviation prediction error.
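
A cheap way to play with this flavour (absolute-error loss plus \(\ell_1\) coefficient penalty, à la the LAD-lasso of H. Wang, Li, and Jiang 2007) without writing a solver: as far as I can tell, scikit-learn’s QuantileRegressor at the median quantile optimises essentially that objective. A sketch, assuming a recent scikit-learn:

```python
# LAD-lasso-style fit: median (pinball) loss plus an l1 coefficient penalty.
# QuantileRegressor needs scikit-learn >= 1.0; alpha is the l1 strength.
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(3)
n, p = 300, 20
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - X[:, 1] + rng.standard_t(df=2, size=n)   # heavy-tailed noise

lad_lasso = QuantileRegressor(quantile=0.5, alpha=0.1, solver="highs").fit(X, y)
print("selected coefficients:", np.flatnonzero(np.abs(lad_lasso.coef_) > 1e-8))
```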

13 Bayesian Lasso

See Bayesian sparsity.

14 Implementations

Hastie, Friedman et al.’s glmnet for R is fast and well-regarded, and has a MATLAB version. Here’s how to use it for the adaptive lasso. Kenneth Tay has implemented the elastic net penalty for any GLM in glmnet.

SPAMS (C++, MATLAB, R, Python), by Mairal, looks interesting. It’s an optimisation library for many, many sparse problems.

liblinear also includes lasso-type (\(\ell_1\)-regularised) solvers, as well as support-vector regression.

15 Tidbits

Sparse regression as a universal classifier explainer? Local Interpretable Model-agnostic Explanations (Ribeiro, Singh, and Guestrin 2016) uses LASSO for exactly this kind of model interpretation. (See the blog post, or the source.)
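
A hand-rolled miniature of the idea (not the lime package’s actual API; the function and parameter names here are mine): perturb around the instance of interest, weight the perturbed samples by proximity, and read the explanation off a weighted sparse linear surrogate.

```python
# Miniature LIME-flavoured explanation: a locally weighted sparse surrogate.
# Needs scikit-learn >= 0.23 for Lasso's sample_weight support.
import numpy as np
from sklearn.linear_model import Lasso

def explain_locally(predict_fn, x, scale=0.1, n_samples=2000, alpha=0.01, seed=0):
    """predict_fn maps an (m, d) array to m scalar scores (e.g. one class's probability)."""
    rng = np.random.default_rng(seed)
    Z = x + scale * rng.standard_normal((n_samples, x.size))            # perturb near x
    weights = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2.0 * scale**2))  # proximity kernel
    surrogate = Lasso(alpha=alpha).fit(Z, predict_fn(Z), sample_weight=weights)
    return surrogate.coef_                                              # sparse local attributions
```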

16 References

Abramovich, Benjamini, Donoho, et al. 2006. "Adapting to Unknown Sparsity by Controlling the False Discovery Rate." The Annals of Statistics.
Aghasi, Nguyen, and Romberg. 2016. "Net-Trim: A Layer-Wise Convex Pruning of Deep Neural Networks." arXiv:1611.05162 [Cs, Stat].
Aragam, Amini, and Zhou. 2015. "Learning Directed Acyclic Graphs with Penalized Neighbourhood Regression." arXiv:1511.08963 [Cs, Math, Stat].
Azadkia, and Chatterjee. 2019. "A Simple Measure of Conditional Dependence." arXiv:1910.12327 [Cs, Math, Stat].
Azizyan, Krishnamurthy, and Singh. 2015. "Extreme Compressive Sampling for Covariance Estimation." arXiv:1506.00898 [Cs, Math, Stat].
Bach. 2009. "Model-Consistent Sparse Estimation Through the Bootstrap." arXiv:0901.3202 [Cs, Stat].
Bach, Jenatton, and Mairal. 2011. Optimization With Sparsity-Inducing Penalties. Foundations and Trends(r) in Machine Learning 1.0.
Bahmani, and Romberg. 2014. "Lifting for Blind Deconvolution in Random Mask Imaging: Identifiability and Convex Relaxation." arXiv:1501.00046 [Cs, Math, Stat].
Banerjee, Arindam, Chen, Fazayeli, et al. 2014. "Estimation with Norm Regularization." In Advances in Neural Information Processing Systems 27.
Banerjee, Onureena, Ghaoui, and d'Aspremont. 2008. "Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data." Journal of Machine Learning Research.
Barber, and Candès. 2015. "Controlling the False Discovery Rate via Knockoffs." The Annals of Statistics.
Barbier. 2015. "Statistical Physics and Approximate Message-Passing Algorithms for Sparse Linear Estimation Problems in Signal Processing and Coding Theory." arXiv:1511.01650 [Cs, Math].
Baron, Sarvotham, and Baraniuk. 2010. "Bayesian Compressive Sensing via Belief Propagation." IEEE Transactions on Signal Processing.
Barron, Cohen, Dahmen, et al. 2008. "Approximation and Learning by Greedy Algorithms." The Annals of Statistics.
Barron, Huang, Li, et al. 2008. "MDL, Penalized Likelihood, and Statistical Risk." In Information Theory Workshop, 2008. ITW'08. IEEE.
Battiti. 1992. "First-and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method." Neural Computation.
Bayati, and Montanari. 2012. "The LASSO Risk for Gaussian Matrices." IEEE Transactions on Information Theory.
Bellec, and Tsybakov. 2016. "Bounds on the Prediction Error of Penalized Least Squares Estimators with Convex Penalty." arXiv:1609.06675 [Math, Stat].
Belloni, Chernozhukov, and Wang. 2011. "Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming." Biometrika.
Berk, Brown, Buja, et al. 2013. "Valid Post-Selection Inference." The Annals of Statistics.
Bertin, Pennec, and Rivoirard. 2011. "Adaptive Dantzig Density Estimation." Annales de l'Institut Henri Poincaré, Probabilités Et Statistiques.
Bian, Chen, and Ye. 2014. "Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization." Mathematical Programming.
Bien, Gaynanova, Lederer, et al. 2018. "Non-Convex Global Minimization and False Discovery Rate Control for the TREX." Journal of Computational and Graphical Statistics.
Bloniarz, Liu, Zhang, et al. 2015. "Lasso Adjustments of Treatment Effect Estimates in Randomized Experiments." arXiv:1507.03652 [Math, Stat].
Bondell, Krishna, and Ghosh. 2010. "Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models." Biometrics.
Borgs, Chayes, Cohn, et al. 2014. "An \(L^p\) Theory of Sparse Graph Convergence I: Limits, Sparse Random Graph Models, and Power Law Distributions." arXiv:1401.2906 [Math].
Bottou, Curtis, and Nocedal. 2016. "Optimization Methods for Large-Scale Machine Learning." arXiv:1606.04838 [Cs, Math, Stat].
Breiman. 1995. "Better Subset Regression Using the Nonnegative Garrote." Technometrics.
Bruckstein, Elad, and Zibulevsky. 2008. "On the Uniqueness of Nonnegative Sparse Solutions to Underdetermined Systems of Equations." IEEE Transactions on Information Theory.
Brunton, Proctor, and Kutz. 2016. "Discovering Governing Equations from Data by Sparse Identification of Nonlinear Dynamical Systems." Proceedings of the National Academy of Sciences.
Bühlmann, and van de Geer. 2011. "Additive Models and Many Smooth Univariate Functions." In Statistics for High-Dimensional Data. Springer Series in Statistics.
———. 2015. "High-Dimensional Inference in Misspecified Linear Models." arXiv:1503.06426 [Stat].
Bu, and Lederer. 2017. "Integrating Additional Knowledge Into Estimation of Graphical Models." arXiv:1704.02739 [Stat].
Bunea, Tsybakov, and Wegkamp. 2007a. "Sparsity Oracle Inequalities for the Lasso." Electronic Journal of Statistics.
Bunea, Tsybakov, and Wegkamp. 2007b. "Sparse Density Estimation with ℓ1 Penalties." In Learning Theory. Lecture Notes in Computer Science.
Candès, and Davenport. 2011. "How Well Can We Estimate a Sparse Vector?" arXiv:1104.5246 [Cs, Math, Stat].
Candès, Fan, Janson, et al. 2016. "Panning for Gold: Model-Free Knockoffs for High-Dimensional Controlled Variable Selection." arXiv Preprint arXiv:1610.02351.
Candès, and Fernandez-Granda. 2013. "Super-Resolution from Noisy Data." Journal of Fourier Analysis and Applications.
Candès, and Plan. 2010. "Matrix Completion With Noise." Proceedings of the IEEE.
Candès, Romberg, and Tao. 2006. "Stable Signal Recovery from Incomplete and Inaccurate Measurements." Communications on Pure and Applied Mathematics.
Candès, Wakin, and Boyd. 2008. "Enhancing Sparsity by Reweighted ℓ1 Minimization." Journal of Fourier Analysis and Applications.
Carmi. 2013. "Compressive System Identification: Sequential Methods and Entropy Bounds." Digital Signal Processing.
———. 2014. "Compressive System Identification." In Compressed Sensing & Sparse Filtering. Signals and Communication Technology.
Cevher, Duarte, Hegde, et al. 2009. "Sparse Signal Recovery Using Markov Random Fields." In Advances in Neural Information Processing Systems.
Chartrand, and Yin. 2008. "Iteratively Reweighted Algorithms for Compressive Sensing." In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008.
Chatterjee. 2020. "A New Coefficient of Correlation." arXiv:1909.10140 [Math, Stat].
Chen, Xiaojun. 2012. "Smoothing Methods for Nonsmooth, Nonconvex Minimization." Mathematical Programming.
Chen, Y., and Hero. 2012. "Recursive ℓ1,∞ Group Lasso." IEEE Transactions on Signal Processing.
Chen, Minhua, Silva, Paisley, et al. 2010. "Compressive Sensing on Manifolds Using a Nonparametric Mixture of Factor Analyzers: Algorithm and Performance Bounds." IEEE Transactions on Signal Processing.
Chen, Yen-Chi, and Wang. n.d. "Discussion on 'Confidence Intervals and Hypothesis Testing for High-Dimensional Regression'."
Chernozhukov, Chetverikov, Demirer, et al. 2016. "Double/Debiased Machine Learning for Treatment and Causal Parameters." arXiv:1608.00060 [Econ, Stat].
Chernozhukov, Hansen, Liao, et al. 2018. "Inference For Heterogeneous Effects Using Low-Rank Estimations." arXiv:1812.08089 [Math, Stat].
Chernozhukov, Newey, and Singh. 2018. "Learning L2 Continuous Regression Functionals via Regularized Riesz Representers." arXiv:1809.05224 [Econ, Math, Stat].
Chetverikov, Liao, and Chernozhukov. 2016. "On Cross-Validated Lasso." arXiv:1605.02214 [Math, Stat].
Chichignoud, Lederer, and Wainwright. 2014. "A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees." arXiv:1410.0247 [Math, Stat].
Dai, and Barber. 2016. "The Knockoff Filter for FDR Control in Group-Sparse and Multitask Regression." arXiv Preprint arXiv:1602.03589.
Daneshmand, Gomez-Rodriguez, Song, et al. 2014. "Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-Thresholding Algorithm." In ICML.
Descloux, and Sardy. 2018. "Model Selection with Lasso-Zero: Adding Straw to the Haystack to Better Find Needles." arXiv:1805.05133 [Stat].
Diaconis, and Freedman. 1984. "Asymptotics of Graphical Projection Pursuit." The Annals of Statistics.
Dossal, Kachour, Fadili, et al. 2011. "The Degrees of Freedom of the Lasso for General Design Matrix." arXiv:1111.1162 [Cs, Math, Stat].
Efron, Hastie, Johnstone, et al. 2004. "Least Angle Regression." The Annals of Statistics.
El Karoui. 2008. "Operator Norm Consistent Estimation of Large Dimensional Sparse Covariance Matrices." University of California, Berkeley.
Elhamifar, and Vidal. 2013. "Sparse Subspace Clustering: Algorithm, Theory, and Applications." IEEE Transactions on Pattern Analysis and Machine Intelligence.
Engebretsen, and Bohlin. 2019. "Statistical Predictions with Glmnet." Clinical Epigenetics.
Ewald, and Schneider. 2015. "Confidence Sets Based on the Lasso Estimator." arXiv:1507.05315 [Math, Stat].
Fan, Rong-En, Chang, Hsieh, et al. 2008. "LIBLINEAR: A Library for Large Linear Classification." Journal of Machine Learning Research.
Fan, Jianqing, and Li. 2001. "Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties." Journal of the American Statistical Association.
Fan, Jianqing, and Lv. 2010. "A Selective Overview of Variable Selection in High Dimensional Feature Space." Statistica Sinica.
Flynn, Hurvich, and Simonoff. 2013. "Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models." arXiv:1302.2068 [Stat].
Foygel, and Srebro. 2011. "Fast-Rate and Optimistic-Rate Error Bounds for L1-Regularized Regression." arXiv:1108.0373 [Math, Stat].
Friedman, Hastie, Höfling, et al. 2007. "Pathwise Coordinate Optimization." The Annals of Applied Statistics.
Friedman, Hastie, and Tibshirani. 2008. "Sparse Inverse Covariance Estimation with the Graphical Lasso." Biostatistics.
Fu, and Zhou. 2013. "Learning Sparse Causal Gaussian Networks With Experimental Intervention: Regularization and Coordinate Descent." Journal of the American Statistical Association.
Gasso, Rakotomamonjy, and Canu. 2009. "Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming." IEEE Transactions on Signal Processing.
Ghadimi, and Lan. 2013a. "Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming." SIAM Journal on Optimization.
———. 2013b. "Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming." arXiv:1310.3787 [Math].
Girolami. 2001. "A Variational Method for Learning Sparse and Overcomplete Representations." Neural Computation.
Giryes, Sapiro, and Bronstein. 2014. "On the Stability of Deep Networks." arXiv:1412.5896 [Cs, Math, Stat].
Greenhill, Isaev, Kwan, et al. 2016. "The Average Number of Spanning Trees in Sparse Graphs with Given Degrees." arXiv:1606.01586 [Math].
Gu, Fu, and Zhou. 2014. "Adaptive Penalized Estimation of Directed Acyclic Graphs From Categorical Data." arXiv:1403.2310 [Stat].
Gui, and Li. 2005. "Penalized Cox Regression Analysis in the High-Dimensional and Low-Sample Size Settings, with Applications to Microarray Gene Expression Data." Bioinformatics.
Gupta, and Pensky. 2016. "Solution of Linear Ill-Posed Problems Using Random Dictionaries." arXiv:1605.07913 [Math, Stat].
Hallac, Leskovec, and Boyd. 2015. "Network Lasso: Clustering and Optimization in Large Graphs." arXiv:1507.00280 [Cs, Math, Stat].
Hall, Jin, and Miller. 2014. "Feature Selection When There Are Many Influential Features." Bernoulli.
Hall, and Xue. 2014. "On Selecting Interacting Features from High-Dimensional Data." Computational Statistics & Data Analysis.
Hansen, Reynaud-Bouret, and Rivoirard. 2015. "Lasso and Probabilistic Inequalities for Multivariate Point Processes." Bernoulli.
Hastie, Tibshirani, Rob, and Wainwright. 2015. Statistical Learning with Sparsity: The Lasso and Generalizations.
Hawe, Kleinsteuber, and Diepold. 2013. "Analysis Operator Learning and Its Application to Image Reconstruction." IEEE Transactions on Image Processing.
Hebiri, and van de Geer. 2011. "The Smooth-Lasso and Other ℓ1+ℓ2-Penalized Methods." Electronic Journal of Statistics.
Hegde, and Baraniuk. 2012. "Signal Recovery on Incoherent Manifolds." IEEE Transactions on Information Theory.
Hegde, Indyk, and Schmidt. 2015. "A Nearly-Linear Time Framework for Graph-Structured Sparsity." In Proceedings of the 32nd International Conference on Machine Learning (ICML-15).
He, Rish, and Parida. 2014. "Transductive HSIC Lasso." In Proceedings of the 2014 SIAM International Conference on Data Mining. Proceedings.
Hesterberg, Choi, Meier, et al. 2008. "Least Angle and ℓ1 Penalized Regression: A Review." Statistics Surveys.
Hirose, Tateishi, and Konishi. 2011. "Efficient Algorithm to Select Tuning Parameters in Sparse Regression Modeling with Regularization." arXiv:1109.2411 [Stat].
Hormati, Roy, Lu, et al. 2010. "Distributed Sampling of Signals Linked by Sparse Filtering: Theory and Applications." IEEE Transactions on Signal Processing.
Hsieh, Sustik, Dhillon, et al. 2014. "QUIC: Quadratic Approximation for Sparse Inverse Covariance Estimation." Journal of Machine Learning Research.
Huang, Cheang, and Barron. 2008. "Risk of Penalized Least Squares, Greedy Selection and L1 Penalization for Flexible Function Libraries."
Hu, Pehlevan, and Chklovskii. 2014. "A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization." In 2014 48th Asilomar Conference on Signals, Systems and Computers.
Ishwaran, and Rao. 2005. "Spike and Slab Variable Selection: Frequentist and Bayesian Strategies." The Annals of Statistics.
Janková, and van de Geer. 2016. "Confidence Regions for High-Dimensional Generalized Linear Models Under Sparsity." arXiv:1610.01353 [Math, Stat].
Janson, Fithian, and Hastie. 2015. "Effective Degrees of Freedom: A Flawed Metaphor." Biometrika.
Javanmard, and Montanari. 2014. "Confidence Intervals and Hypothesis Testing for High-Dimensional Regression." Journal of Machine Learning Research.
Jung. 2013. "An RKHS Approach to Estimation with Sparsity Constraints." In Advances in Neural Information Processing Systems 29.
Kabán. 2014. "New Bounds on Compressive Linear Least Squares Regression." In Journal of Machine Learning Research.
Kato. 2009. "On the Degrees of Freedom in Shrinkage Estimation." Journal of Multivariate Analysis.
Kim, Kwon, and Choi. 2012. "Consistent Model Selection Criteria on High Dimensions." Journal of Machine Learning Research.
Koltchinskii. 2011. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Mathematics École d'Été de Probabilités de Saint-Flour 2033.
Koppel, Warnell, Stump, et al. 2016. "Parsimonious Online Learning with Kernels via Sparse Projections in Function Space." arXiv:1612.04111 [Cs, Stat].
Kowalski, and Torrésani. 2009. "Structured Sparsity: From Mixed Norms to Structured Shrinkage." In SPARS'09-Signal Processing with Adaptive Sparse Structured Representations.
Krämer, Schäfer, and Boulesteix. 2009. "Regularized Estimation of Large-Scale Gene Association Networks Using Graphical Gaussian Models." BMC Bioinformatics.
Lambert-Lacroix, and Zwald. 2011. "Robust Regression Through the Huber's Criterion and Adaptive Lasso Penalty." Electronic Journal of Statistics.
Lam, and Fan. 2009. "Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation." Annals of Statistics.
Langford, Li, and Zhang. 2009. "Sparse Online Learning via Truncated Gradient." In Advances in Neural Information Processing Systems 21.
Lederer, and Vogt. 2020. "Estimating the Lasso's Effective Noise." arXiv:2004.11554 [Stat].
Lee, Sun, Sun, et al. 2013. "Exact Post-Selection Inference, with Application to the Lasso." arXiv:1311.6238 [Math, Stat].
Lemhadri, Ruan, Abraham, et al. 2021. "LassoNet: A Neural Network with Feature Sparsity." Journal of Machine Learning Research.
Li, and Lederer. 2019. "Tuning Parameter Calibration for ℓ1-Regularized Logistic Regression." Journal of Statistical Planning and Inference.
Lim, and Lederer. 2016. "Efficient Feature Selection With Large and High-Dimensional Data." arXiv:1609.07195 [Stat].
Lockhart, Taylor, Tibshirani, et al. 2014. "A Significance Test for the Lasso." The Annals of Statistics.
Lu, Goldberg, and Fine. 2012. "On the Robustness of the Adaptive Lasso to Model Misspecification." Biometrika.
Lundberg, and Lee. 2017. "A Unified Approach to Interpreting Model Predictions." In Advances in Neural Information Processing Systems.
Mahoney. 2016. "Lecture Notes on Spectral Graph Methods." arXiv Preprint arXiv:1608.04845.
Mairal. 2015. "Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning." SIAM Journal on Optimization.
Mazumder, Friedman, and Hastie. 2009. "SparseNet: Coordinate Descent with Non-Convex Penalties."
Meier, van de Geer, and Bühlmann. 2008. "The Group Lasso for Logistic Regression." Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Meinshausen, and Bühlmann. 2006. "High-Dimensional Graphs and Variable Selection with the Lasso." The Annals of Statistics.
Meinshausen, and Yu. 2009. "Lasso-Type Recovery of Sparse Representations for High-Dimensional Data." The Annals of Statistics.
Molchanov, Ashukha, and Vetrov. 2017. "Variational Dropout Sparsifies Deep Neural Networks." In Proceedings of ICML.
Montanari. 2012. "Graphical Models Concepts in Compressed Sensing." Compressed Sensing: Theory and Applications.
Mousavi, and Baraniuk. 2017. "Learning to Invert: Signal Recovery via Deep Convolutional Networks." In ICASSP.
Müller, and van de Geer. 2015. "Censored Linear Model in High Dimensions: Penalised Linear Regression on High-Dimensional Data with Left-Censored Response Variable." TEST.
Naik, and Tsai. 2001. "Single-Index Model Selections." Biometrika.
Nam, and Gribonval. 2012. "Physics-Driven Structured Cosparse Modeling for Source Localization." In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Needell, and Tropp. 2008. "CoSaMP: Iterative Signal Recovery from Incomplete and Inaccurate Samples." arXiv:0803.2392 [Cs, Math].
Nesterov. 2012. "Gradient Methods for Minimizing Composite Functions." Mathematical Programming.
Neville, Ormerod, and Wand. 2014. "Mean Field Variational Bayes for Continuous Sparse Signal Shrinkage: Pitfalls and Remedies." Electronic Journal of Statistics.
Ngiam, Chen, Bhaskar, et al. 2011. "Sparse Filtering." In Advances in Neural Information Processing Systems 24.
Nickl, and van de Geer. 2013. "Confidence Sets in Sparse Regression." The Annals of Statistics.
Oymak, Jalali, Fazel, et al. 2013. "Noisy Estimation of Simultaneously Structured Models: Limitations of Convex Relaxation." In 2013 IEEE 52nd Annual Conference on Decision and Control (CDC).
Peleg, Eldar, and Elad. 2010. "Exploiting Statistical Dependencies in Sparse Representations for Signal Recovery." IEEE Transactions on Signal Processing.
Portnoy, and Koenker. 1997. "The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators." Statistical Science.
Pouget-Abadie, and Horel. 2015. "Inferring Graphs from Cascades: A Sparse Recovery Framework." In Proceedings of The 32nd International Conference on Machine Learning.
Pourahmadi. 2011. "Covariance Estimation: The GLM and Regularization Perspectives." Statistical Science.
Qian, and Yang. 2012. "Model Selection via Standard Error Adjusted Adaptive Lasso." Annals of the Institute of Statistical Mathematics.
Qin, Scheinberg, and Goldfarb. 2013. "Efficient Block-Coordinate Descent Algorithms for the Group Lasso." Mathematical Programming Computation.
Rahimi, and Recht. 2009. "Weighted Sums of Random Kitchen Sinks: Replacing Minimization with Randomization in Learning." In Advances in Neural Information Processing Systems.
Ravikumar, Wainwright, Raskutti, et al. 2011. "High-Dimensional Covariance Estimation by Minimizing ℓ1-Penalized Log-Determinant Divergence." Electronic Journal of Statistics.
Ravishankar, Saiprasad, and Bresler. 2015. "Efficient Blind Compressed Sensing Using Sparsifying Transforms with Convergence Guarantees and Application to MRI." arXiv:1501.02923 [Cs, Stat].
Ravishankar, S., and Bresler. 2015. "Sparsifying Transform Learning With Efficient Optimal Updates and Convergence Guarantees." IEEE Transactions on Signal Processing.
Reynaud-Bouret. 2003. "Adaptive Estimation of the Intensity of Inhomogeneous Poisson Processes via Concentration Inequalities." Probability Theory and Related Fields.
Reynaud-Bouret, and Schbath. 2010. "Adaptive Estimation for Hawkes Processes; Application to Genome Analysis." The Annals of Statistics.
Ribeiro, Singh, and Guestrin. 2016. "'Why Should I Trust You?': Explaining the Predictions of Any Classifier." In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '16.
Rish, and Grabarnik. 2014. "Sparse Signal Recovery with Exponential-Family Noise." In Compressed Sensing & Sparse Filtering. Signals and Communication Technology.
Rish, and Grabarnik. 2015. Sparse Modeling: Theory, Algorithms, and Applications. Chapman & Hall/CRC Machine Learning & Pattern Recognition Series.
Ročková, and George. 2018. "The Spike-and-Slab LASSO." Journal of the American Statistical Association.
Sashank J. Reddi, Suvrit Sra, Barnabás Póczós, et al. 2016. "Stochastic Frank-Wolfe Methods for Nonconvex Optimization."
Schelldorfer, Bühlmann, and van de Geer. 2011. "Estimation for High-Dimensional Linear Mixed-Effects Models Using ℓ1-Penalization." Scandinavian Journal of Statistics.
Semenova, Rudin, and Parr. 2021. "A Study in Rashomon Curves and Volumes: A New Perspective on Generalization and Model Simplicity in Machine Learning." arXiv:1908.01755 [Cs, Stat].
Shen, and Huang. 2006. "Optimal Model Assessment, Selection, and Combination." Journal of the American Statistical Association.
Shen, Huang, and Ye. 2004. "Adaptive Model Selection and Assessment for Exponential Family Distributions." Technometrics.
Shen, and Ye. 2002. "Adaptive Model Selection." Journal of the American Statistical Association.
She, and Owen. 2010. "Outlier Detection Using Nonconvex Penalized Regression."
Simon, Friedman, Hastie, et al. 2011. "Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent." Journal of Statistical Software.
Smith, Forte, Jordan, et al. 2015. "L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework." arXiv:1512.04011 [Cs].
Soh, and Chandrasekaran. 2017. "A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers." arXiv:1701.01207 [Cs, Math, Stat].
Soltani, and Hegde. 2016. "Demixing Sparse Signals from Nonlinear Observations." Statistics.
Starck, Elad, and Donoho. 2005. "Image Decomposition via the Combination of Sparse Representations and a Variational Approach." IEEE Transactions on Image Processing.
Stine. 2004. "Discussion of 'Least Angle Regression' by Efron Et Al." The Annals of Statistics.
Su, Bogdan, and Candès. 2015. "False Discoveries Occur Early on the Lasso Path." arXiv:1511.01957 [Cs, Math, Stat].
Taddy. 2013. "One-Step Estimator Paths for Concave Regularization." arXiv:1308.5623 [Stat].
Tarr, Müller, and Welsh. 2018. "Mplot: An R Package for Graphical Model Stability and Variable Selection Procedures." Journal of Statistical Software.
Thisted. 1997. "[The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators]: Comment." Statistical Science.
Thrampoulidis, Abbasi, and Hassibi. 2015. "LASSO with Non-Linear Measurements Is Equivalent to One With Linear Measurements." In Advances in Neural Information Processing Systems 28.
Tibshirani, Robert. 1996. "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society. Series B (Methodological).
———. 2011. "Regression Shrinkage and Selection via the Lasso: A Retrospective." Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Tibshirani, Ryan J. 2014. "A General Framework for Fast Stagewise Algorithms." arXiv:1408.5801 [Stat].
Trofimov, and Genkin. 2015. "Distributed Coordinate Descent for L1-Regularized Logistic Regression." In Analysis of Images, Social Networks and Texts. Communications in Computer and Information Science 542.
———. 2016. "Distributed Coordinate Descent for Generalized Linear Models with Regularization." arXiv:1611.02101 [Cs, Stat].
Tropp, and Wright. 2010. "Computational Methods for Sparse Solution of Linear Inverse Problems." Proceedings of the IEEE.
Tschannen, and Bölcskei. 2016. "Noisy Subspace Clustering via Matching Pursuits." arXiv:1612.03450 [Cs, Math, Stat].
Uematsu. 2015. "Penalized Likelihood Estimation in High-Dimensional Time Series Models and Its Application." arXiv:1504.06706 [Math, Stat].
Unser, Michael A., and Tafti. 2014. An Introduction to Sparse Stochastic Processes.
Unser, M., Tafti, Amini, et al. 2014. "A Unified Formulation of Gaussian Vs Sparse Stochastic Processes - Part II: Discrete-Domain Theory." IEEE Transactions on Information Theory.
Unser, M., Tafti, and Sun. 2014. "A Unified Formulation of Gaussian Vs Sparse Stochastic Processes - Part I: Continuous-Domain Theory." IEEE Transactions on Information Theory.
Geer, Sara van de. 2007. "The Deterministic Lasso."
Geer, Sara A. van de. 2008. "High-Dimensional Generalized Linear Models and the Lasso." The Annals of Statistics.
Geer, Sara van de. 2014a. "Weakly Decomposable Regularization Penalties and Structured Sparsity." Scandinavian Journal of Statistics.
———. 2014b. "Worst Possible Sub-Directions in High-Dimensional Models." In arXiv:1403.7023 [Math, Stat].
———. 2014c. "Statistical Theory for High-Dimensional Models." arXiv:1409.8557 [Math, Stat].
———. 2016. Estimation and Testing Under Sparsity. Lecture Notes in Mathematics.
Geer, Sara van de, Bühlmann, Ritov, et al. 2014. "On Asymptotically Optimal Confidence Regions and Tests for High-Dimensional Models." The Annals of Statistics.
Geer, Sara A. van de, Bühlmann, and Zhou. 2011. "The Adaptive and the Thresholded Lasso for Potentially Misspecified Models (and a Lower Bound for the Lasso)." Electronic Journal of Statistics.
Veitch, and Roy. 2015. "The Class of Random Graphs Arising from Exchangeable Random Measures." arXiv:1512.03099 [Cs, Math, Stat].
Wahba. 1990. Spline Models for Observational Data.
Wang, Zhangyang, Chang, Ling, et al. 2016. "Stacked Approximated Regression Machine: A Simple Deep Learning Approach." In.
Wang, L., Gordon, and Zhu. 2006. "Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning." In Sixth International Conference on Data Mining (ICDM'06).
Wang, Hansheng, Li, and Jiang. 2007. "Robust Regression Shrinkage and Consistent Variable Selection Through the LAD-Lasso." Journal of Business & Economic Statistics.
Wisdom, Powers, Pitton, et al. 2016. "Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery." In Advances in Neural Information Processing Systems 29.
Woodworth, and Chartrand. 2015. "Compressed Sensing Recovery via Nonconvex Shrinkage Penalties." arXiv:1504.02923 [Cs, Math].
Wright, Nowak, and Figueiredo. 2009. "Sparse Reconstruction by Separable Approximation." IEEE Transactions on Signal Processing.
Wu, and Lange. 2008. "Coordinate Descent Algorithms for Lasso Penalized Regression." The Annals of Applied Statistics.
Xu, Caramanis, and Mannor. 2010. "Robust Regression and Lasso." IEEE Transactions on Information Theory.
———. 2012. "Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem." IEEE Transactions on Pattern Analysis and Machine Intelligence.
Yaghoobi, Nam, Gribonval, et al. 2012. "Noise Aware Analysis Operator Learning for Approximately Cosparse Signals." In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Yang, and Xu. 2013. "A Unified Robust Regression Model for Lasso-Like Algorithms." In ICML (3).
Yoshida, and West. 2010. "Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing." Journal of Machine Learning Research.
Yuan, and Lin. 2006. "Model Selection and Estimation in Regression with Grouped Variables." Journal of the Royal Statistical Society: Series B (Statistical Methodology).
———. 2007. "Model Selection and Estimation in the Gaussian Graphical Model." Biometrika.
Yun, and Toh. 2009. "A Coordinate Gradient Descent Method for ℓ1-Regularized Convex Minimization." Computational Optimization and Applications.
Zhang, Cun-Hui. 2010. "Nearly Unbiased Variable Selection Under Minimax Concave Penalty." The Annals of Statistics.
Zhang, Yiyun, Li, and Tsai. 2010. "Regularization Parameter Selections via Generalized Information Criterion." Journal of the American Statistical Association.
Zhang, Lijun, Yang, Jin, et al. 2015. "Sparse Learning for Large-Scale and High-Dimensional Data: A Randomized Convex-Concave Optimization Approach." arXiv:1511.03766 [Cs].
Zhang, Cun-Hui, and Zhang. 2014. "Confidence Intervals for Low Dimensional Parameters in High Dimensional Linear Models." Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Zhao, Tuo, Liu, and Zhang. 2018. "Pathwise Coordinate Optimization for Sparse Learning: Algorithm and Theory." The Annals of Statistics.
Zhao, Peng, Rocha, and Yu. 2006. "Grouped and Hierarchical Model Selection Through Composite Absolute Penalties."
———. 2009. "The Composite Absolute Penalties Family for Grouped and Hierarchical Variable Selection." The Annals of Statistics.
Zhao, Peng, and Yu. 2006. "On Model Selection Consistency of Lasso." Journal of Machine Learning Research.
Zhou, Tao, and Wu. 2011. "Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction." Data Mining and Knowledge Discovery.
Zou. 2006. "The Adaptive Lasso and Its Oracle Properties." Journal of the American Statistical Association.
Zou, and Hastie. 2005. "Regularization and Variable Selection via the Elastic Net." Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Zou, Hastie, and Tibshirani. 2007. "On the 'Degrees of Freedom' of the Lasso." The Annals of Statistics.
Zou, and Li. 2008. "One-Step Sparse Estimates in Nonconcave Penalized Likelihood Models." The Annals of Statistics.