# Sparse regression

Penalised regression where the penalties are sparsifying. The prediction losses could be anything — likelihood, least-squares, robust Huberised losses, absolute deviation etc.

I will play fast and loose with terminology here regarding theoretical and empirical losses, and the statistical models we attempt to fit.

In nonparametric statistics we might estimate what look like many, many parameters simultaneously, constraining them in some clever fashion, which usually boils down to something we can interpret as a smoothing parameter, controlling how many of the original factors we still have to consider.

I will usually discuss our intent to minimise prediction error, but one could also try to minimise model selection error.

Then we have a simultaneous estimation and model selection procedure, probably a specific sparse model selection procedure, and we possibly have to choose a clever optimisation method to do the whole thing fast. Related to compressed sensing, but here we consider sampling complexity and measurement error.

See also matrix factorisations, optimisation, multiple testing, concentration inequalities, sparse flavoured icecream.

🏗 disambiguate the optimisation technologies at play — iteratively reweighted least squares etc.

Now! A set of headings under which I will try to understand some things, mostly the LASSO variants.

## LASSO

Quadratic prediction loss, absolute-value coefficient penalty. We estimate the regression coefficients $$\beta$$ by solving

\begin{aligned} \hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\text{argmin}} \: \frac{1}{2} \| y - \mathbf{X} \beta \|_2^2 + \lambda \| \beta \|_1. \end{aligned}

The penalty coefficient $$\lambda$$ is left for you to choose, but one of the magical properties of the lasso is that it is easy to test many possible values of $$\lambda$$ at low marginal cost: the entire regularisation path can be computed for roughly the cost of a single least-squares fit.

Popular because, amongst other reasons, it turns out to be fast and convenient in practice, and amenable to various performance accelerations, e.g. aggressive approximate variable selection.
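To make the cheap-path claim concrete, here is a minimal sketch using scikit-learn's `lasso_path` on synthetic data (everything below is illustrative, not from any particular reference):

```python
import numpy as np
from sklearn.linear_model import lasso_path

# Synthetic sparse problem: only 5 of 50 true coefficients are nonzero.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, -1.0, 0.5]
y = X @ beta + 0.5 * rng.standard_normal(n)

# One call computes solutions for a whole grid of penalty values
# (scikit-learn calls lambda "alpha").
alphas, coefs, _ = lasso_path(X, y, n_alphas=100)

# The path moves from an empty model (large penalty) toward the
# full least-squares model (small penalty).
for a, c in zip(alphas[::20], coefs.T[::20]):
    print(f"alpha={a:.4f}: {np.sum(c != 0)} nonzero coefficients")
```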

🏗 This is the one with famous oracle properties if you choose $$\lambda$$ correctly. Hui Zou’s paper on this (Zou 2006) is readable. I am having trouble digesting Sara van de Geer’s paper on the Generalised Lasso, but it seems to offer guarantees for something very similar to the Adaptive Lasso, with far more general assumptions on the model and loss functions, and some finite-sample guarantees.
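The adaptive lasso itself is easy to emulate with any plain lasso solver by rescaling columns. A sketch of that trick (my construction, using scikit-learn, with an OLS pilot estimate and $$\gamma = 1$$):

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def adaptive_lasso(X, y, gamma=1.0, eps=1e-8):
    """Two-stage adaptive lasso via column rescaling."""
    # Stage 1: pilot estimate. OLS is fine for n >> p; use ridge otherwise.
    pilot = LinearRegression().fit(X, y).coef_
    weights = 1.0 / (np.abs(pilot) ** gamma + eps)
    # Stage 2: a plain lasso on rescaled columns is equivalent to a lasso
    # whose penalty on coefficient j is lambda * weights[j].
    fit = LassoCV(cv=5).fit(X / weights, y)
    return fit.coef_ / weights
```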

## LARS

A confusing one; LASSO and LARS (least angle regression) are not the same thing, but a small modification of the LARS algorithm computes the entire LASSO regularisation path (Efron et al. 2004). I need to work this one through with a pencil and paper.
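scikit-learn makes the correspondence explicit: `lars_path` traces plain least angle regression with `method='lar'`, and the exact lasso path with `method='lasso'`. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))
y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(100)

# method='lar' is plain least angle regression; method='lasso'
# modifies the LARS steps (allowing variables to leave the active
# set) so that the result is the exact lasso solution path.
alphas, active, coefs = lars_path(X, y, method="lasso")
print("knots of the piecewise-linear path:", np.round(alphas, 3))
print("active set at the end of the path:", active)
```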

## Graph LASSO

As used in graphical models. 🏗
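In the meantime, a minimal sketch using scikit-learn's `GraphicalLasso`, which fits a sparse precision (inverse covariance) matrix; zeros in the precision matrix correspond to missing edges in a Gaussian graphical model:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Sample from a known sparse Gaussian: a chain graph on 5 nodes.
rng = np.random.default_rng(2)
p = 5
precision = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(precision), size=2000)

model = GraphicalLasso(alpha=0.05).fit(X)
# Nonzero off-diagonal entries of the estimate are the recovered edges.
print(np.round(model.precision_, 2))
```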

## Elastic net

Combination of $$L_1$$ and $$L_2$$ penalties. 🏗
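Concretely, in the same notation as the lasso above, the (naïve) elastic net objective is

\begin{aligned} \hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\text{argmin}} \: \frac{1}{2} \| y - \mathbf{X} \beta \|_2^2 + \lambda_1 \| \beta \|_1 + \lambda_2 \| \beta \|_2^2. \end{aligned}

The ridge term keeps the problem strongly convex and handles groups of correlated predictors more gracefully than the lasso alone, at the price of a second tuning parameter.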

## Grouped LASSO

AFAICT this is the usual LASSO but with the penalty applied to pre-specified groups of factors, so that whole groups are selected or discarded together.
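Concretely, for coefficients partitioned into groups $$g$$ of size $$p_g$$, the Yuan-Lin group penalty replaces the $$\ell_1$$ norm with a sum of unsquared group $$\ell_2$$ norms:

\begin{aligned} \hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\text{argmin}} \: \frac{1}{2} \| y - \mathbf{X} \beta \|_2^2 + \lambda \sum_{g} \sqrt{p_g} \, \| \beta_g \|_2. \end{aligned}

Because the group norms are not squared, the solution zeroes out whole groups $$\beta_g$$ at once, which is exactly the behaviour you want for, say, dummy-coded categorical factors.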

## Model selection

Can be fiddly with sparse regression, which couples variable selection tightly with parameter estimation. See sparse model selection.

## Debiased LASSO

There exist a few versions. 🏗
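Most versions share one shape, though: take the lasso estimate and add a one-step correction built from an approximate inverse $$\hat{\Theta}$$ of the empirical covariance $$\hat{\Sigma} = \mathbf{X}^{\top} \mathbf{X} / n$$ (the variants differ mainly in how $$\hat{\Theta}$$ is constructed):

\begin{aligned} \hat{\beta}^{\text{deb}} = \hat{\beta}^{\text{lasso}} + \frac{1}{n} \hat{\Theta} \mathbf{X}^{\top} ( y - \mathbf{X} \hat{\beta}^{\text{lasso}} ). \end{aligned}

The point of the exercise is that $$\hat{\beta}^{\text{deb}}$$ is approximately Gaussian around the true coefficients, so it supports confidence intervals and tests, which the raw lasso does not.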

## Sparse basis expansions

Wavelets etc; mostly handled under sparse dictionary bases.

## Sparse neural nets

That is, sparse regressions as the layers in a neural network? Sure thing.
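A minimal sketch of the idea (my construction, in PyTorch): add an $$\ell_1$$ penalty on a layer’s weights to the training loss.

```python
import torch

# One linear layer standing in for a "sparse regression" layer; the
# same recipe applies per layer in a deeper network.
torch.manual_seed(0)
X = torch.randn(200, 50)
beta = torch.zeros(50)
beta[:5] = torch.tensor([3.0, -2.0, 1.5, -1.0, 0.5])
y = X @ beta + 0.1 * torch.randn(200)

layer = torch.nn.Linear(50, 1, bias=False)
opt = torch.optim.SGD(layer.parameters(), lr=0.01)
lam = 0.1  # penalty strength, to be tuned

for _ in range(2000):
    opt.zero_grad()
    loss = torch.mean((layer(X).squeeze(-1) - y) ** 2) \
        + lam * layer.weight.abs().sum()
    loss.backward()
    opt.step()

# Caveat: plain (sub)gradient descent only shrinks weights toward zero;
# exact zeros need a proximal update or post-hoc thresholding.
print((layer.weight.abs() > 1e-2).sum().item(), "weights above threshold")
```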

## Other coefficient penalties

Put a weird penalty on the coefficients! E.g. “Smoothly Clipped Absolute Deviation” (SCAD). 🏗
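For reference, Fan and Li define SCAD through the derivative of the penalty, for $$\theta > 0$$ and $$a > 2$$ (conventionally $$a = 3.7$$):

\begin{aligned} p_{\lambda}'(\theta) = \lambda \left\{ I(\theta \le \lambda) + \frac{(a \lambda - \theta)_{+}}{(a - 1)\lambda} I(\theta > \lambda) \right\}. \end{aligned}

So it matches the lasso penalty for small coefficients, then tapers off and flattens, which reduces the lasso’s bias on large coefficients while keeping sparsity.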

## Other prediction losses

Put a weird penalty on the error! MAD prediction penalty, lasso-coefficient penalty, etc.

There are implementations using, e.g., maximum absolute prediction error.
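One concrete case: $$\ell_1$$-penalised least absolute deviation regression, sketched here with scikit-learn’s `QuantileRegressor` (at the median the pinball loss is the absolute deviation; requires scikit-learn ≥ 1.0):

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 20))
beta = np.zeros(20)
beta[:3] = [2.0, -1.0, 0.5]
# Heavy-tailed noise is where the absolute-deviation loss earns its keep.
y = X @ beta + rng.standard_t(df=2, size=200)

# quantile=0.5 gives the LAD loss; alpha weights the l1 penalty.
model = QuantileRegressor(quantile=0.5, alpha=0.1).fit(X, y)
print(np.round(model.coef_, 2))
```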

## Implementations

Hastie, Friedman et al.’s glmnet for R is fast and well-regarded, and has a MATLAB version. Here’s how to use it for the adaptive lasso. Kenneth Tay has implemented the elastic net penalty for any GLM in glmnet.

SPAMS (C++, MATLAB, R, python) by Mairal looks interesting. It’s an optimisation library for many, many sparse problems.

liblinear also includes lasso-type solvers, as well as support-vector regression.

## Tidbits

Sparse regression as a universal classifier explainer? Local Interpretable Model-agnostic Explanations (LIME) uses LASSO for model interpretation. (See the blog post, or the source.)
