Restricted isometry properties
Plus incoherence, irrepresentability, and other uncertainty bounds for a sparse world, and maybe frame theory, what’s that now?
June 12, 2017 — March 9, 2020
Restricted isometry properties, a.k.a. uniform uncertainty principles (E. Candès and Tao 2005; E. J. Candès, Romberg, and Tao 2006), mutual incoherence (David L. Donoho 2006; D. L. Donoho, Elad, and Temlyakov 2006), irrepresentability conditions (Zhao and Yu 2006)…
This is mostly notes while I learn some definitions; expect no actual thoughts.
Recoverability conditions, as seen in sparse regression, sparse basis dictionaries, function approximation, compressed sensing, etc. If you squint right you could imagine these as uncertainty principles for a sparse world, or as the foundations of a particular type of sampling theory.
Terry Tao mentions the various related conditions for the compressed sensing problem, and which types of random matrices satisfy them.
1 Restricted Isometry
The compressed sensing formulation.
The chatty lecture notes on uniform uncertainty look fun.
The restricted isometry constant of a matrix \(A\) is the smallest constant \(\delta_s(A)\) such that \[(1-\delta_s(A))\|x\|_2^2\leq \|Ax\|_2^2\leq (1+\delta_s(A))\|x\|_2^2\] for all \(s\)-sparse \(x\). That is, the measurement matrix does not change the norm of sparse signals “too much,” and in particular does not null them when \(\delta_s(A) < 1.\)
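For intuition, here is a minimal brute-force sketch of \(\delta_s(A)\) (my own illustration, not from the cited papers): for a small \(A\) we can enumerate every size-\(s\) support and see how far the eigenvalues of the corresponding Gram submatrices stray from 1. The function name and the Gaussian test matrix are assumptions for the example.

```python
import itertools

import numpy as np


def restricted_isometry_constant(A, s):
    # Brute force: delta_s(A) is the worst deviation of the eigenvalues of
    # any s-column Gram submatrix from 1, i.e. max(1 - lambda_min, lambda_max - 1).
    m = A.shape[1]
    delta = 0.0
    for support in itertools.combinations(range(m), s):
        cols = list(support)
        eigs = np.linalg.eigvalsh(A[:, cols].T @ A[:, cols])  # ascending order
        delta = max(delta, 1.0 - eigs[0], eigs[-1] - 1.0)
    return delta


# A Gaussian matrix with variance-1/n entries is the standard example of a
# matrix whose delta_s is small with high probability for n large enough.
rng = np.random.default_rng(0)
n, m, s = 20, 40, 3
A = rng.normal(size=(n, m)) / np.sqrt(n)
print(restricted_isometry_constant(A, s))  # delta_3 for this draw
```

Note the cost is combinatorial in \(s\), which is part of why the compressed sensing literature leans on probabilistic arguments rather than certifying RIP for a given matrix.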
2 Irrepresentability
The setup is a little different for regression-type problems, which is where “irrepresentability” comes from. Here we also care about the design (roughly, the dependence structure of the covariates we actually observe) and the noise distribution.
Zhao and Yu (2006) present an abstract condition called strong irrepresentability, which guarantees asymptotic sign consistency of selection. See also Meinshausen and Bühlmann (2006), who call this neighbourhood stability, which is even less catchy.
More recently Meinshausen and Yu (2009) extend this (and explain the original irrepresentability more clearly IMO):
Here we examine the behaviour of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the \(\ell_2\)-norm sense for fixed designs under conditions on (a) the number \(s_n\) of nonzero components of the vector \(\beta_n\) and (b) the minimal singular values of design matrices that are induced by selecting small subsets of variables. Furthermore, a rate of convergence result is obtained on the \(\ell_2\) error with an appropriate choice of the smoothing parameter.
They do a good job of uniting prediction-error and model-selection consistency approaches. In fact, I will base everything off Meinshausen and Yu (2009), since not only is the prose lucid, it also gives the background to the design assumptions and the relaxation of coherence.
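To pin down what the strong irrepresentable condition actually asks of a design, a small numerical sketch (mine, following the partitioned-Gram statement in Zhao and Yu (2006); function and variable names are illustrative): with \(C = X^\top X/n\) partitioned over the support \(S\) of \(\beta\) and its complement, the condition requires \(|C_{S^c S} C_{SS}^{-1} \operatorname{sign}(\beta_S)| \leq 1 - \eta\) elementwise for some \(\eta > 0\).

```python
import numpy as np


def strong_irrepresentable(X, beta, eta=0.05):
    # Strong irrepresentable condition (Zhao and Yu 2006):
    # |C_{S^c,S} @ C_{S,S}^{-1} @ sign(beta_S)| <= 1 - eta elementwise,
    # where C = X'X / n and S is the support of beta.
    n = X.shape[0]
    C = X.T @ X / n
    S = np.flatnonzero(beta)           # active set
    Sc = np.flatnonzero(beta == 0)     # inactive set
    lhs = np.abs(C[np.ix_(Sc, S)] @ np.linalg.solve(C[np.ix_(S, S)], np.sign(beta[S])))
    return bool(np.all(lhs <= 1 - eta))


# e.g. an i.i.d. Gaussian design, which typically satisfies the condition
# since C is close to the identity for n much larger than p
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
beta = np.zeros(10)
beta[:3] = 1.0
print(strong_irrepresentable(X, beta))
```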
TBC.
3 Incoherence
A noise-free Basis Pursuit setting.
D. L. Donoho, Elad, and Temlyakov (2006):
We can think of the atoms in our dictionary as columns in a matrix \(\Phi\), so that \(\Phi\) is \(n\) by \(m\) and \(m > n.\) A representation of \(y\in\mathbb{R}^n\) can be thought of as a vector \(\alpha\in\mathbb{R}^m\) satisfying \(y=\Phi\alpha.\)
The concept of mutual coherence of the dictionary […] is defined, assuming that the columns are normalised to unit \(\ell^2\)-norm, in terms of the Gram matrix \(G=\Phi^T\Phi\). With \(G(k,j)\) denoting entries of this matrix, the mutual coherence is
\[ M(\Phi) = \max_{1\leq k, j\leq m, k\neq j} |G(k,j)| \]
A dictionary is incoherent if \(M\) is small.
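The definition translates directly into code; a minimal sketch (the function name and test dictionary are mine) computing \(M(\Phi)\) under the convention above that columns are normalised to unit \(\ell^2\)-norm:

```python
import numpy as np


def mutual_coherence(Phi):
    # M(Phi): largest off-diagonal entry, in magnitude, of the Gram matrix
    # G = Phi' Phi of the column-normalised dictionary.
    Phi = Phi / np.linalg.norm(Phi, axis=0)  # normalise columns to unit l2 norm
    G = Phi.T @ Phi
    np.fill_diagonal(G, 0.0)  # ignore the unit diagonal
    return float(np.max(np.abs(G)))


# e.g. a random overcomplete dictionary with n = 32 rows and m = 64 atoms
rng = np.random.default_rng(0)
Phi = rng.normal(size=(32, 64))
print(mutual_coherence(Phi))
```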
4 Frame theory
See frames.