# Measure concentration inequalities

## The fancy name for probability inequalities

*A corral captures the idea of concentration of measure: we have some procedure that guarantees that most of the mass (of buffalos) is where we can handle it. Image: Kevin M Klerks, CC BY 2.0*

Welcome to the probability inequality mines!

Concentration of measure is what is happening when something in your process (measurement, estimation) means that you can be pretty sure a whole bunch of your stuff is particularly likely to be somewhere in particular, e.g. being 80% sure I am only 20% wrong.

As undergraduates we run into central limit theorems, but there are many more diverse ways we can keep track of our probability, or at least most of it. This idea is a basic workhorse in univariate probability, and turns out to be yet more essential in multivariate matrix probability, as seen in matrix factorisation, compressive sensing, PAC-bounds and suchlike.

## Background

Overviews include

## Markov

For any nonnegative random variable $$X$$ and any $$t>0$$,

$\mathbb{P}\{X \geq t\} \leq \frac{\mathbb{E} X}{t}$

Corollary: if $$\phi$$ is a strictly monotonically increasing non-negative-valued function, then for any random variable $$X$$ and real number $$t$$,

$\mathbb{P}\{X \geq t\}=\mathbb{P}\{\phi(X) \geq \phi(t)\} \leq \frac{\mathbb{E} \phi(X)}{\phi(t)}$
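A quick numerical sanity check of Markov's inequality. This is only a sketch: the exponential distribution, the threshold `t`, and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)  # nonnegative, E[X] = 1
t = 3.0

empirical_tail = (x >= t).mean()  # estimate of P{X >= t}; the true tail is e^{-3}
markov_bound = x.mean() / t       # E[X]/t, estimated from the same sample

# Markov: the tail probability never exceeds E[X]/t
print(empirical_tail, markov_bound)
```

The bound is loose here (roughly 1/3 against a true tail of about 0.05), which is typical: Markov trades tightness for total generality.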

## Chebyshev

A corollary of Markov’s bound with $$\phi(x)=x^{2}$$ is Chebyshev’s inequality: if $$X$$ is an arbitrary random variable and $$t>0$$, then

$\mathbb{P}\{|X-\mathbb{E} X| \geq t\}=\mathbb{P}\left\{|X-\mathbb{E} X|^{2} \geq t^{2}\right\} \leq \frac{\mathbb{E}\left[|X-\mathbb{E} X|^{2}\right]}{t^{2}}=\frac{\operatorname{Var}\{X\}}{t^{2}}$

More generally, taking $$\phi(x)=x^{q}$$ $$(x \geq 0)$$ for any $$q>0$$, we have

$\mathbb{P}\{|X-\mathbb{E} X| \geq t\} \leq \frac{\mathbb{E}\left[|X-\mathbb{E} X|^{q}\right]}{t^{q}}$

We can choose $$q$$ to optimise the obtained upper bound for the problem at hand.
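The same kind of Monte Carlo check works for Chebyshev; again a sketch, with an arbitrary Gaussian example and threshold.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=2.0, size=100_000)
t = 3.0

empirical_tail = (np.abs(x - x.mean()) >= t).mean()
chebyshev_bound = x.var() / t**2  # Var(X)/t² = 4/9 here

# Chebyshev: the two-sided tail never exceeds Var(X)/t²
print(empirical_tail, chebyshev_bound)
```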

## Chernoff

Taking $$\phi(x)=e^{s x}$$ where $$s>0$$, for any random variable $$X$$ and any $$t>0$$, we have

$\mathbb{P}\{X \geq t\}=\mathbb{P}\left\{e^{s X} \geq e^{s t}\right\} \leq \frac{\mathbb{E} e^{s X}}{e^{s t}}$

Once again, we choose $$s$$ to make the bound as tight as possible.
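For a standard Gaussian the optimisation over $$s$$ can be done in closed form, which makes a tidy worked example: $$\mathbb{E} e^{sX}=e^{s^{2}/2}$$, so the bound $$e^{s^{2}/2 - st}$$ is minimised at $$s=t$$, giving $$\mathbb{P}\{X \geq t\} \leq e^{-t^{2}/2}$$. A small check against the exact Gaussian tail:

```python
import math

def chernoff_gaussian_tail(t):
    # Optimized Chernoff bound for X ~ N(0,1):
    # E[e^{sX}] = e^{s²/2}, so e^{s²/2 - st} is minimized at s = t,
    # yielding P{X >= t} <= e^{-t²/2}.
    return math.exp(-t**2 / 2)

def true_gaussian_tail(t):
    # P{X >= t} for X ~ N(0,1), via the complementary error function
    return 0.5 * math.erfc(t / math.sqrt(2))

for t in [0.5, 1.0, 2.0, 3.0]:
    assert true_gaussian_tail(t) <= chernoff_gaussian_tail(t)
```

Unlike Chebyshev's polynomial decay, the Chernoff bound captures the Gaussian's exponential tail decay, which is why it is the usual starting point for sub-Gaussian concentration.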

🏗

## Efron-Stein

Are these precisely the results arising from Stein’s method?

Let $$g: \mathcal{X}^{n} \rightarrow \mathbb{R}$$ be a real-valued measurable function of $$n$$ variables. Efron-Stein inequalities concern the difference between the random variable $$Z=g\left(X_{1}, \ldots, X_{n}\right)$$ and its expected value $$\mathbb{E} Z$$ when $$X_{1}, \ldots, X_{n}$$ are arbitrary independent random variables.

Write $$\mathbb{E}_{i}$$ for the expectation with respect to the variable $$X_{i}$$ alone, conditioning on the rest; that is, $$\mathbb{E}_{i} Z=\mathbb{E}\left[Z \mid X_{1}, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{n}\right]$$. Then

$\operatorname{Var}(Z) \leq \sum_{i=1}^{n} \mathbb{E}\left[\left(Z-\mathbb{E}_{i} Z\right)^{2}\right]$

Now, let $$X_{1}^{\prime}, \ldots, X_{n}^{\prime}$$ be an independent copy of $$X_{1}, \ldots, X_{n}$$, and define

$Z_{i}^{\prime}=g\left(X_{1}, \ldots, X_{i}^{\prime}, \ldots, X_{n}\right)$

Then, alternatively,

$\operatorname{Var}(Z) \leq \frac{1}{2} \sum_{i=1}^{n} \mathbb{E}\left[\left(Z-Z_{i}^{\prime}\right)^{2}\right]$

Nothing here seems to constrain the variables to be real-valued, merely the function $$g$$; but apparently these inequalities do not work for matrix variables as written. You need to see matrix Efron-Stein results for that.
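The resampled-copy form of the bound is easy to check by simulation. A sketch, taking $$g$$ to be the maximum of $$n$$ independent uniforms (an arbitrary choice of a genuinely nonlinear $$g$$; the sample sizes are likewise arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 10, 200_000

# Z = g(X_1, ..., X_n) = max of n independent uniforms
X = rng.uniform(size=(trials, n))
Z = X.max(axis=1)
var_Z = Z.var()

# Independent copies X_i'; Z_i' replaces coordinate i only
X_prime = rng.uniform(size=(trials, n))
bound = 0.0
for i in range(n):
    X_i = X.copy()
    X_i[:, i] = X_prime[:, i]
    Z_i = X_i.max(axis=1)
    bound += 0.5 * np.mean((Z - Z_i) ** 2)

# Efron-Stein: Var(Z) <= (1/2) Σ E[(Z - Z_i')²]
print(var_Z, bound)
```

For a plain sum the bound is tight; for the max it is an overestimate, but still of the right order, which is the typical situation for functions with bounded differences.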

🏗

## Gaussian

For the Gaussian distribution. Filed there, perhaps?

## Sub-Gaussian

🏗

E.g. Hanson-Wright.

🏗

## Khintchine

Let us copy from Wikipedia:

Heuristically: if we pick $$N$$ complex numbers $$x_1,\dots,x_N \in\mathbb{C}$$, and add them together, each multiplied by jointly independent random signs $$\pm 1$$, then the expected value of the sum’s magnitude is close to $$\sqrt{|x_1|^{2}+ \cdots + |x_N|^{2}}$$.

Let $$\{\varepsilon_n\}_{n=1}^N$$ be i.i.d. random variables with $$P(\varepsilon_n=\pm1)=\frac12$$ for $$n=1,\ldots, N$$, i.e., a sequence with the Rademacher distribution. Let $$0<p<\infty$$ and let $$x_1,\ldots,x_N \in \mathbb{C}$$. Then

$A_p \left( \sum_{n=1}^N |x_n|^2 \right)^{1/2} \leq \left(\operatorname{E} \left|\sum_{n=1}^N \varepsilon_n x_n\right|^p \right)^{1/p} \leq B_p \left(\sum_{n=1}^N |x_n|^2\right)^{1/2}$

for some constants $$A_p,B_p>0$$. It is a simple matter to see that $$A_p = 1$$ when $$p \ge 2$$, and $$B_p = 1$$ when $$0 < p \le 2$$.
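The two easy cases are pleasant to verify numerically: for $$p=2$$ the moment equals the $$\ell^2$$ norm exactly (expand the square; cross terms vanish by independence), and for $$p=1$$ it sits below it since $$B_1=1$$. A sketch with an arbitrary real coefficient vector:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.array([3.0, -1.0, 4.0, 1.0, -5.0])  # arbitrary coefficients
ell2 = np.sqrt((x**2).sum())

# Many independent draws of the whole Rademacher sign sequence
eps = rng.choice([-1.0, 1.0], size=(200_000, x.size))
s = eps @ x

lhs_p2 = np.sqrt(np.mean(s**2))  # (E|Σ ε_n x_n|²)^{1/2}: equals the ℓ² norm exactly
lhs_p1 = np.mean(np.abs(s))      # E|Σ ε_n x_n|: at most the ℓ² norm, since B_1 = 1

print(lhs_p2, ell2, lhs_p1)
```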

🏗

## Matrix concentration

If we restrict our interest to matrices in particular, some fun things arise. See Matrix concentration inequalities.
