# Measure concentration inequalities

## On being 80% sure I am only 20% wrong

*A corral captures the idea of concentration of measure: we have some procedure that guarantees that most of the mass (of buffaloes) is where we can handle it. Image: Kevin M Klerks, CC BY 2.0.*

Welcome to the probability inequality mines!

Concentration inequalities are what we invoke when something in our process (measurement, estimation) means that we can be pretty sure that a whole bunch of our stuff is particularly likely to be somewhere in particular.

As undergraduates we run into central limit theorems, but there are many more diverse ways we can keep track of our probability, or at least most of it. This idea is a basic workhorse in univariate probability, and turns out to be yet more essential in multivariate matrix probability, as seen in matrix factorisation, compressive sensing, PAC-bounds and suchlike.

## Background

Overviews include

## Markov

For any nonnegative random variable $$X$$ and any $$t>0$$,

$\mathbb{P}\{X \geq t\} \leq \frac{\mathbb{E} X}{t}.$

Corollary: if $$\phi$$ is a strictly monotonically increasing nonnegative-valued function, then for any random variable $$X$$ and real number $$t$$,

$\mathbb{P}\{X \geq t\}=\mathbb{P}\{\phi(X) \geq \phi(t)\} \leq \frac{\mathbb{E} \phi(X)}{\phi(t)}.$
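A quick Monte Carlo sanity check of my own (the exponential distribution, the threshold, and the sample size are arbitrary choices, nothing canonical):

```python
import numpy as np

rng = np.random.default_rng(0)
# Exponential(1) is nonnegative with mean 1, so Markov applies.
x = rng.exponential(1.0, size=100_000)
t = 3.0
empirical = np.mean(x >= t)   # estimate of P{X >= t}; truth is exp(-3) ~ 0.05
markov = x.mean() / t         # the bound E[X] / t ~ 1/3
assert empirical <= markov
```

The bound is loose here (roughly 0.33 against a true tail of about 0.05), which is typical: Markov trades tightness for near-total generality.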

## Chebychev

Another corollary of Markov’s bound, taking $$\phi(x)=x^{2}$$, is Chebyshev’s: if $$X$$ is an arbitrary random variable and $$t>0$$, then

$\mathbb{P}\{|X-\mathbb{E} X| \geq t\}=\mathbb{P}\left\{|X-\mathbb{E} X|^{2} \geq t^{2}\right\} \leq \frac{\mathbb{E}\left[|X-\mathbb{E} X|^{2}\right]}{t^{2}}=\frac{\operatorname{Var}\{X\}}{t^{2}}.$

More generally, taking $$\phi(x)=x^{q}$$ $$(x \geq 0)$$ for any $$q>0$$, we have

$\mathbb{P}\{|X-\mathbb{E} X| \geq t\} \leq \frac{\mathbb{E}\left[|X-\mathbb{E} X|^{q}\right]}{t^{q}}.$

We can choose $$q$$ to optimise the obtained upper bound for the problem in hand.
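The same kind of sanity check for Chebyshev (again my own sketch; standard normal and $$t=2$$ are arbitrary picks):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=200_000)          # Var(X) = 1
t = 2.0
empirical = np.mean(np.abs(x - x.mean()) >= t)  # truth is about 0.0455
chebyshev = x.var() / t**2                      # the bound Var(X) / t^2 = 0.25
assert empirical <= chebyshev
```

Tighter than Markov on the same kind of tail, because we spent a variance assumption to buy it.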

## Chernoff

Taking $$\phi(x)=e^{s x}$$ where $$s>0$$, for any random variable $$X$$ and any $$t>0$$, we have

$\mathbb{P}\{X \geq t\}=\mathbb{P}\left\{e^{s X} \geq e^{s t}\right\} \leq \frac{\mathbb{E} e^{s X}}{e^{s t}}.$

Once again, we choose $$s$$ to make the bound as tight as possible.
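To make the "choose $$s$$" step concrete, here is a small sketch for the standard Gaussian (my choice of example), where $$\mathbb{E} e^{sX}=e^{s^2/2}$$ is known in closed form:

```python
import numpy as np

t = 2.0
# For X ~ N(0, 1), E[exp(sX)] = exp(s^2 / 2), so the Chernoff bound reads
# P{X >= t} <= exp(s^2 / 2 - s t); minimising over s > 0 gives s* = t,
# hence the familiar Gaussian tail bound exp(-t^2 / 2).
s = np.linspace(0.01, 5.0, 500)
bounds = np.exp(s**2 / 2 - s * t)
best = bounds.min()
assert np.isclose(best, np.exp(-t**2 / 2), rtol=1e-3)
```

The grid search just confirms what calculus gives directly: the exponent $$s^2/2 - st$$ is minimised at $$s=t$$.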

🏗

## Efron-Stein

Are these precisely the results arising from Stein’s method?

Let $$g: \mathcal{X}^{n} \rightarrow \mathbb{R}$$ be a real-valued measurable function of $$n$$ variables. Efron-Stein inequalities concern the difference between the random variable $$Z=g\left(X_{1}, \ldots, X_{n}\right)$$ and its expected value $$\mathbb{E} Z$$ when $$X_{1}, \ldots, X_{n}$$ are arbitrary independent random variables.

Write $$\mathbb{E}_{i}$$ for the conditional expectation with the variable $$X_{i}$$ integrated out, that is, $$\mathbb{E}_{i} Z=\mathbb{E}\left[Z \mid X_{1}, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{n}\right].$$ Then

$\operatorname{Var}(Z) \leq \sum_{i=1}^{n} \mathbb{E}\left[\left(Z-\mathbb{E}_{i} Z\right)^{2}\right].$

Now let $$X_{1}^{\prime}, \ldots, X_{n}^{\prime}$$ be an independent copy of $$X_{1}, \ldots, X_{n}$$, and define

$Z_{i}^{\prime}=g\left(X_{1}, \ldots, X_{i}^{\prime}, \ldots, X_{n}\right).$

Alternatively, then,

$\operatorname{Var}(Z) \leq \frac{1}{2} \sum_{i=1}^{n} \mathbb{E}\left[\left(Z-Z_{i}^{\prime}\right)^{2}\right].$

Nothing here seems to constrain the variables to be real-valued, merely the function $$g$$, but apparently these inequalities do not work for matrix variables as written; you need to see matrix Efron-Stein results for that.
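To see the swap version in action, a simulation sketch of my own, taking $$g=\max$$ over $$n$$ i.i.d. uniforms (any $$g$$ and any independent variables would do):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 10, 50_000
X = rng.uniform(size=(reps, n))
Xp = rng.uniform(size=(reps, n))   # the independent copy X'
Z = X.max(axis=1)                  # Z = g(X_1, ..., X_n) = max_i X_i

# (1/2) * sum_i E[(Z - Z_i')^2], where Z_i' swaps in the copy at coordinate i.
rhs = 0.0
for i in range(n):
    Xi = X.copy()
    Xi[:, i] = Xp[:, i]
    rhs += 0.5 * np.mean((Z - Xi.max(axis=1)) ** 2)

assert Z.var() <= rhs   # Efron-Stein: Var(Z) is dominated by the swap sum
```

The bound has slack here; equality holds essentially only when $$g$$ is a sum.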

🏗

## Gaussian

For the Gaussian distribution. Filed there, perhaps?

## Sub-Gaussian

🏗

E.g. Hanson-Wright.

🏗

## Khintchine

Let us copy from Wikipedia:

Heuristically: if we pick $$N$$ complex numbers $$x_1,\dots,x_N \in\mathbb{C}$$, and add them together, each multiplied by jointly independent random signs $$\pm 1$$, then the expected value of the sum’s magnitude is close to $$\sqrt{|x_1|^{2}+ \cdots + |x_N|^{2}}$$.

Let $$\{\varepsilon_n\}_{n=1}^N$$ be i.i.d. random variables with $$P(\varepsilon_n=\pm1)=\frac12$$ for $$n=1,\ldots, N$$, i.e., a sequence with Rademacher distribution. Let $$0<p<\infty$$ and let $$x_1,\ldots,x_N \in \mathbb{C}$$. Then

$A_p \left( \sum_{n=1}^N |x_n|^2 \right)^{1/2} \leq \left(\operatorname{E} \left|\sum_{n=1}^N \varepsilon_n x_n\right|^p \right)^{1/p} \leq B_p \left(\sum_{n=1}^N |x_n|^2\right)^{1/2}$

for some constants $$A_p,B_p>0$$. It is a simple matter to see that $$A_p = 1$$ when $$p \ge 2$$, and $$B_p = 1$$ when $$0 < p \le 2$$.
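A quick empirical check of the upper half of the inequality (my own sketch; the vector and $$p=1$$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.array([1.0, 2.0, 0.5, 3.0])          # an arbitrary fixed vector
l2 = np.sqrt(np.sum(np.abs(x) ** 2))        # (sum |x_n|^2)^{1/2}
signs = rng.choice([-1.0, 1.0], size=(100_000, x.size))  # Rademacher signs
sums = signs @ x                            # sum_n eps_n x_n, many draws
p = 1.0
moment = np.mean(np.abs(sums) ** p) ** (1.0 / p)
# For 0 < p <= 2 we have B_p = 1, so the p-th moment sits below the l2 norm.
assert moment <= l2
```

For $$p=1$$ the upper bound with $$B_1=1$$ is just Jensen applied to $$\mathbb{E}|S| \le (\mathbb{E} S^2)^{1/2}$$; the content of Khintchine is that the two sides match up to constants for every $$p$$.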

🏗

## Matrix concentration

If we restrict our attention to matrices in particular, some fun things arise. See Matrix concentration inequalities.
