# Measure concentration inequalities

The fancy name for probability inequalities

November 25, 2014 — March 4, 2021

approximation
dynamical systems
functional analysis
measure
metrics
model selection
probability
stochastic processes

Welcome to the probability inequality mines!

When something about your process (measurement, estimation, aggregation) means that you can be pretty sure that most of your probability mass is concentrated somewhere in particular, e.g. being 80% sure I am only 20% wrong.

As undergraduates we run into central limit theorems, but there are many more diverse ways to keep track of where our probability mass is, or at least most of it. This idea is a basic workhorse in univariate probability, and turns out to be yet more essential in multivariate and matrix-valued probability, as seen in matrix factorisation, compressive sensing, PAC-bounds and suchlike.

## 1 Background

Overviews include

## 2 Markov

For any nonnegative random variable $$X$$ and $$t>0$$, $\mathbb{P}\{X \geq t\} \leq \frac{\mathbb{E} X}{t}.$

Corollary: if $$\phi$$ is a strictly monotonically increasing non-negative-valued function, then for any random variable $$X$$ and real number $$t$$, $\mathbb{P}\{X \geq t\}=\mathbb{P}\{\phi(X) \geq \phi(t)\} \leq \frac{\mathbb{E} \phi(X)}{\phi(t)}.$
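A minimal Monte Carlo sketch of how this plays out (the Exponential(1) variable and the threshold are arbitrary choices for illustration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential(1) variable: E[X] = 1, so Markov gives P(X >= t) <= 1/t.
x = rng.exponential(scale=1.0, size=100_000)
t = 5.0

empirical = np.mean(x >= t)  # true tail is exp(-5), roughly 0.007
markov = np.mean(x) / t      # bound is roughly 1/5 = 0.2

assert empirical <= markov
```

The bound holds but is loose by more than an order of magnitude here, which is typical: Markov uses only the first moment.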

## 3 Chebyshev

A corollary of Markov’s bound with $$\phi(x)=x^{2}$$ is Chebyshev’s: if $$X$$ is an arbitrary random variable and $$t>0$$, then $\mathbb{P}\{|X-\mathbb{E} X| \geq t\}=\mathbb{P}\left\{|X-\mathbb{E} X|^{2} \geq t^{2}\right\} \leq \frac{\mathbb{E}\left[|X-\mathbb{E} X|^{2}\right]}{t^{2}}=\frac{\operatorname{Var}\{X\}}{t^{2}}.$

More generally, taking $$\phi(x)=x^{q}\ (x \geq 0)$$ for any $$q>0$$, we have $\mathbb{P}\{|X-\mathbb{E} X| \geq t\} \leq \frac{\mathbb{E}\left[|X-\mathbb{E} X|^{q}\right]}{t^{q}}.$ We can choose $$q$$ to optimize the obtained upper bound for the problem at hand.
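Sketching that moment-optimisation step numerically (the standard normal variable and the grid of $$q$$ values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)  # mean zero, so |X - E X| is just |X|
t = 3.0

empirical = np.mean(np.abs(x) >= t)

# Moment bounds E|X|^q / t^q for several q; pick the tightest.
bounds = {q: np.mean(np.abs(x) ** q) / t ** q for q in (2, 4, 6, 8)}
best_q = min(bounds, key=bounds.get)

# Every moment bound is valid; higher q is tighter for this light tail.
assert all(empirical <= b for b in bounds.values())
```

For a Gaussian at $$t=3$$ the plain Chebyshev bound ($$q=2$$) gives about $$1/9$$, while higher moments do much better, foreshadowing the Chernoff trick of optimising over a whole family of bounds.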

## 4 Chernoff

Taking $$\phi(x)=e^{s x}$$ where $$s>0$$, for any random variable $$X$$ and any $$t>0$$, we have $\mathbb{P}\{X \geq t\}=\mathbb{P}\left\{e^{s X} \geq e^{s t}\right\} \leq \frac{\mathbb{E} e^{s X}}{e^{s t}}.$ Once again, we choose $$s$$ to make the bound as tight as possible.
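A sketch of choosing $$s$$ numerically, using the standard fact that for $$X\sim\mathcal{N}(0,1)$$ the moment generating function is $$\mathbb{E}e^{sX}=e^{s^{2}/2}$$ (the grid search is an illustrative shortcut for the calculus):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(500_000)
t = 2.0

empirical = np.mean(x >= t)  # true tail is about 0.023

# For N(0,1), the bound e^{s^2/2 - s t} is minimised at s = t,
# giving the familiar e^{-t^2/2}. A grid search recovers this.
s_grid = np.linspace(0.1, 4.0, 200)
chernoff = np.exp(s_grid ** 2 / 2 - s_grid * t).min()

assert empirical <= chernoff
assert abs(chernoff - np.exp(-t ** 2 / 2)) < 1e-3
```

Note how the optimised Chernoff bound $$e^{-t^{2}/2}$$ decays exponentially in $$t^{2}$$, unlike the polynomial decay from any single fixed moment.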

🏗

## 6 Efron-Stein

Are these precisely the results arising from Stein’s method?

Let $$g: \mathcal{X}^{n} \rightarrow \mathbb{R}$$ be a real-valued measurable function of $$n$$ variables. Efron-Stein inequalities concern the difference between the random variable $$Z=g\left(X_{1}, \ldots, X_{n}\right)$$ and its expected value $$\mathbb{E} Z$$ when $$X_{1}, \ldots, X_{n}$$ are arbitrary independent random variables.

Write $$\mathbb{E}_{i}$$ for the conditional expectation given every variable except $$X_{i}$$, that is, $$\mathbb{E}_{i} Z=\mathbb{E}\left[Z \mid X_{1}, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{n}\right].$$ Then $\operatorname{Var}(Z) \leq \sum_{i=1}^{n} \mathbb{E}\left[\left(Z-\mathbb{E}_{i} Z\right)^{2}\right].$

Now, let $$X_{1}^{\prime}, \ldots, X_{n}^{\prime}$$ be an independent copy of $$X_{1}, \ldots, X_{n}$$ and define $Z_{i}^{\prime}=g\left(X_{1}, \ldots, X_{i}^{\prime}, \ldots, X_{n}\right).$ Then, alternatively, $\operatorname{Var}(Z) \leq \frac{1}{2} \sum_{i=1}^{n} \mathbb{E}\left[\left(Z-Z_{i}^{\prime}\right)^{2}\right].$ Nothing here seems to constrain the variables to be real-valued, merely the function $$g$$; but apparently these bounds do not work for matrix-valued functions as written — you need to see matrix Efron-Stein results for that.
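A Monte Carlo sketch of that second inequality, taking $$g$$ to be the coordinate-wise maximum of uniform variables (an arbitrary choice of $$g$$ and distribution for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10, 100_000

X = rng.uniform(size=(reps, n))
X_prime = rng.uniform(size=(reps, n))  # independent copy
Z = X.max(axis=1)

# Z_i' replaces coordinate i by its independent copy, leaving the rest.
es_sum = 0.0
for i in range(n):
    Xi = X.copy()
    Xi[:, i] = X_prime[:, i]
    es_sum += np.mean((Z - Xi.max(axis=1)) ** 2)

# Efron-Stein: Var(Z) <= (1/2) * sum_i E[(Z - Z_i')^2]
assert Z.var() <= 0.5 * es_sum
```

The maximum is a natural test case because changing a single coordinate moves $$Z$$ only slightly, which is exactly the bounded-differences intuition that Efron-Stein quantifies.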

🏗

## 8 Gaussian

For the Gaussian distribution. Filed there, perhaps?

## 9 Sub-Gaussian

🏗

E.g. Hanson-Wright.

🏗

## 11 Khintchine

Let us copy from Wikipedia:

Heuristically: if we pick $$N$$ complex numbers $$x_1,\dots,x_N \in\mathbb{C}$$, and add them together, each multiplied by jointly independent random signs $$\pm 1$$, then the expected value of the sum’s magnitude is close to $$\sqrt{|x_1|^{2}+ \cdots + |x_N|^{2}}$$.

Let $$\{\varepsilon_n\}_{n=1}^N$$ be i.i.d. random variables with $$P(\varepsilon_n=\pm1)=\frac12$$ for $$n=1,\ldots, N$$, i.e., a sequence with the Rademacher distribution. Let $$0<p<\infty$$ and let $$x_1,\ldots,x_N \in\mathbb{C}$$. Then

$A_p \left( \sum_{n=1}^N |x_n|^2 \right)^{1/2} \leq \left(\operatorname{E} \left|\sum_{n=1}^N \varepsilon_n x_n\right|^p \right)^{1/p} \leq B_p \left(\sum_{n=1}^N |x_n|^2\right)^{1/2}$

for some constants $$A_p,B_p>0$$ depending only on $$p$$. It is a simple matter to see that $$A_p = 1$$ when $$p \ge 2$$, and $$B_p = 1$$ when $$0 < p \le 2$$.
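A quick numerical sketch for $$p=1$$ with some arbitrary real coefficients (using $$B_1=1$$ from the remark above, and the known sharp constant $$A_1 = 1/\sqrt{2}$$):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.array([3.0, -1.0, 2.0, 0.5, 4.0])
ell2 = np.sqrt(np.sum(np.abs(x) ** 2))  # the l2 norm of the coefficients

# Random signs: each row is one draw of (eps_1, ..., eps_N).
signs = rng.choice([-1.0, 1.0], size=(200_000, x.size))
mean_abs = np.mean(np.abs(signs @ x))  # estimate of E|sum eps_n x_n|

# Khintchine at p = 1: A_1 * ell2 <= E|sum| <= B_1 * ell2.
assert mean_abs <= ell2
assert mean_abs >= ell2 / np.sqrt(2)
```

For many small coefficients the sum is approximately Gaussian, so $$\mathbb{E}|\sum_n \varepsilon_n x_n| \approx \sqrt{2/\pi}\,\lVert x\rVert_2$$, comfortably inside both constants.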

🏗

## 13 Matrix concentration

If we restrict our attention to matrices in particular, some fun things arise. See Matrix concentration inequalities.

## 15 References

Adler, Robert J. 2010. The Geometry of Random Fields.
Adler, Robert J., and Taylor. 2007. Random Fields and Geometry. Springer Monographs in Mathematics 115.
Adler, Robert J, Taylor, and Worsley. 2016. Applications of Random Fields and Geometry Draft.
Aubrun, and Nechita. 2011. Confluentes Mathematici.
Bach. 2013. In COLT.
Barron, Birgé, and Massart. 1999. Probability Theory and Related Fields.
Bellec, Lecué, and Tsybakov. 2017. arXiv:1701.09120 [Math, Stat].
Berend, Harremoës, and Kontorovich. 2012. arXiv:1206.6544 [Cs, Math].
Bertsimas, and Popescu. 2005. SIAM Journal on Optimization.
Boissard. 2011. Electronic Journal of Probability.
Boucheron, Bousquet, and Lugosi. 2004. In Advanced Lectures in Machine Learning.
Boucheron, Lugosi, and Massart. 2003.
———. 2013. Concentration Inequalities: A Nonasymptotic Theory of Independence.
Bousquet, Boucheron, and Lugosi. 2004. In Advanced Lectures on Machine Learning. Lecture Notes in Computer Science 3176.
Bühlmann, and van de Geer. 2011. Statistics for High-Dimensional Data: Methods, Theory and Applications.
Candès, and Recht. 2009. Foundations of Computational Mathematics.
Candès, Romberg, and Tao. 2006. IEEE Transactions on Information Theory.
Dasarathy, Shah, Bhaskar, et al. 2013. arXiv:1303.6544 [Cs, Math].
Dasgupta. 2000. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence. UAI’00.
DasGupta. 2008. Asymptotic Theory of Statistics and Probability. Springer Texts in Statistics.
Dasgupta, Hsu, and Verma. 2006. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence. UAI’06.
Del Moral, Hu, and Wu. 2011. On the Concentration Properties of Interacting Particle Processes.
Dubhashi, and Panconesi. 2009. Concentration of Measure for the Analysis of Randomized Algorithms.
Flammia, Gross, Liu, et al. 2012. New Journal of Physics.
Ghosal, and Roy. 2006. The Annals of Statistics.
Giné, and Nickl. 2009. The Annals of Probability.
Gozlan, and Léonard. 2010. arXiv:1003.3852 [Math].
Gross, D. 2011. IEEE Transactions on Information Theory.
Gross, David, Liu, Flammia, et al. 2010. Physical Review Letters.
Hansen, Reynaud-Bouret, and Rivoirard. 2015. Bernoulli.
Hanson, and Wright. 1971. Annals of Mathematical Statistics.
Houdré. 2002. The Annals of Probability.
Houdré, and Privault. 2002. Bernoulli.
Kennedy. 2015. arXiv Preprint arXiv:1510.04740.
Koltchinskii. 2011. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Mathematics École d’Été de Probabilités de Saint-Flour 2033.
Kontorovich, and Raginsky. 2016. arXiv:1602.00721 [Cs, Math].
Kroll. 2016. arXiv:1612.07901 [Math, Stat].
Kuznetsov, and Mohri. 2014. In Algorithmic Learning Theory. Lecture Notes in Computer Science.
———. 2015. In Advances in Neural Information Processing Systems.
———. 2016. In Machine Learning Journal.
Lahiri, Gao, and Ganguli. 2016. arXiv:1607.04331 [Cs, q-Bio, Stat].
Lederer, and van de Geer. 2014. Bernoulli.
Liggett. 2010. Proceedings of the National Academy of Sciences of the United States of America.
Lugosi. 2003. Concentration-of-Measure Inequalities.
Massart. 2000. In Annales de La Faculté Des Sciences de Toulouse: Mathématiques.
———. 2007. Concentration Inequalities and Model Selection: Ecole d’Eté de Probabilités de Saint-Flour XXXIII - 2003. Lecture Notes in Mathematics 1896.
Maugis, and Michel. 2011. ESAIM: Probability and Statistics.
Pardoux, Burkholder, and Sznitman. 1991. Explorations in Martingale Theory and Its Applications/ Topics in Propagation of Chaos: Ecole D’été De Probabilités De Saint-Flour Xix, 1989. Edited by P. L. Hennequin.
Raginsky, and Sason. 2012. Foundations and Trends in Communications and Information Theory.
Rahimi, and Recht. 2009. In Advances in Neural Information Processing Systems.
Reynaud-Bouret, and Roy. 2007. “Some Non Asymptotic Tail Estimates for Hawkes Processes.” Bulletin of the Belgian Mathematical Society - Simon Stevin.
Stein. 1986. Approximate Computation of Expectations.
Talagrand. 1995.
———. 1996. The Annals of Probability.
Touchette. 2011.
Tropp. 2015. An Introduction to Matrix Concentration Inequalities.
van de Geer. 1995. The Annals of Statistics.
———. 2002. “On Hoeffding’s Inequality for Dependent Random Variables.” In Empirical Process Techniques for Dependent Data.
———. 2014. arXiv:1409.8557 [Math, Stat].
van de Geer, and Lederer. 2011. arXiv:1107.0189 [Stat].
van Handel. 2017. In Convexity and Concentration. The IMA Volumes in Mathematics and Its Applications.
Vershynin. 2011. arXiv:1011.3027 [Cs, Math].
———. 2015. In Sampling Theory, a Renaissance: Compressive Sensing and Other Developments. Applied and Numerical Harmonic Analysis.
Wright. 1973. Annals of Probability.
Wu, and Zhang. 2016. arXiv:1607.05506 [Stat].
Zhang, and Chen. 2020. arXiv:2011.02258 [Cs, Math, Stat].