Measure concentration inequalities

The fancy name for probability inequalities



A corral captures the idea of concentration of measure: we have some procedure that guarantees that most of the mass (of buffaloes) is where we can handle it. Image: Kevin M Klerks, CC BY 2.0

Welcome to the probability inequality mines!

Concentration inequalities formalise the situation where something in your process (measurement, estimation) means that you can be pretty sure that most of your probability mass is somewhere in particular, e.g. being 80% sure I am only 20% wrong.

As undergraduates we run into the central limit theorem, but there are many other ways of keeping track of our probability mass, or at least most of it. The idea is a basic workhorse in univariate probability, and turns out to be yet more essential in multivariate and matrix probability, as seen in matrix factorisation, compressive sensing, PAC-bounds and suchlike.

Background

Overviews include Lugosi (2003), Boucheron, Lugosi, and Massart (2013), Tropp (2015), and Vershynin (2018).

Markov

For any nonnegative random variable \(X,\) and \(t>0\) \[ \mathbb{P}\{X \geq t\} \leq \frac{\mathbb{E} X}{t} \] Corollary: if \(\phi\) is a strictly monotonically increasing non-negative-valued function then for any random variable \(X\) and real number \(t\) \[ \mathbb{P}\{X \geq t\}=\mathbb{P}\{\phi(X) \geq \phi(t)\} \leq \frac{\mathbb{E} \phi(X)}{\phi(t)} \]
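
As a sanity check, here is a minimal Monte Carlo sketch of Markov's bound; the exponential test distribution and the code are my own arbitrary choices.

```python
# Sanity-check Markov's inequality by simulation.
import numpy as np

rng = np.random.default_rng(42)
X = rng.exponential(scale=1.0, size=100_000)  # nonnegative, E[X] = 1

for t in (1.0, 2.0, 5.0):
    empirical = (X >= t).mean()  # estimate of P(X >= t)
    bound = X.mean() / t         # Markov bound E[X] / t
    print(f"t={t}: P(X >= t) ~ {empirical:.4f} <= {bound:.4f}")
```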

Chebyshev

A corollary of Markov's bound with \(\phi(x)=x^{2}\) is Chebyshev's inequality: if \(X\) is an arbitrary random variable and \(t>0,\) then \[ \mathbb{P}\{|X-\mathbb{E} X| \geq t\}=\mathbb{P}\left\{|X-\mathbb{E} X|^{2} \geq t^{2}\right\} \leq \frac{\mathbb{E}\left[|X-\mathbb{E} X|^{2}\right]}{t^{2}}=\frac{\operatorname{Var}\{X\}}{t^{2}} \] More generally, taking \(\phi(x)=x^{q}\) \((x \geq 0)\) for any \(q>0\) we have \[ \mathbb{P}\{|X-\mathbb{E} X| \geq t\} \leq \frac{\mathbb{E}\left[|X-\mathbb{E} X|^{q}\right]}{t^{q}} \] We can choose \(q\) to optimize the obtained upper bound for the problem in hand.
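
To see that moment-choice trade-off in action, here is a sketch comparing the bound for several \(q\) against the actual tail of a standard normal (a toy example of my own devising):

```python
# Compare the moment bound E|X - EX|^q / t^q for several q
# against the empirical tail of a standard normal.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(1_000_000)
t = 3.0
dev = np.abs(X - X.mean())

print(f"empirical P(|X - EX| >= {t}) ~ {(dev >= t).mean():.2e}")
for q in (2, 4, 6, 8):
    bound = (dev**q).mean() / t**q
    print(f"q={q}: bound {bound:.2e}")
# q = 2 recovers Chebyshev; for this light-tailed X, larger q is tighter.
```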

Chernoff

Taking \(\phi(x)=e^{s x}\) where \(s>0,\) for any random variable \(X,\) and any \(t>0,\) we have \[ \mathbb{P}\{X \geq t\}=\mathbb{P}\left\{e^{s X} \geq e^{s t}\right\} \leq \frac{\mathbb{E} e^{s X}}{e^{s t}} \] Once again, we choose \(s\) to make the bound as tight as possible.
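
For a standard normal, \(\mathbb{E} e^{sX}=e^{s^{2}/2},\) so the bound is \(e^{s^{2}/2-st},\) minimized at \(s=t,\) giving \(e^{-t^{2}/2}.\) A sketch of that optimization, done numerically:

```python
# Optimize the Chernoff bound exp(s^2/2 - s*t) over s for a standard normal.
import numpy as np

t = 2.0
s = np.linspace(0.01, 5.0, 500)
bounds = np.exp(s**2 / 2 - s * t)
print(f"best s ~ {s[np.argmin(bounds)]:.3f} (theory says s = t = {t})")
print(f"Chernoff bound exp(-t^2/2) = {np.exp(-t**2 / 2):.4f}")

rng = np.random.default_rng(1)
mc = (rng.standard_normal(1_000_000) >= t).mean()
print(f"Monte Carlo P(X >= t) ~ {mc:.4f}")  # loose, but exponentially decaying
```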

Hoeffding

🏗

Efron-Stein

Are these precisely the results arising from Stein's method?

Let \(g: \mathcal{X}^{n} \rightarrow \mathbb{R}\) be a real-valued measurable function of \(n\) variables. Efron-Stein inequalities concern the difference between the random variable \(Z=g\left(X_{1}, \ldots, X_{n}\right)\) and its expected value \(\mathbb{E} Z\) when \(X_{1}, \ldots, X_{n}\) are arbitrary independent random variables.

Define \(\mathbb{E}_{i}\) as the expectation with respect to the variable \(X_{i}\) alone, holding the others fixed, that is, \(\mathbb{E}_{i} Z=\mathbb{E}\left[Z \mid X_{1}, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{n}\right].\) Then \[ \operatorname{Var}(Z) \leq \sum_{i=1}^{n} \mathbb{E}\left[\left(Z-\mathbb{E}_{i} Z\right)^{2}\right] \]

Now let \(X_{1}^{\prime}, \ldots, X_{n}^{\prime}\) be an independent copy of \(X_{1}, \ldots, X_{n}\) and define \[ Z_{i}^{\prime}=g\left(X_{1}, \ldots, X_{i}^{\prime}, \ldots, X_{n}\right). \] Then, alternatively, \[ \operatorname{Var}(Z) \leq \frac{1}{2} \sum_{i=1}^{n} \mathbb{E}\left[\left(Z-Z_{i}^{\prime}\right)^{2}\right] \] Nothing here seems to constrain the variables to be real-valued, merely the function \(g\); but apparently these bounds do not hold for matrix variables as written. For that you need matrix Efron-Stein results.
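
The second bound is easy to check numerically. A quick sketch, with the arbitrary choice \(g=\max\) of \(n\) uniform variables (nothing canonical about that choice; it is just convenient):

```python
# Monte Carlo check of Var(Z) <= (1/2) sum_i E[(Z - Z_i')^2]
# for Z = max of n independent uniforms.
import numpy as np

rng = np.random.default_rng(7)
n, reps = 10, 200_000
X = rng.uniform(size=(reps, n))
X_prime = rng.uniform(size=(reps, n))  # independent copy
Z = X.max(axis=1)

total = 0.0
for i in range(n):
    Xi = X.copy()
    Xi[:, i] = X_prime[:, i]      # resample coordinate i only
    Z_i = Xi.max(axis=1)
    total += ((Z - Z_i) ** 2).mean()

print(f"Var(Z) ~ {Z.var():.5f} <= Efron-Stein bound {0.5 * total:.5f}")
```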

Kolmogorov

🏗

Gaussian

Concentration results specific to the Gaussian distribution. Filed there, perhaps?

Sub-Gaussian

🏗

E.g. the Hanson-Wright inequality, a tail bound for quadratic forms in independent sub-Gaussian random variables.

Martingale bounds

🏗

Khintchine

Let us copy from Wikipedia:

Heuristically: if we pick \(N\) complex numbers \(x_1,\dots,x_N \in\mathbb{C}\), and add them together, each multiplied by jointly independent random signs \(\pm 1\), then the expected value of the sum’s magnitude is close to \(\sqrt{|x_1|^{2}+ \cdots + |x_N|^{2}}\).

Let \(\{\varepsilon_n\}_{n=1}^N\) be i.i.d. random variables with \(P(\varepsilon_n=\pm1)=\frac12\) for \(n=1,\ldots, N\), i.e., a sequence with the Rademacher distribution. Let \(0<p<\infty\) and let \(x_1,\ldots,x_N \in \mathbb{C}\). Then

\[ A_p \left( \sum_{n=1}^N |x_n|^2 \right)^{1/2} \leq \left(\operatorname{E} \left|\sum_{n=1}^N \varepsilon_n x_n\right|^p \right)^{1/p} \leq B_p \left(\sum_{n=1}^N |x_n|^2\right)^{1/2} \]

for some constants \(A_p,B_p>0\). It is a simple matter to see that \(A_p = 1\) when \(p \ge 2\), and \(B_p = 1\) when \(0 < p \le 2\).
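
Those endpoint constants are easy to verify by simulation; here is a sketch with arbitrary complex \(x_n\) of my own choosing:

```python
# Monte Carlo check of the endpoint constants: B_p = 1 for p <= 2 and
# A_p = 1 for p >= 2, so the p-th moment sits below / at / above the
# ell_2 norm for p = 1, 2, 4 respectively.
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(20) + 1j * rng.standard_normal(20)
ell2 = np.sqrt((np.abs(x) ** 2).sum())

signs = rng.choice([-1.0, 1.0], size=(200_000, x.size))  # Rademacher signs
sums = np.abs(signs @ x)  # |sum_n eps_n x_n| for each sign draw

for p in (1, 2, 4):
    lp = (sums**p).mean() ** (1 / p)
    print(f"p={p}: (E|sum|^p)^(1/p) ~ {lp:.3f}  (ell_2 norm = {ell2:.3f})")
```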

Empirical process theory

🏗

Matrix concentration

If we restrict our attention to matrices in particular, some fun things arise. See matrix concentration inequalities.

References

Adler, Robert J. 2010. The Geometry of Random Fields. SIAM ed. Philadelphia: Society for Industrial and Applied Mathematics.
Adler, Robert J., and Jonathan E. Taylor. 2007. Random Fields and Geometry. Springer Monographs in Mathematics 115. New York: Springer.
Adler, Robert J, Jonathan E Taylor, and Keith J Worsley. 2016. Applications of Random Fields and Geometry Draft.
Aubrun, Guillaume, and Ion Nechita. 2011. The Multiplicative Property Characterizes \(\ell_p\) and \(L_p\) Norms.” Confluentes Mathematici 03 (04): 637–47.
Bach, Francis R. 2013. Sharp Analysis of Low-Rank Kernel Matrix Approximations. In COLT, 30:185–209.
Barron, Andrew, Lucien Birgé, and Pascal Massart. 1999. Risk Bounds for Model Selection via Penalization.” Probability Theory and Related Fields 113 (3): 301–413.
Bellec, Pierre C., Guillaume Lecué, and Alexandre B. Tsybakov. 2017. Towards the Study of Least Squares Estimators with Convex Penalty.” arXiv:1701.09120 [Math, Stat], January.
Berend, Daniel, Peter Harremoës, and Aryeh Kontorovich. 2012. Minimum KL-Divergence on Complements of L₁ Balls.” arXiv:1206.6544 [Cs, Math], June.
Bertsimas, D., and I. Popescu. 2005. Optimal Inequalities in Probability Theory: A Convex Optimization Approach.” SIAM Journal on Optimization 15 (3): 780–804.
Boissard, Emmanuel. 2011. Simple Bounds for the Convergence of Empirical and Occupation Measures in 1-Wasserstein Distance.” Electronic Journal of Probability 16 (none).
Boucheron, Stéphane, Olivier Bousquet, and Gábor Lugosi. 2004. Concentration Inequalities.” In Advanced Lectures in Machine Learning, edited by Olivier Bousquet, Ulrike von Luxburg, and G Rätsch.
Boucheron, Stéphane, Gábor Lugosi, and Pascal Massart. 2003. Concentration Inequalities Using the Entropy Method.” The Annals of Probability 31 (3): 1583–1614.
———. 2013. Concentration Inequalities: A Nonasymptotic Theory of Independence. 1st ed. Oxford: Oxford University Press.
Bousquet, Olivier, Stéphane Boucheron, and Gábor Lugosi. 2004. Introduction to Statistical Learning Theory.” In Advanced Lectures on Machine Learning, edited by Olivier Bousquet, Ulrike von Luxburg, and Gunnar Rätsch, 169–207. Lecture Notes in Computer Science 3176. Springer Berlin Heidelberg.
Bühlmann, Peter, and Sara van de Geer. 2011. Statistics for High-Dimensional Data: Methods, Theory and Applications. 2011 edition. Heidelberg ; New York: Springer.
Candès, Emmanuel J., and Benjamin Recht. 2009. Exact Matrix Completion via Convex Optimization.” Foundations of Computational Mathematics 9 (6): 717–72.
Candès, Emmanuel J., J. Romberg, and T. Tao. 2006. Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information.” IEEE Transactions on Information Theory 52 (2): 489–509.
Dasarathy, Gautam, Parikshit Shah, Badri Narayan Bhaskar, and Robert Nowak. 2013. Sketching Sparse Matrices.” arXiv:1303.6544 [Cs, Math], March.
DasGupta, Anirban. 2008. Asymptotic Theory of Statistics and Probability. Springer Texts in Statistics. New York: Springer New York.
Dasgupta, Sanjoy. 2000. Experiments with Random Projection.” In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, 143–51. UAI’00. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Dasgupta, Sanjoy, Daniel Hsu, and Nakul Verma. 2006. A Concentration Theorem for Projections.” In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, 114–21. UAI’06. Arlington, Virginia, USA: AUAI Press.
Del Moral, Pierre, Peng Hu, and Liming Wu. 2011. On the Concentration Properties of Interacting Particle Processes. Vol. 3. Now Publishers.
Dubhashi, Devdatt, and Alessandro Panconesi. 2009. Concentration of Measure for the Analysis of Randomized Algorithms. New York: Cambridge University Press.
Flammia, Steven T., David Gross, Yi-Kai Liu, and Jens Eisert. 2012. Quantum Tomography via Compressed Sensing: Error Bounds, Sample Complexity, and Efficient Estimators.” New Journal of Physics 14 (9): 095022.
Geer, Sara van de. 1995. Exponential Inequalities for Martingales, with Application to Maximum Likelihood Estimation for Counting Processes.” The Annals of Statistics 23 (5): 1779–1801.
———. 2002. “On Hoeffding’s Inequality for Dependent Random Variables.” In Empirical Process Techniques for Dependent Data. Birkhäuser.
———. 2014. Statistical Theory for High-Dimensional Models.” arXiv:1409.8557 [Math, Stat], September.
Geer, Sara van de, and Johannes Lederer. 2011. The Lasso, Correlated Design, and Improved Oracle Inequalities.” arXiv:1107.0189 [Stat], July.
Ghosal, Subhashis, and Anindya Roy. 2006. Posterior Consistency of Gaussian Process Prior for Nonparametric Binary Regression.” The Annals of Statistics 34 (5): 2413–29.
Giné, Evarist, and Richard Nickl. 2009. Uniform Limit Theorems for Wavelet Density Estimators.” The Annals of Probability 37 (4): 1605–46.
Gozlan, Nathael, and Christian Léonard. 2010. Transport Inequalities. A Survey.” arXiv:1003.3852 [Math], March.
Gross, D. 2011. Recovering Low-Rank Matrices From Few Coefficients in Any Basis.” IEEE Transactions on Information Theory 57 (3): 1548–66.
Gross, David, Yi-Kai Liu, Steven T. Flammia, Stephen Becker, and Jens Eisert. 2010. Quantum State Tomography via Compressed Sensing.” Physical Review Letters 105 (15).
Handel, Ramon van. 2017. Structured Random Matrices.” In Convexity and Concentration, edited by Eric Carlen, Mokshay Madiman, and Elisabeth M. Werner, 107–56. The IMA Volumes in Mathematics and Its Applications. New York, NY: Springer.
Hansen, Niels Richard, Patricia Reynaud-Bouret, and Vincent Rivoirard. 2015. Lasso and Probabilistic Inequalities for Multivariate Point Processes.” Bernoulli 21 (1): 83–143.
Hanson, D. L., and F. T. Wright. 1971. A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables.” Annals of Mathematical Statistics 42 (3): 1079–83.
Horn, M. 1979. Some Inequalities for the Expectation of a Product of Functions of a Random Variable and for the Multivariate Distribution Function at a Random Point.” Biometrical Journal 21 (3): 243–45.
Houdré, Christian. 2002. Remarks on Deviation Inequalities for Functions of Infinitely Divisible Random Vectors.” The Annals of Probability 30 (3): 1223–37.
Houdré, Christian, and Nicolas Privault. 2002. Concentration and Deviation Inequalities in Infinite Dimensions via Covariance Representations.” Bernoulli 8 (6): 697–720.
Kennedy, Edward H. 2015. Semiparametric Theory and Empirical Processes in Causal Inference.” arXiv Preprint arXiv:1510.04740.
Koltchinskii, Vladimir. 2011. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Mathematics, École d’Été de Probabilités de Saint-Flour, 2033. Heidelberg: Springer.
Kontorovich, Aryeh, and Maxim Raginsky. 2016. Concentration of Measure Without Independence: A Unified Approach via the Martingale Method.” arXiv:1602.00721 [Cs, Math], February.
Kroll, Martin. 2016. Concentration Inequalities for Poisson Point Processes with Application to Adaptive Intensity Estimation.” arXiv:1612.07901 [Math, Stat], December.
Kuznetsov, Vitaly, and Mehryar Mohri. 2014. Generalization Bounds for Time Series Prediction with Non-Stationary Processes.” In Algorithmic Learning Theory, edited by Peter Auer, Alexander Clark, Thomas Zeugmann, and Sandra Zilles, 260–74. Lecture Notes in Computer Science. Bled, Slovenia: Springer International Publishing.
———. 2015. Learning Theory and Algorithms for Forecasting Non-Stationary Time Series.” In Advances in Neural Information Processing Systems, 541–49. Curran Associates, Inc.
———. 2016. Generalization Bounds for Non-Stationary Mixing Processes.” In Machine Learning Journal.
Lahiri, Subhaneil, Peiran Gao, and Surya Ganguli. 2016. Random Projections of Random Manifolds.” arXiv:1607.04331 [Cs, q-Bio, Stat], July.
Lederer, Johannes, and Sara van de Geer. 2014. New Concentration Inequalities for Suprema of Empirical Processes.” Bernoulli 20 (4): 2020–38.
Liggett, Thomas M. 2010. Stochastic Models for Large Interacting Systems and Related Correlation Inequalities.” Proceedings of the National Academy of Sciences of the United States of America 107 (38): 16413–19.
Lugosi, Gábor. 2003. Concentration-of-Measure Inequalities.
Massart, Pascal. 2000. Some Applications of Concentration Inequalities to Statistics.” In Annales de La Faculté Des Sciences de Toulouse: Mathématiques, 9:245–303.
———. 2007. Concentration Inequalities and Model Selection: Ecole d’Eté de Probabilités de Saint-Flour XXXIII - 2003. Lecture Notes in Mathematics 1896. Berlin ; New York: Springer-Verlag.
Maugis, Cathy, and Bertrand Michel. 2011. A Non Asymptotic Penalized Criterion for Gaussian Mixture Model Selection.” ESAIM: Probability and Statistics 15: 41–68.
Pardoux, Etienne, D. L. Burkholder, and Alain-Sol Sznitman. 1991. Explorations in Martingale Theory and Its Applications / Topics in Propagation of Chaos: École d’Été de Probabilités de Saint-Flour XIX, 1989. Edited by P. L. Hennequin. Berlin ; New York: Springer-Verlag.
Raginsky, Maxim, and Igal Sason. 2012. Concentration of Measure Inequalities in Information Theory, Communications and Coding.” Foundations and Trends in Communications and Information Theory, December.
Rahimi, Ali, and Benjamin Recht. 2009. Weighted Sums of Random Kitchen Sinks: Replacing Minimization with Randomization in Learning.” In Advances in Neural Information Processing Systems, 1313–20. Curran Associates, Inc.
Reynaud-Bouret, Patricia, and Emmanuel Roy. 2007. “Some Non Asymptotic Tail Estimates for Hawkes Processes.” Bulletin of the Belgian Mathematical Society - Simon Stevin 13 (5): 883–96.
Stein, Charles. 1986. Approximate Computation of Expectations. Vol. 7. IMS.
Talagrand, Michel. 1995. Concentration of Measure and Isoperimetric Inequalities in Product Spaces.” Publications Mathématiques de l’IHÉS 81: 73–205.
———. 1996. A New Look at Independence.” The Annals of Probability 24 (1): 1–34.
Touchette, Hugo. 2011. A Basic Introduction to Large Deviations: Theory, Applications, Simulations,” June.
Tropp, Joel A. 2015. An Introduction to Matrix Concentration Inequalities.
Vershynin, Roman. 2011. Introduction to the Non-Asymptotic Analysis of Random Matrices.” arXiv:1011.3027 [Cs, Math], November.
———. 2015. Estimation in High Dimensions: A Geometric Perspective.” In Sampling Theory, a Renaissance: Compressive Sensing and Other Developments, edited by Götz E. Pfander, 3–66. Applied and Numerical Harmonic Analysis. Cham: Springer International Publishing.
———. 2016. Four Lectures on Probabilistic Methods for Data Science,” December.
———. 2018. High-Dimensional Probability: An Introduction with Applications in Data Science. 1st ed. Cambridge University Press.
Wright, F. T. 1973. A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables Whose Distributions Are Not Necessarily Symmetric.” Annals of Probability 1 (6): 1068–70.
Wu, Xinxing, and Junping Zhang. 2016. Distribution-Dependent Concentration Inequalities for Tighter Generalization Bounds.” arXiv:1607.05506 [Stat], July.
Zhang, Huiming, and Song Xi Chen. 2020. Concentration Inequalities for Statistical Inference.” arXiv:2011.02258 [Cs, Math, Stat], November.
