Concentration inequalities for matrix-valued random variables, which are, loosely speaking, promises that some matrix is close to some known value measured in some metric with some probability.
Recommended overviews are J. A. Tropp (2015);van Handel (2017);Vershynin (2018).
Matrix Chernoff
J. A. Tropp (2015) summarises:
In recent years, random matrices have come to play a major role in computational mathematics, but most of the classical areas of random matrix theory remain the province of experts. Over the last decade, with the advent of matrix concentration inequalities, research has advanced to the point where we can conquer many (formerly) challenging problems with a page or two of arithmetic.
Are these related?
Nikhil Srivastava’s Discrepancy, Graphs, and the Kadison-Singer Problem has an interesting example of bounds via discrepancy theory (and only indirectly probability). D. Gross (2011) is also readable, and gives results for matrices over the complex field.
Matrix Chebychev
As discussed in, e.g. Paulin, Mackey, and Tropp (2016).
Let \(\mathbf{X} \in \mathbb{H}^{d}\) be a random matrix. For all \(t>0\) \[
\mathbb{P}\{\|\mathbf{X}\| \geq t\} \leq \inf _{p \geq 1} t^{-p} \cdot \mathbb{E}\|\mathbf{X}\|_{S_{p}}^{p}
\] Furthermore, \[
\mathbb{E}\|\mathbf{X}\| \leq \inf _{p \geq 1}\left(\mathbb{E}\|\mathbf{X}\|_{S_{p}}^{p}\right)^{1 / p}.
\]
Matrix Efron-Stein
The “classical” Efron-Stein inequalities are simple. The Matrix ones, not so much
e.g. Paulin, Mackey, and Tropp (2016).
Gaussian
Handy results from Vershynin (2018):
Takes \(X \sim N\left(0, I_{n}\right).\)
Show that, for any fixed vectors \(u, v \in \mathbb{R}^{n},\) we have \[
\mathbb{E}\langle X, u\rangle\langle X, v\rangle=\langle u, v\rangle
\]
Given a vector \(u \in \mathbb{R}^{n}\), consider the random variable \(X_{u}:=\langle X, u\rangle .\)
Further, we know that \(X_{u} \sim N\left(0,\|u\|_{2}^{2}\right) .\) It follows that \[
\mathbb{E}\left[(X_{u}-X_{v})^2\right]^{1/2}=\|u-v\|_{2}
\] for any fixed vectors \(u, v \in \mathbb{R}^{n} .\)
Grothendieck’s identity: For any fixed vectors \(u, v \in S^{n-1},\) we have \[
\mathbb{E} \operatorname{sign}X_{u} \operatorname{sign}X_{v}=\frac{2}{\pi} \arcsin \langle u, v\rangle.
\]
References
Ahlswede, and Winter. 2002.
“Strong Converse for Identification via Quantum Channels.” IEEE Transactions on Information Theory.
Ando. 1995.
“Matrix Young Inequalities.” In
Operator Theory in Function Spaces and Banach Lattices: Essays Dedicated to A.C. Zaanen on the Occasion of His 80th Birthday. Operator Theory Advances and Applications.
Bakherad, Krnic, and Moslehian. 2016.
“Reverses of the Young Inequality for Matrices and Operators.” Rocky Mountain Journal of Mathematics.
Bhatia. 1997.
Matrix Analysis. Graduate Texts in Mathematics.
Bhatia, and Parthasarathy. 2000.
“Positive Definite Functions and Operator Inequalities.” Bulletin of the London Mathematical Society.
Bishop, Del Moral, and Niclas. 2018.
“An Introduction to Wishart Matrix Moments.” Foundations and Trends® in Machine Learning.
Boucheron, Lugosi, and Massart. 2013. Concentration Inequalities: A Nonasymptotic Theory of Independence.
Bousquet, Luxburg, and Rtsch. 2004. Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2-14, 2003, T Bingen, Germany, August 4-16, 2003, Revised Lectures.
Brault, d’Alché-Buc, and Heinonen. 2016.
“Random Fourier Features for Operator-Valued Kernels.” In
Proceedings of The 8th Asian Conference on Machine Learning.
Camino, Helton, Skelton, et al. 2003. “Matrix Inequalities: A Symbolic Procedure to Determine Convexity Automatically.” Integral Equation and Operator Theory.
Candès, and Recht. 2009.
“Exact Matrix Completion via Convex Optimization.” Foundations of Computational Mathematics.
Conde. 2013. “Young Type Inequalities for Positive Operators.” Annals of Functional Analysis.
Gross, David, Liu, Flammia, et al. 2010.
“Quantum State Tomography via Compressed Sensing.” Physical Review Letters.
Hirzallah, and Kittaneh. 2000.
“Matrix Young Inequalities for the Hilbert–Schmidt Norm.” Linear Algebra and Its Applications.
Mahoney. 2016.
“Lecture Notes on Spectral Graph Methods.” arXiv Preprint arXiv:1608.04845.
Mohri, Rostamizadeh, and Talwalkar. 2018. Foundations of Machine Learning. Adaptive Computation and Machine Learning.
Paulin, Mackey, and Tropp. 2016.
“Efron–Stein Inequalities for Random Matrices.” The Annals of Probability.
Tropp, Joel. 2019. Matrix Concentration & Computational Linear Algebra / ENS Short Course.
Uchiyama. 2020.
“Some Results on Matrix Means.” Advances in Operator Theory.
van de Geer. 2014.
“Statistical Theory for High-Dimensional Models.” arXiv:1409.8557 [Math, Stat].
van Handel. 2017.
“Structured Random Matrices.” In
Convexity and Concentration. The IMA Volumes in Mathematics and Its Applications.
———. 2015.
“Estimation in High Dimensions: A Geometric Perspective.” In
Sampling Theory, a Renaissance: Compressive Sensing and Other Developments. Applied and Numerical Harmonic Analysis.
Wainwright. 2019. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics 48.
Woodruff. 2014.
Sketching as a Tool for Numerical Linear Algebra. Foundations and Trends in Theoretical Computer Science 1.0.