# Randomized low dimensional projections

March 12, 2021 — May 24, 2021

algebra
approximation
feature construction
functional analysis
geometry
high d
linear algebra
measure
metrics
probability
signal processing
sparser than thou
spheres

One way I can get at the confusing behaviours of high dimensional distributions is to instead look at low dimensional projections of them. If I have a (possibly fixed) data matrix and a random dimensional projection, what distribution does the projection have?

This idea pertains to many others: matrix factorisations, restricted isometry properties, Riesz bases, randomised regression, compressed sensing. You could also consider these results as arising from/resulting in certain structured random matrices.

## 1 Tutorials

There is a confusing note soup, sorry. You might find it better to read a coherent overview such as Meckes’ lecture slides which include a lot of important recent developments, many of which she invented:

Related: Weird slicing problems in convex geometry. For a theoretical background as to how that relates, see Guédon (2014).

## 2 Inner products

Djalil Chafaï introduces the The Funk-Hecke formula, also mentioned under isotropic RVs, which gives us a formula for a particularly simple case, of unit-norm RVs:

… if $$X$$ is a random vector of $$\mathbb{R}^{n}$$ uniformly distributed on $$\mathbb{S}^{n-1}$$ then for all $$y \in \mathbb{S}^{n-1}$$, the law of $$X \cdot y$$ has density $t \in[-1,1] \mapsto \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi} \Gamma\left(\frac{n-1}{2}\right)}\left(1-t^{2}\right)^{\frac{n-3}{2}}$ This law does not depend on the choice of $$y \in \mathbb{S}^{n-1}$$. It is symmetric in the sense that $$X \cdot y$$ and $$-(X \cdot y)$$ have same law. The law of $$|X \cdot y|$$ is the image of the law $$\operatorname{Beta}\left(\frac{3}{2}, \frac{n-1}{2}\right)$$ by the map $$u \mapsto \sqrt{u}$$. The law of $$X \cdot y$$ is

• if $$n=2$$ : an arcsine law,
• if $$n=3$$ : a uniform law (Archimedes principle),
• if $$n=4$$ : a semicircle law.

whuber asserts that $$(X \cdot y+1)\sim\operatorname{Beta}((d-1)/2,(d-1)/2),$$ and $$\operatorname{Var}(X \cdot y)=1/d.$$

## 3 Random projections are kinda Gaussian

More generally things are not so exact. But they are still reasonably nice, in that there are lots of tasty limit theorems with nice regular behaviour.

A classic introductory concept: Diaconis-Freedman effect. Diaconis and Freedman (1984) show that (under some mild omitted conditions), $\left\{x_{1}, \ldots, x_{n}\right\} \subseteq \mathbb{R}^{d}$ is a data set (possibly deterministic with no assumption on generating process), $$\theta$$ is a uniform random point in the sphere $$\mathbb{S}^{d-1},$$ and $\mu_{x}^{\theta}:=\frac{1}{n} \sum_{i=1}^{n} \delta_{\left\langle x_{i}, \theta\right\rangle}$ is the empirical measure of the projection of the $$x_{i}$$ onto $$\theta$$, then as $$n, d \rightarrow \infty,$$ the measures $$\mu_{x}^{\theta}$$ tend to $$\mathcal{N}\left(0, \sigma^{2}\right)$$ weakly in probability. This succinct statement is modeled on Elizabeth Meckes’.

A lesson is that even non-Gaussian, non-independent data can become nearly i.i.d. Gaussian in low dimensional projection, as Dasgupta, Hsu, and Verma (2006) argue in their introduction.

This has been taken to incredible depth in the work of Elizabeth Meckes 1980—2020 whose papers serve as the canonical textbook in the area for now. Two foundational ones are Chatterjee and Meckes (2008) and E. Meckes (2009) and there is a kind of user guide in E. Meckes (2012b) which leverages Stein’s method a whole bunch.

## 4 Random projections are distance preserving

What makes random embeddings go. The most famous result is the Johnson-Lindenstrauss lemma.

A simple proof of that is given by Dasgupta and Gupta (2003)

• “Locality-Sensitive Hashing (LSH) is an algorithm for solving the approximate or exact Near Neighbor Search in high dimensional spaces.” (Is this random even?)

• John Myles White explains why the Johnson-Lindenstrauss lemma is worth knowing

## 5 Projection statistics

Another key phrase we can look for is probability on the Stiefel manifold, which is a generalization of a familiar concept from random orthonormal matrices. Stiefel manifolds generalise an orthonormal matrix because they can map between spaces of different dimension. Formally, the Stiefel manifold $$V_{k, m}$$ is the space of $$k$$ frames in the $$m$$ -dimensional real Euclidean space $$R^{m},$$ represented by the set of $$m \times k$$ matrices $$X$$ such that $$X^{\prime} X=I_{k},$$ where $$I_{k}$$ is the $$k \times k$$ identity matrix. There are some interesting cases in low dimensional projections served by $$k\ll m$$ especially $$k=1.$$

Cool results in this domain are, e.g. Chikuse and 筑瀬 (2003);E. S. Meckes and Meckes (2013);E. Meckes (2012a);Stam (1982).

General projections results are in Dümbgen and Del Conte-Zerial (2013).

An important trick is the distribution of isotropic unit vectors.

Let $$Z=Z^{(q)}:=\left(Z_{1}, Z_{2}, \ldots, Z_{d}\right)$$ be a random matrix in $$\mathbb{R}^{m \times k}$$ with independent, standard Gaussian column vectors $$Z_{j} \in \mathbb{R}^{m} .$$ Then $\Theta:=Z\left(Z^{\top} Z\right)^{-1 / 2}=Z/\|Z\|_2^2$ has the desired distribution, and $\Theta=m^{-1 / 2} Z\left(I+O_{p}\left(m^{-1 / 2}\right)\right) \quad \text { as } m \rightarrow \infty.$

Vershynin’s writing on a variety of hard high-dimensional probability results is pretty accessible: Vershynin (2015);Vershynin (2018). These bleed over into concentration results.

I wrote one of my own… TBD.

## 6 Concentration theorems for projections

Many, e.g. Dasgupta, Hsu, and Verma (2006);Dümbgen and Del Conte-Zerial (2013);Gantert, Kim, and Ramanan (2017);Kim, Liao, and Ramanan (2020).

## 7 References

Achlioptas. 2003. Journal of Computer and System Sciences, Special Issue on PODS 2001,.
Anttila, Ball, and Perissinaki. 2003. Transactions of the American Mathematical Society.
Beckner. 1989. Proceedings of the American Mathematical Society.
Bhattacharya, and Bhattacharya. 2012. Nonparametric Inference on Manifolds: With Applications to Shape Spaces. Institute of Mathematical Statistics Monographs.
Bingham, and Mannila. 2001. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’01.
Bobkov. 2003. The Annals of Probability.
Bobkov, and Koldobsky. 2003. In Geometric Aspects of Functional Analysis: Israel Seminar 2001-2002. Lecture Notes in Mathematics.
Brehm, and Voigt. n.d. “Asymptotics of Cross Sections for Convex Bodiesy.”
Candès, and Tao. 2006. IEEE Transactions on Information Theory.
Chatterjee, and Meckes. 2008. arXiv:math/0701464.
Chikuse, and 筑瀬. 2003. Statistics on Special Manifolds.
Dasgupta. 2000. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence. UAI’00.
Dasgupta, and Gupta. 2003. Random Structures & Algorithms.
Dasgupta, Hsu, and Verma. 2006. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence. UAI’06.
Diaconis, and Freedman. 1984. The Annals of Statistics.
Dümbgen, and Del Conte-Zerial. 2013. From Probability to Statistics and Back: High-Dimensional Models and Processes – A Festschrift in Honor of Jon A. Wellner.
Eldan, R., and Klartag. 2008. Journal of Functional Analysis.
Eldan, Ronen, and Klartag. 2010. arXiv:1001.0875 [Math].
Freund, Dasgupta, Kabra, et al. 2007. In Advances in Neural Information Processing Systems.
Gantert, Kim, and Ramanan. 2017. The Annals of Probability.
Guédon. 2014. Edited by Arnaud Guillin. ESAIM: Proceedings.
Hall, and Li. 1993. The Annals of Statistics.
Har-Peled, Indyk, and Motwani. 2012. Theory of Computing.
Houdré, Ledoux, and Milman. 2011. Concentration, Functional Inequalities and Isoperimetry: International Workshop on Concentration, Functional Inequalities, and Isoperimetry, October 29-November 1, 2009, Florida Atlantic University, Boca Raton, Florida.
Indyk, and Motwani. 1998. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing. STOC ’98.
Jiang, Lee, and Vempala. 2020. In Geometric Aspects of Functional Analysis: Israel Seminar (GAFA) 2017-2019 Volume II. Lecture Notes in Mathematics.
Joarder, and Ali. 1996. Journal of Information and Optimization Sciences.
Kar, and Karnick. 2012. In Artificial Intelligence and Statistics.
Kim, Liao, and Ramanan. 2020. arXiv:1912.13447 [Math].
Klartag. 2007a. Inventiones Mathematicae.
———. 2007b. Journal of Functional Analysis.
Lahiri, Gao, and Ganguli. 2016. arXiv:1607.04331 [Cs, q-Bio, Stat].
Li, Hastie, and Church. 2006. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’06.
Meckes, Elizabeth. 2006. “An Infinitesimal Version of Stein’s Method of Exchangeable Pairs.”
Meckes, Mark W. 2008. arXiv:math/0606073.
Meckes, Elizabeth. 2009. In High Dimensional Probability V: The Luminy Volume.
———. 2012a. In Geometric Aspects of Functional Analysis: Israel Seminar 2006–2010. Lecture Notes in Mathematics.
———. 2012b. Journal of Theoretical Probability.
Meckes, Elizabeth S., and Meckes. 2013. Probability Theory and Related Fields.
Paouris. 2006. “Concentration of Mass on Convex Bodies.”
Peña, Lai, and Shao. 2008. Self-Normalized Processes: Limit Theory and Statistical Applications.
Reeves. 2017. In 2017 IEEE International Symposium on Information Theory (ISIT).
Reinert, and Röllin. 2007.
Rudelson, and Vershynin. 2013. Electronic Communications in Probability.
Saul. 2023. Transactions on Machine Learning Research.
Sodin. 2007. In Geometric Aspects of Functional Analysis.
Stam. 1982. Journal of Applied Probability.
Stein. 1972. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory.
Vershynin. 2011. arXiv:1011.3027 [Cs, Math].
———. 2015. In Sampling Theory, a Renaissance: Compressive Sensing and Other Developments. Applied and Numerical Harmonic Analysis.
von Weizsäcker. 1997. Probability Theory and Related Fields.
Wang, Li, Khabsa, et al. 2020.