# Stein’s method

The importance of possessive apostrophes

A famous generic method for approximating distributions and quantifying discrepancy and manufacturing concentration bounds and limit theorems is Stein’s method, typically in the form of the method of exchangeable pairs . Wikipedia will do as a rough intro for now, although their info is rather out-of-date. There does not seem to be a thorough introduction to all the modern and useful tools. Different bits are introduced in Barbour and Chen (2005); Chatterjee (2014); Meckes (2012); Ross (2011).

Chen Soon Ong says:

If we have a distribution P, and we want to measure the distance to P from another distribution Q (which we control), an interesting trick to measure this distance is to define an operator T. This operator T, called the Stein operator, allows us to measure the distance between distributions by considering the distance between test functions on the random variables corresponding to P and Q. This is a more general structure than integral probability metrics, which in turn is a more general version of Wasserstein distance.

Probably the best intro is Lily Li’s Whirlwind Tour.

## Stein operators

### Gaussian

The original form, Stein’s lemma gives use the Stein operator for the Gaussian distribution in particular. Meckes (2009) explains:

The normal distribution is the unique probability measure $$\mu$$ for which $\int\left[f^{\prime}(x)-x f(x)\right] \mu(d x)=0$ for all $$f$$ for which the left-hand side exists and is finite. It is useful to think of this in terms of operators, specifically, the operator $$\mathcal{A}_{o}$$ defined on $$C^{1}$$ functions by $\mathcal{A}_{o} f(x)=f^{\prime}(x)-x f(x)$ is called the characterizing operator of the standard normal distribution.
This is incredibly useful in probability approximation by Gaussians where it justifies Stein’s method, below. It has apparently been extended to elliptical distributions and exponential families.

Multivariate? Why, yes please. The following lemma of Meckes (2006) gives a second-order characterizing operator for the Gaussian distribution on $$\mathbb{R}^{k}$$:

For $$f \in C^{1}\left(\mathbb{R}^{k}\right)$$, define the gradient of $$f$$ by $$\nabla f(x)=\left(\frac{\partial f}{\partial x_{1}}(x), \ldots, \frac{\partial f}{\partial x_{k}}(x)\right)^{t}$$. Define the Laplacian of $$f$$ by $$\Delta f(x)=\sum_{i=1}^{k} \frac{\partial^{2} f}{\partial x_{i}^{2}}(x)$$. Now, let $$Z \sim \mathcal{N}(0_k, \mathrm{I}_k)$$.

1. If $$f: \mathbb{R}^{k} \rightarrow \mathbb{R}$$ is two times continuously differentiable and compactly supported, then $\mathbb{E}[\Delta f(Z)-Z \cdot \nabla f(Z)]=0$
2. If $$Y \in \mathbb{R}^{k}$$ is a random vector such that $\mathbb{E}[\Delta f(Y)-Y \cdot \nabla f(Y)]=0$ for every $$f \in C^{2}\left(\mathbb{R}^{k}\right)$$, then $$\mathcal{L}(Y)=\mathcal{L}(Z) .$$
3. If $$g \in C_{o}^{\infty}\left(\mathbb{R}^{k}\right)$$, then the function $U_{o} g(x):=\int_{0}^{1} \frac{1}{2 t}[\mathbb{E} g(\sqrt{t} x+\sqrt{1-t} Z)-\mathbb{E} g(Z)] d t$ is a solution to the differential equation $\Delta h(x)-x \cdot \nabla h(x)=g(x)-\mathbb{E} g(Z)$

### Poisson

a.k.a. Stein-Chen. $\mathcal{A}_{o} f(k)=\lambda f(k+1)-k f(k)$

### Exponential

Ross (2011) mentions

$\mathcal{A}_{o} f(x)=f^{\prime}(x)-f(x)+f(0)$

### Markov processes

TBD; relation to infinitesimal generators? See perhaps Schoutens (2001).

Face to face, each classic case
We shadow box and double cross
Yet need the chase
A license to love, insurance to hold
Melts all your memories and change into gold
His eyes are like angels but his heart is cold
He’s a Stein operator
Stein operator
Stein operator
Stein operator

## Stein’s method of exchangeable pairs

Meckes (2009) summarises.

Heuristically, the univariate method of exchangeable pairs goes as follows. Let $$W$$ be a random variable conjectured to be approximately Gaussian; assume that $$\mathbb{E} W=0$$ and $$\mathbb{E} W^{2}=1 .$$ From $$W,$$ construct a new random variable $$W^{\prime}$$ such that the pair $$\left(W, W^{\prime}\right)$$ has the same distribution as $$\left(W^{\prime}, W\right) .$$ This is usually done by making a “small random change” in $$W$$, so that $$W$$ and $$W^{\prime}$$ are close. Let $$\Delta=W^{\prime}-W$$. If it can be verified that there is a $$\lambda>0$$ such that \begin{aligned} \mathbb{E}[\Delta \mid W]=-\lambda W+E_{1} \\ \mathbb{E}\left[\Delta^{2} \mid W\right]=2 \lambda+E_{2} \\ \mathbb{E}|\Delta|^{3}=E_{3} \end{aligned} with the random quantities $$E_{1}, E_{2}$$ and the deterministic quantity $$E_{3}$$ being small compared to $$\lambda,$$ then $$W$$ is indeed approximately Gaussian, and its distance to Gaussian (in some metric) can be bounded in terms of the $$E_{i}$$ and $$\lambda$$.

This comes out very nicely where there are natural symmetries to exploit, e.g. in low-d projections.

### Non-Gaussian Stein method

Steins method generalises to AFAICT any exponential distribution. TBD

### Multivariate Gaussian Stein method

The work of Elizabeth Meckes (1980—2020) serves as the canonical introduction in the area for now, although she never wrote a textbook. Two foundational ones are Chatterjee and Meckes (2008) and Meckes (2009) and there is a kind of introductory user guide in Meckes (2012); The examples are mostly about random projections although the method is much more general. The exchangeable pairs are natural in projections though, you can just switch off your brain and turn the handle to produce results, or easier yet, a computer algebra system that can handle noncommutative algebra can do it for you.

If the papers are too dense, try this friendly lecture, Stein’s Method — The last gadget under the hood.

## Stein discrepancy

A probability metric based on something like “how well this distribution satisfies Stein’s lemma” I think?

## References

Agueh, Martial, and Guillaume Carlier. 2011. SIAM Journal on Mathematical Analysis 43 (2): 904–24.
Anastasiou, Andreas, Alessandro Barp, François-Xavier Briol, Bruno Ebner, Robert E. Gaunt, Fatemeh Ghaderinezhad, Jackson Gorham, et al. 2021. arXiv:2105.03481 [Math, Stat], May.
Arras, Benjamin, and Christian Houdré. 2019a. On Stein’s Method for Infinitely Divisible Laws with Finite First Moment. Edited by Benjamin Arras and Christian Houdré. SpringerBriefs in Probability and Mathematical Statistics. Cham: Springer International Publishing.
———. 2019b. Electronic Journal of Probability 24 (none): 1–63.
Barbour, A. D., and Louis H. Y. Chen, eds. 2005. An Introduction to Stein’s Method. Vol. 4. Lecture Notes Series / Institute for Mathematical Sciences, National University of Singapore, v. 4. Singapore : Hackensack, N.J: Singapore University Press ; World Scientific.
Chatterjee, Sourav. 2014. arXiv:1404.1392 [Math], April.
Chatterjee, Sourav, and Elizabeth Meckes. 2008. arXiv:math/0701464, January.
Chen, Louis H. Y. 1998. In Probability Towards 2000, edited by L. Accardi and C. C. Heyde, 97–122. Lecture Notes in Statistics. New York, NY: Springer.
Chwialkowski, Kacper, Heiko Strathmann, and Arthur Gretton. 2016. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, 2606–15. ICML’16. New York, NY, USA: JMLR.org.
Cuturi, Marco, and Arnaud Doucet. 2014. In International Conference on Machine Learning, 685–93. PMLR.
Gorham, Jackson, and Lester Mackey. 2015. In Advances in Neural Information Processing Systems. Vol. 28.
Gorham, Jackson, Anant Raj, and Lester Mackey. 2020. arXiv:2007.02857 [Cs, Math, Stat], October.
Landsman, Zinoviy, and Johanna Nešlehová. 2008. Journal of Multivariate Analysis 99 (5): 912–27.
Landsman, Zinoviy, Steven Vanduffel, and Jing Yao. 2013. Journal of Statistical Planning and Inference 143 (11): 2016–22.
Ley, Christophe, Gesine Reinert, and Yvik Swan. 2017. Probability Surveys 14 (none): 1–52.
Liu, Qiang, Jason D. Lee, and Michael I. Jordan. 2016. arXiv:1602.03253 [Stat], July.
Liu, Qiang, and Dilin Wang. 2019. In Advances In Neural Information Processing Systems.
Loh, Wei-Liem. n.d.
Meckes, Elizabeth. 2006. “An Infinitesimal Version of Stein’s Method of Exchangeable Pairs.”
———. 2009. In High Dimensional Probability V: The Luminy Volume, 153–78. Beachwood, Ohio, USA: Institute of Mathematical Statistics.
———. 2012. Journal of Theoretical Probability 25 (2): 333–52.
Paulin, Daniel, Lester Mackey, and Joel A. Tropp. 2016. The Annals of Probability 44 (5): 3431–73.
Piccoli, Benedetto, and Francesco Rossi. 2016. Archive for Rational Mechanics and Analysis 222 (3): 1339–65.
Reinert, Gesine. 2000. In Complex Stochastic Systems, 235–75. Boca Raton: Chapman & Hall/CRC.
———. 2005. In An Introduction to Stein’s Method, Volume 4:183–221. Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore, Volume 4. CO-PUBLISHED WITH SINGAPORE UNIVERSITY PRESS.
Reinert, Gesine, and Adrian Röllin. 2007. November.
Ross, Nathan. 2011. Probability Surveys 8 (0): 210–93.
Schoutens, Wim. 2001. Journal of Mathematical Analysis and Applications 253 (2): 515–31.
Stein, Charles. 1972. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory, January, 583–602.
———. 1986. Approximate Computation of Expectations. Vol. 7. IMS.
Tan, Yue, Yingdong Lu, and Cathy Xia. 2018. arXiv:1810.04300 [Math], October.
Upadhye, Neelesh S., and Kalyan Barman. 2020. arXiv:2004.07593 [Math], July.
Xu, Wenkai, and Takeru Matsuda. 2020. In International Conference on Artificial Intelligence and Statistics, 320–30. PMLR.
———. 2021. arXiv:2103.00895 [Stat], March.
Xu, Wenkai, and Gesine Reinert. 2021. arXiv:2103.00580 [Stat], February.

### No comments yet. Why not leave one?

GitHub-flavored Markdown & a sane subset of HTML is supported.