# Distances between Gaussian distributions

Since Gaussian approximations pop up a lot in e.g. variational approximation problems, it is nice to know how various probability metrics come out for them.

## Wasserstein

Usefully, there is an analytic result for the Wasserstein-2 distance, $$W_2(\mu;\nu):=\Bigl(\inf\mathbb{E}\Vert X-Y\Vert_2^2\Bigr)^{1/2},$$ where the infimum is taken over all couplings (joint distributions) of $$X\sim\mu$$ and $$Y\sim\nu$$. For two Gaussians it has the closed form

$$\begin{aligned} d&:= W_2(\mathcal{N}(\mu_1,\Sigma_1);\mathcal{N}(\mu_2,\Sigma_2))\\ \Rightarrow d^2&= \Vert \mu_1-\mu_2\Vert_2^2 + \operatorname{tr}(\Sigma_1+\Sigma_2-2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}). \end{aligned}$$
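A minimal NumPy sketch of this closed form, assuming symmetric positive semi-definite covariance matrices. The helper names `psd_sqrt` and `w2_gaussian` are hypothetical. The matrix square roots are taken via `numpy.linalg.eigh`, which works here because both matrices being rooted are symmetric; `scipy.linalg.sqrtm` would be an alternative.

```python
import numpy as np

def psd_sqrt(a):
    """Symmetric PSD square root V diag(sqrt(w)) V^T via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # clip tiny negative eigenvalues caused by round-off
    return (v * np.sqrt(w)) @ v.T

def w2_gaussian(mu1, sigma1, mu2, sigma2):
    """Closed-form W2 distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    s1_half = psd_sqrt(sigma1)
    inner = s1_half @ sigma2 @ s1_half
    inner = 0.5 * (inner + inner.T)  # re-symmetrise before the second square root
    cross = psd_sqrt(inner)
    d2 = np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2.0 * cross)
    return np.sqrt(max(d2, 0.0))  # guard against a tiny negative d^2 from round-off
```

For example, `w2_gaussian([0.0], [[1.0]], [3.0], [[4.0]])` gives $\sqrt{9+(1-2)^2}=\sqrt{10}$, matching the univariate special case $d^2=(m_1-m_2)^2+(\sigma_1-\sigma_2)^2$.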

In the centred case this is simply

$$\begin{aligned} d&:= W_2(\mathcal{N}(0,\Sigma_1);\mathcal{N}(0,\Sigma_2))\\ \Rightarrow d^2&= \operatorname{tr}(\Sigma_1+\Sigma_2-2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}). \end{aligned}$$
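As a worked special case of the centred formula: when $\Sigma_1$ and $\Sigma_2$ commute (for instance, both diagonal), $\Sigma_1^{1/2}$ commutes with $\Sigma_2$ and $\Sigma_1^{1/2}\Sigma_2^{1/2}$ is itself symmetric positive semi-definite, so the nested square root collapses and the distance becomes a Frobenius norm:

$$\begin{aligned} \Sigma_1\Sigma_2=\Sigma_2\Sigma_1 &\Rightarrow (\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}=(\Sigma_1\Sigma_2)^{1/2}=\Sigma_1^{1/2}\Sigma_2^{1/2}\\ &\Rightarrow d^2=\operatorname{tr}(\Sigma_1+\Sigma_2-2\Sigma_1^{1/2}\Sigma_2^{1/2})=\Vert\Sigma_1^{1/2}-\Sigma_2^{1/2}\Vert_F^2. \end{aligned}$$

In one dimension this is just $d^2=(\sigma_1-\sigma_2)^2$.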

## Kullback-Leibler

From Wikipedia, for Gaussians on $\mathbb{R}^k$:

$D_{\text{KL}}(\mathcal{N}(\mu_1,\Sigma_1)\parallel \mathcal{N}(\mu_2,\Sigma_2)) ={\frac {1}{2}}\left(\operatorname {tr} \left(\Sigma _{2}^{-1}\Sigma _{1}\right)+(\mu_{2}-\mu_{1})^{\mathsf {T}}\Sigma _{2}^{-1}(\mu_{2}-\mu_{1})-k+\ln \left({\frac {\det \Sigma _{2}}{\det \Sigma _{1}}}\right)\right).$
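A minimal NumPy sketch of this formula, assuming positive definite covariances. The name `kl_gaussian` is hypothetical, and using linear solves plus `numpy.linalg.slogdet` instead of explicit inverses and determinants is a numerical-stability choice on my part, not something the formula requires.

```python
import numpy as np

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1) || N(mu2, sigma2)) for k-dimensional Gaussians."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    k = mu1.shape[0]  # dimension, read off the mean vector
    diff = mu2 - mu1
    # tr(Sigma2^{-1} Sigma1) via a linear solve rather than an explicit inverse
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    # (mu2 - mu1)^T Sigma2^{-1} (mu2 - mu1)
    maha = diff @ np.linalg.solve(sigma2, diff)
    # ln(det Sigma2 / det Sigma1) from log-determinants
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    return 0.5 * (trace_term + maha - k + logdet2 - logdet1)
```

Passing zero means, e.g. `kl_gaussian(np.zeros(2), S1, np.zeros(2), S2)`, gives the centred case. In one dimension the result reduces to $\ln(\sigma_2/\sigma_1)+\frac{\sigma_1^2+(m_1-m_2)^2}{2\sigma_2^2}-\frac12$.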

In the centred case this reduces to

$D_{\text{KL}}(\mathcal{N}(0,\Sigma_1)\parallel \mathcal{N}(0, \Sigma_2)) ={\frac {1}{2}}\left(\operatorname{tr} \left(\Sigma _{2}^{-1}\Sigma _{1}\right)-k+\ln \left({\frac {\det \Sigma _{2}}{\det \Sigma _{1}}}\right)\right).$
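As a quick worked check of the centred formula, take isotropic covariances $\Sigma_1=aI_k$ and $\Sigma_2=bI_k$ with $a,b>0$. Then $\operatorname{tr}(\Sigma_2^{-1}\Sigma_1)=ka/b$ and $\det\Sigma_2/\det\Sigma_1=(b/a)^k$, so

$$D_{\text{KL}}(\mathcal{N}(0,aI_k)\parallel\mathcal{N}(0,bI_k))=\frac{k}{2}\left(\frac{a}{b}-1+\ln\frac{b}{a}\right),$$

which vanishes exactly when $a=b$, since $x-1-\ln x\ge 0$ with equality only at $x=1$ (here $x=a/b$).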

## Hellinger

Djalil defines the Hellinger distance between probability measures $\mu$ and $\nu$ with densities $f$ and $g$ with respect to a common reference measure $\lambda$ as $\mathrm{H}(\mu,\nu) ={\Vert\sqrt{f}-\sqrt{g}\Vert}_{\mathrm{L}^2(\lambda)} =\Bigl(\int(\sqrt{f}-\sqrt{g})^2\,\mathrm{d}\lambda\Bigr)^{1/2},$ and relates it to the Hellinger affinity $\mathrm{A}(\mu,\nu) =\int\sqrt{fg}\,\mathrm{d}\lambda, \quad \mathrm{H}(\mu,\nu)^2 =2-2\mathrm{A}(\mu,\nu).$ For univariate Gaussians it turns out that $\mathrm{A}(\mathcal{N}(m_1,\sigma_1^2),\mathcal{N}(m_2,\sigma_2^2)) =\sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}} \exp\Bigl(-\frac{(m_1-m_2)^2}{4(\sigma_1^2+\sigma_2^2)}\Bigr).$
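A sketch that checks the univariate affinity formula against brute-force numerical integration. It assumes the $\mathrm{H}^2=2-2\mathrm{A}$ convention used here (some authors include an extra factor of $\tfrac12$), and all function names are hypothetical. The Riemann sum over a grid spanning 12 standard deviations is only a sanity check, not a recommended way to evaluate the integral.

```python
import numpy as np

def gauss_pdf(x, m, s):
    """Density of N(m, s^2) evaluated at x."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def affinity_1d(m1, s1, m2, s2):
    """Closed-form Hellinger affinity, the integral of sqrt(f g), for 1-D Gaussians."""
    v = s1**2 + s2**2
    return np.sqrt(2.0 * s1 * s2 / v) * np.exp(-((m1 - m2) ** 2) / (4.0 * v))

def hellinger_1d(m1, s1, m2, s2):
    """Hellinger distance with the convention H^2 = 2 - 2A."""
    return np.sqrt(max(2.0 - 2.0 * affinity_1d(m1, s1, m2, s2), 0.0))

def affinity_numeric(m1, s1, m2, s2, n=200_001):
    """Riemann-sum approximation of the affinity on a wide uniform grid.

    Assumes the mass beyond 12 standard deviations of either Gaussian is negligible.
    """
    lo = min(m1 - 12.0 * s1, m2 - 12.0 * s2)
    hi = max(m1 + 12.0 * s1, m2 + 12.0 * s2)
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    return np.sum(np.sqrt(gauss_pdf(x, m1, s1) * gauss_pdf(x, m2, s2))) * dx
```

With this convention $\mathrm{H}$ ranges over $[0,\sqrt2]$: identical Gaussians give $0$, and widely separated ones approach $\sqrt2$.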

In multiple dimensions, writing $\Delta m:=m_1-m_2$: $\mathrm{A}(\mathcal{N}(m_1,\Sigma_1),\mathcal{N}(m_2,\Sigma_2)) =\frac{\det(\Sigma_1\Sigma_2)^{1/4}}{\det(\frac{\Sigma_1+\Sigma_2}{2})^{1/2}} \exp\Bigl(-\frac{\langle\Delta m,(\Sigma_1+\Sigma_2)^{-1}\Delta m\rangle}{4}\Bigr).$
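A minimal NumPy sketch of the multivariate affinity and the resulting Hellinger distance, assuming positive definite covariances. The names `affinity_gaussian` and `hellinger_gaussian` are hypothetical; the determinant ratio is assembled from `numpy.linalg.slogdet` log-determinants to avoid overflow or underflow in higher dimensions.

```python
import numpy as np

def affinity_gaussian(m1, sigma1, m2, sigma2):
    """Hellinger affinity between N(m1, sigma1) and N(m2, sigma2)."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    dm = m1 - m2
    s = sigma1 + sigma2
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    _, logdet_avg = np.linalg.slogdet(0.5 * s)
    # log of det(Sigma1 Sigma2)^{1/4} / det((Sigma1 + Sigma2)/2)^{1/2}
    log_prefactor = 0.25 * (logdet1 + logdet2) - 0.5 * logdet_avg
    # <dm, (Sigma1 + Sigma2)^{-1} dm> via a linear solve
    quad = dm @ np.linalg.solve(s, dm)
    return np.exp(log_prefactor - quad / 4.0)

def hellinger_gaussian(m1, sigma1, m2, sigma2):
    """Hellinger distance with the convention H^2 = 2 - 2A."""
    a = affinity_gaussian(m1, sigma1, m2, sigma2)
    return np.sqrt(max(2.0 - 2.0 * a, 0.0))
```

With $1\times1$ covariances this reproduces the univariate formula, and for diagonal covariances the affinity factorises into a product of univariate affinities.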
