Since Gaussian approximations pop up a lot in e.g. variational approximation problems, it is nice to know how various probability metrics come out for them.

## Wasserstein

Usefully therre is an analytic result in Wasserstein-2 distance, i.e. \(W_2(\mu;\nu):=\inf\mathbb{E}(\Vert X-Y\Vert_2^2)^{1/2}\) for \(X\sim\nu\), \(Y\sim\mu\). Two Gaussians may be related thusly: \[\begin{aligned} d&:= W_2(\mathcal{N}(\mu_1,\Sigma_1);\mathcal{N}(\mu_2,\Sigma_2))\\ \Rightarrow d^2&= \Vert \mu_1-\mu_2\Vert_2^2 + \operatorname{tr}(\Sigma_1+\Sigma_2-2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}). \end{aligned}\]

In the centred case this is simply (Givens and Shortt 1984) \[\begin{aligned} d&:= W_2(\mathcal{N}(0,\Sigma_1);\mathcal{N}(0,\Sigma_2))\\ \Rightarrow d^2&= \operatorname{tr}(\Sigma_1+\Sigma_2-2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}). \end{aligned}\]

## Kullback-Leibler

Pulled from wikipedia:

\[ D_{\text{KL}}(\mathcal{N}(\mu_1,\Sigma_1)\parallel \mathcal{N}(\mu_2,\Sigma_2)) ={\frac {1}{2}}\left(\operatorname {tr} \left(\Sigma _{2}^{-1}\Sigma _{1}\right)+(\mu_{2}-\mu_{1})^{\mathsf {T}}\Sigma _{2}^{-1}(\mu_{2}-\mu_{1})-k+\ln \left({\frac {\det \Sigma _{2}}{\det \Sigma _{1}}}\right)\right).\]

In the centred case this reduces to

\[ D_{\text{KL}}(\mathcal{N}(0,\Sigma_1)\parallel \mathcal{N}(0, \Sigma_2)) ={\frac {1}{2}}\left(\operatorname{tr} \left(\Sigma _{2}^{-1}\Sigma _{1}\right)-k+\ln \left({\frac {\det \Sigma _{2}}{\det \Sigma _{1}}}\right)\right).\]

## Hellinger

Djalil
defines Hellinger *distance*
\[\mathrm{H}(\mu,\nu) ={\Vert\sqrt{f}-\sqrt{g}\Vert}_{\mathrm{L}^2(\lambda)} =\Bigr(\int(\sqrt{f}-\sqrt{g})^2\mathrm{d}\lambda\Bigr)^{1/2}.\]
via Hellinger *affinity*
\[\mathrm{A}(\mu,\nu) =\int\sqrt{fg}\mathrm{d}\lambda, \quad \mathrm{H}(\mu,\nu)^2 =2-2A(\mu,\nu).\]
For Gaussians it apparently turns out that
\[\mathrm{A}(\mathcal{N}(m_1,\sigma_1^2),\mathcal{N}(m_2,\sigma_2^2)) =\sqrt{2\frac{\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}} \exp\Bigr(-\frac{(m_1-m_2)^2}{4(\sigma_1^2+\sigma_2^2)}\Bigr),\]

In multiple dimensions: \[\mathrm{A}(\mathcal{N}(m_1,\Sigma_1),\mathcal{N}(m_2,\Sigma_2)) =\frac{\det(\Sigma_1\Sigma_2)^{1/4}}{\det(\frac{\Sigma_1+\Sigma_2}{2})^{1/2}} \exp\Bigr(-\frac{\langle\Delta m,(\Sigma_1+\Sigma_2)^{-1}\Delta m)\rangle}{4}\Bigr).\]

## References

*The Michigan Mathematical Journal*31 (2): 231β40.

*Matrix differential calculus with applications in statistics and econometrics*. 3rd ed. Wiley series in probability and statistics. Hoboken (N.J.): Wiley.

*High Dimensional Probability V: The Luminy Volume*, 153β78. Beachwood, Ohio, USA: Institute of Mathematical Statistics.

*Old and new matrix algebra useful for statistics*.

## No comments yet. Why not leave one?