# Optimal transport metrics

## Wasserstein distances, Monge-Kantorovich metrics, Earthmover distances

danger: I am half way through editing this! notation inconsistencies and indeed plain falsehoods abound.

I presume there are other uses for optimal transport distances apart from as probability metrics, but so far I only care about them in that context, so this will be skewed that way.

Let $$(M,d)$$ be a metric space for which every probability measure on $$M$$ is a Radon measure. For $$p\ge 1$$, let $$\mathcal{P}_p(M)$$ denote the collection of all probability measures $$P$$ on $$M$$ with finite $$p^{\text{th}}$$ moment for some $$x_0$$ in $$M$$,

$\int_{M} d(x, x_{0})^{p} \, \mathrm{d} P (x) < +\infty.$

Then the $$p^{\text{th}}$$ Wasserstein distance between two probability measures $$P$$ and $$Q$$ in $$\mathcal{P}_p(M)$$ is defined as

$W_{p} ( P, Q ):= \left( \inf_{\gamma \in \Pi ( P, Q )} \int_{M \times M} d(x, y)^{p} \, \mathrm{d} \gamma (x, y) \right)^{1/p},$ where $$\Pi( P, Q )$$ denotes the collection of all measures on $$M \times M$$ with marginal distributions $$P$$ and $$Q$$ respectively.

We can work equivalently with RVs distributed according to these measures, respectively $$X\sim P,Y\sim Q,$$ and then we consider $$\gamma$$ to range over joint distributions for these RVs, so that $W_p(X,Y):=W_p(P;Q):=\inf_{\gamma}\mathbb{E}[d(X,Y)^p]^{1/p}$

Also, the cost is usually just the Euclidean/$$L_2$$ distance, so that \begin{aligned} W_p(X,Y):=W_p(P;Q) &:=\inf_{\gamma}\mathbb{E}[\Vert X-Y\Vert_2^p]^{1/p}. \end{aligned}

Practically, one usually sees $$p\in\{1,2\}$$. Maybe $$p=\infty$$ is interesting also . For $$p=1$$ we have an extremely useful representation in terms of optimisation over functions, the Kontorovich-Rubinstein duality. $W_1(P, Q)=\sup _{f,g:f(x)+g(y)\leq d(x,y)} \mathbb{E}_{x \sim P}[f(x)]-\mathbb{E}_{x \sim Q}[g(x)].$ For Euclidean distance this optimisation ranges over 1-Lipschitz functions, $W_1(P, Q)=\sup _{\|f\|_{L} \leq 1} \mathbb{E}_{x \sim P}[f(x)]-\mathbb{E}_{x \sim Q}[f(x)].$ TODO: check that.

The Wasserstein distance between two objects is frequently intractable, or at least has no closed form, but you can find it for some useful special cases, or bound/approximate it in others.

🏗 discuss favourable properties of this metric (triangle inequality, bounds on moments etc).

But why do we are about such an intractable distance? Because it gives good error bounds for sufficiently nice expectations. We know that if $$W_p(X,Y) \leq \epsilon$$, then for any L-Lipschitz function $$f$$, $$|f(X) - f(Y)| \leq L\epsilon.$$

## Analytic expressions

### Gaussian

Useful: Two Gaussians may be related thusly for a Wasserstein-2 for $$X\sim\nu$$, $$Y\sim\mu$$.

\begin{aligned} d&:= W_2(\mathcal{N}(m_1,\Sigma_1);\mathcal{N}(m_2,\Sigma_2))\\ \Rightarrow d^2&= \Vert m_1-m_2\Vert_2^2 + \mathrm{Tr}(\Sigma_1+\Sigma_2-2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}). \end{aligned}

## Kontorovich-Rubinstein duality

Vincent Hermann gives an excellent practical introduction.

## “Neural Net distance”

Wasserstein distance with a baked in notion of the capacity of the function class which approximate the true Wasserstein. Is this actually used?

## Fisher distance

Specifically $$(p,\nu)$$-Fisher distances, in the terminology of . They use these distances as a computationally tractable proxy (in fact, bound) for Wasserstein distances during inference. See Fisher distances.

## Sinkhorn divergence

A regularised version of a Wasserstein metric.

$W_{p,\eta} ( P, Q )^p:= \inf_{\gamma \in \Pi ( P, Q )} \int_{M \times M} d(x, y)^{p} \, \mathrm{d} \gamma (x, y) -H(\gamma).$

Here $$H$$ is the entropy.

The examples I saw seem to be only applied to measures over finite sets (i.e. histograms, weighted point sets), where there are many neat tricks to make calculations tractable. Can it be more general?

TBD.

## Introductions

have been recommended to me as compact modern introductions.

Peyré’s course to accompany Peyré and Cuturi (2019) has been recommended to me and comes with course notes.

Or the bibliography in POT: Python Optimal Transport.

## Incoming

Michele Coscia’s paper using a graph Laplacian to calculate an approximate Earth mover distance over a graph topology. Buzzword use case: interpretably inferring graph transmission rate of a disease. This looks simple; surely it must be a known result in optimal transport metric studies?

## References

Altschuler, Jason, Francis Bach, Alessandro Rudi, and Jonathan Niles-Weed. 2019. In Advances in Neural Information Processing Systems 32, 11. Curran Associates, Inc.
Ambrosio, Luigi, and Nicola Gigli. 2013. In Modelling and Optimisation of Flows on Networks: Cetraro, Italy 2009, Editors: Benedetto Piccoli, Michel Rascle, edited by Luigi Ambrosio, Alberto Bressan, Dirk Helbing, Axel Klar, and Enrique Zuazua, 1–155. Lecture Notes in Mathematics. Berlin, Heidelberg: Springer.
Ambrosio, Luigi, Nicola Gigli, and Giuseppe Savare. 2008. Gradient Flows: In Metric Spaces and in the Space of Probability Measures. 2nd ed. Lectures in Mathematics. ETH Zürich. Birkhäuser Basel.
Angenent, Sigurd, Steven Haker, and Allen Tannenbaum. 2003. SIAM Journal on Mathematical Analysis 35 (1): 61–97.
Arjovsky, Martin, Soumith Chintala, and Léon Bottou. 2017. In International Conference on Machine Learning, 214–23.
Arora, Sanjeev, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. 2017. arXiv:1703.00573 [Cs], March.
Arras, Benjamin, Ehsan Azmoodeh, Guillaume Poly, and Yvik Swan. 2017. arXiv:1704.01376 [Math], April.
Bachem, Olivier, Mario Lucic, and Andreas Krause. 2017. arXiv Preprint arXiv:1703.06476.
Bion-Nadal, Jocelyne, and Denis Talay. 2019. The Annals of Applied Probability 29 (3).
Blanchet, Jose, Lin Chen, and Xun Yu Zhou. 2018. arXiv:1802.04885 [Stat], February.
Blanchet, Jose, Arun Jambulapati, Carson Kent, and Aaron Sidford. 2018. arXiv:1810.07717 [Cs], October.
Blanchet, Jose, Yang Kang, and Karthyek Murthy. 2016. arXiv:1610.05627 [Math, Stat], October.
Blanchet, Jose, Karthyek Murthy, and Nian Si. 2019. arXiv:1906.01614 [Math, Stat], June.
Blanchet, Jose, Karthyek Murthy, and Fan Zhang. 2018. arXiv:1810.02403 [Math], October.
Bolley, François, Ivan Gentil, and Arnaud Guillin. 2012. Journal of Functional Analysis 263 (8): 2430–57.
Bolley, François, and Cédric Villani. 2005. Annales de La Faculté Des Sciences de Toulouse Mathématiques 14 (3): 331–52.
Canas, Guillermo D., and Lorenzo Rosasco. 2012. arXiv:1209.1077 [Cs, Stat], September.
Carlier, Guillaume, Marco Cuturi, Brendan Pass, and Carola Schoenlieb. 2017. “Optimal Transport Meets Probability, Statistics and Machine Learning,” 9.
Champion, T., L. De Pascale, and P. Juutinen. 2008. SIAM Journal on Mathematical Analysis 40 (1): 1–20.
Chizat, Lénaïc, and Francis Bach. 2018. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 3040–50. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.
Chizat, Lénaïc, Edouard Oyallon, and Francis Bach. 2018. arXiv:1812.07956 [Cs, Math], December.
Chu, Casey, Jose Blanchet, and Peter Glynn. 2019. In ICML.
Coscia, Michele. 2020. “Generalized Euclidean Measure to Estimate Network Distances,” 11.
Cuturi, Marco. 2013. In Advances in Neural Information Processing Systems 26.
Dowson, D. C., and B. V. Landau. 1982. Journal of Multivariate Analysis 12 (3): 450–55.
Dudley, R. M. 2002. Real Analysis and Probability. Cambridge Studies in Advanced Mathematics 74. Cambridge ; New York: Cambridge University Press.
Fernholz, Luisa Turrin. 1983. von Mises calculus for statistical functionals. Lecture Notes in Statistics 19. New York: Springer.
Frogner, Charlie, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya, and Tomaso A Poggio. 2015. In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2053–61. Curran Associates, Inc.
Gao, Rui, and Anton J. Kleywegt. 2022. arXiv.
Givens, Clark R., and Rae Michael Shortt. 1984. The Michigan Mathematical Journal 31 (2): 231–40.
Gozlan, Nathael, and Christian Léonard. 2010. arXiv:1003.3852 [Math], March.
Guo, Xin, Johnny Hong, Tianyi Lin, and Nan Yang. 2017. arXiv:1705.07164 [Cs, Stat], May.
Henry-Labordère, Pierre. 2017. Model-Free Hedging: A Martingale Optimal Transport Viewpoint. Chapman and Hall CRC Financial Mathematics 1.0. CRC Press.
Huggins, Jonathan H., Trevor Campbell, Mikołaj Kasprzak, and Tamara Broderick. 2018a. arXiv:1806.10234 [Cs, Stat], June.
———. 2018b. arXiv:1809.09505 [Cs, Math, Stat], September.
Huggins, Jonathan, Ryan P Adams, and Tamara Broderick. 2017. In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 3611–21. Curran Associates, Inc.
Ji, Kaiyi, and Yingbin Liang. 2018. November.
Khan, Gabriel, and Jun Zhang. 2022. Information Geometry, June.
Kloeckner, Benoît R. 2021. Mathematical Proceedings of the Cambridge Philosophical Society, June, 1–25.
Knott, M., and C. S. Smith. 1984. Journal of Optimization Theory and Applications 43 (1): 39–49.
Kontorovich, Aryeh, and Maxim Raginsky. 2016. arXiv:1602.00721 [Cs, Math], February.
Léonard, Christian. 2014. Discrete & Continuous Dynamical Systems - A 34 (4): 1533.
Liu, Huidong, Xianfeng Gu, and Dimitris Samaras. 2018. In International Conference on Machine Learning, 3159–68.
Liu, Qiang, and Dilin Wang. 2019. In Advances In Neural Information Processing Systems.
Mahdian, Saied, Jose Blanchet, and Peter Glynn. 2019. arXiv:1906.03317 [Cs, Math, Stat], June.
Mallasto, Anton, Augusto Gerolin, and Hà Quang Minh. 2021. Information Geometry, August.
Marzouk, Youssef, Tarek Moselhy, Matthew Parno, and Alessio Spantini. 2016. In Handbook of Uncertainty Quantification, edited by Roger Ghanem, David Higdon, and Houman Owhadi, 1–41. Cham: Springer International Publishing.
Maurya, Abhinav. 2018. “Optimal Transport in Statistical Machine Learning : Selected Review and Some Open Questions.” In.
Mohajerin Esfahani, Peyman, and Daniel Kuhn. 2018. Mathematical Programming 171 (1): 115–66.
Montavon, Grégoire, Klaus-Robert Müller, and Marco Cuturi. 2016. In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3711–19. Curran Associates, Inc.
Olkin, I., and F. Pukelsheim. 1982. Linear Algebra and Its Applications 48: 257–63.
Panaretos, Victor M., and Yoav Zemel. 2019. Annual Review of Statistics and Its Application 6 (1): 405–31.
———. 2020. An Invitation to Statistics in Wasserstein Space. SpringerBriefs in Probability and Mathematical Statistics. Springer International Publishing.
Peyre, Gabriel. 2019. “Course Notes on Computational Optimal Transport,” 46.
Peyré, Gabriel, and Marco Cuturi. 2019. Computational Optimal Transport. Vol. 11.
Piccoli, Benedetto, and Francesco Rossi. 2016. Archive for Rational Mechanics and Analysis 222 (3): 1339–65.
Rustamov, Raif M. 2019. arXiv:1901.03227 [Cs, Stat], January.
Santambrogio, Filippo. 2015. Optimal Transport for Applied Mathematicians. Edited by Filippo Santambrogio. Progress in Nonlinear Differential Equations and Their Applications. Cham: Springer International Publishing.
Singh, Shashank, and Barnabás Póczos. 2018. arXiv:1802.08855 [Cs, Math, Stat], February.
Solomon, Justin, Fernando de Goes, Gabriel Peyré, Marco Cuturi, Adrian Butscher, Andy Nguyen, Tao Du, and Leonidas Guibas. 2015. ACM Transactions on Graphics 34 (4): 66:1–11.
Spantini, Alessio, Daniele Bigoni, and Youssef Marzouk. 2017. Journal of Machine Learning Research 19 (66): 2639–709.
Takatsu, Asuka. 2008. January.
Thorpe, Matthew. 2018. “Introduction to Optimal Transport,” 56.
Verdinelli, Isabella, and Larry Wasserman. 2019. Electronic Journal of Statistics 13 (2): 5088–5119.
Villani, Cédric. 2009. Optimal Transport: Old and New. Grundlehren Der Mathematischen Wissenschaften. Berlin Heidelberg: Springer-Verlag.
Weed, Jonathan, and Francis Bach. 2017. arXiv:1707.00087 [Math, Stat], June.
Zemel, Yoav. 2012. “Optimal Transportation: Continuous and Discrete.”
Zhu, B., J. Jiao, and D. Tse. 2020. IEEE Transactions on Information Theory 66 (11): 7155–79.

### No comments yet. Why not leave one?

GitHub-flavored Markdown & a sane subset of HTML is supported.