**Maximum Mean Discrepancy (MMD)**: an integral probability metric at the intersection of reproducing kernel methods, dependence tests and probability metrics, where we use an RKHS embedding to cleverly measure differences between probability distributions.

It can be estimated from samples alone, which is neat.
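To make that concrete, here is a minimal sketch (not any particular library's implementation) of the unbiased squared-MMD estimator with a Gaussian kernel in plain NumPy; the function names and the bandwidth default are my own choices:

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between sample sets x (m, d) and y (n, d)."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples x and y.

    The diagonal terms k(x_i, x_i) are excluded, so the estimate can come
    out slightly negative when the two distributions coincide.
    """
    m, n = len(x), len(y)
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return term_xx + term_yy - 2 * k_xy.mean()

rng = np.random.default_rng(0)
# Same distribution: estimate should hover around zero.
same = mmd2_unbiased(rng.normal(size=(500, 2)), rng.normal(size=(500, 2)))
# Mean-shifted distribution: estimate should be clearly positive.
diff = mmd2_unbiased(rng.normal(size=(500, 2)), rng.normal(2.0, 1.0, size=(500, 2)))
print(same, diff)
```

This is the O(mn)-memory pairwise version; practical implementations stream or block the kernel evaluations for large samples.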

A mere placeholder.

Danica Sutherland’s explanation is IMO magnificent.

Husain (2020)’s results connect IPMs to transport metrics, regularisation theory, and classification.

Pierre Alquier’s post Universal estimation with Maximum Mean Discrepancy (MMD) shows how to use MMD in a robust nonparametric estimator.

Gaël Varoquaux’s introduction, *Comparing distributions: Kernels estimate good representations, l1 distances give good tests*, based on Scetbon and Varoquaux (2019), is friendly and illustrated.

MMD estimators are included in the ITE toolbox.

## Choice of kernel

Hmm. Kernel choice matters for the power of the resulting two-sample test. See Gretton et al. (2012).
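A common default, discussed critically in that literature, is the *median heuristic*: set the Gaussian kernel bandwidth to the median pairwise distance among the pooled samples. A sketch in NumPy (the function name is mine):

```python
import numpy as np

def median_heuristic_bandwidth(x, y):
    """Median pairwise Euclidean distance over the pooled samples,
    a common default bandwidth for the Gaussian kernel in MMD tests."""
    z = np.concatenate([x, y], axis=0)
    dists = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))
    # Median over distinct pairs only (exclude the zero diagonal).
    iu = np.triu_indices(len(z), k=1)
    return np.median(dists[iu])

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
y = rng.normal(size=(100, 3))
print(median_heuristic_bandwidth(x, y))
```

The heuristic is cheap and scale-adaptive, but it is not optimal in general; Gretton et al. (2012) instead choose the kernel to maximise test power.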

## GeomLoss

The GeomLoss library provides efficient GPU implementations for:

- Kernel norms (also known as Maximum Mean Discrepancies).
- Hausdorff divergences, which are positive definite generalizations of the Chamfer-ICP loss and are analogous to log-likelihoods of Gaussian Mixture Models.
- Debiased Sinkhorn divergences, which are affordable yet positive and definite approximations of Optimal Transport (Wasserstein) distances.

It is hosted on GitHub and distributed under the permissive MIT license.

GeomLoss functions are available through the custom PyTorch layers `SamplesLoss`, `ImagesLoss` and `VolumesLoss`, which allow you to work with weighted point clouds (of any dimension), density maps and volumetric segmentation masks.

## Connection to Optimal transport losses

Feydy et al. (2019) connect MMD to optimal transport losses.

## References

*arXiv:1704.01376 [Math]*, April.

*arXiv:1802.04885 [Stat]*, February.

*arXiv:2202.04744 [Cs, Stat]*, February.

*Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics*, 2681–90. PMLR.

*Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference*. Cambridge, MA: MIT Press.

*Proceedings of the 25th International Conference on Neural Information Processing Systems*, 1205–13. NIPS’12. Red Hook, NY, USA: Curran Associates Inc.

*Physica D: Nonlinear Phenomena* 421 (July): 132817.

*arXiv:2006.04349 [Cs, Stat]*, June.

*Advances in Neural Information Processing Systems*. Vol. 30. Curran Associates, Inc.

*arXiv:1405.5505 [Cs, Stat]*, May.

*Foundations and Trends® in Machine Learning* 10 (1-2): 1–141.

*The Journal of Machine Learning Research* 17 (1): 6240–67.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 80 (1): 5–31.

*arXiv:1901.03227 [Cs, Stat]*, January.

*Advances in Neural Information Processing Systems 32*, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett, 12306–16. Curran Associates, Inc.

*arXiv:1501.06794 [Cs, Stat]*, January.

*The Annals of Statistics*41 (5): 2263–91.

*Algorithmic Learning Theory*, edited by Marcus Hutter, Rocco A. Servedio, and Eiji Takimoto, 13–31. Lecture Notes in Computer Science 4754. Springer Berlin Heidelberg.

*IEEE Signal Processing Magazine*30 (4): 98–111.

*Proceedings of the 26th Annual International Conference on Machine Learning*, 961–68. ICML ’09. New York, NY, USA: ACM.

*Proceedings of the 21st Annual Conference on Learning Theory (COLT 2008)*.

*Electronic Journal of Statistics*6: 1550–99.

*Journal of Machine Learning Research* 11 (April): 1517–61.

*arXiv:1702.03877 [Stat]*, February.

*arXiv:1708.08157 [Cs, Math, Stat]*, August.

*Advances in Neural Information Processing Systems 29*, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1930–38. Curran Associates, Inc.

*arXiv:1202.3775 [Cs, Stat]*, February.

*arXiv:1606.07892 [Stat]*, June.
