# Maximum Mean Discrepancy

August 21, 2016 — May 3, 2023

An integral probability metric (IPM). MMD sits at the intersection of reproducing kernel methods, dependence tests and probability metrics: we use a kernel embedding, typically an RKHS embedding, to cleverly measure differences between probability distributions. Concretely, the MMD is the RKHS distance between the two distributions' kernel mean embeddings.

Can be estimated from samples only, which is neat.
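A minimal sketch of what "from samples only" means in practice. It uses the identity $\mathrm{MMD}^2(P,Q)=\mathbb{E}\,k(X,X')+\mathbb{E}\,k(Y,Y')-2\,\mathbb{E}\,k(X,Y)$ and replaces each expectation with a sample average, dropping the diagonal terms so the within-sample averages are unbiased (the U-statistic form in Gretton, Borgwardt, et al. 2012). The Gaussian kernel, the fixed bandwidth and the function names are our own choices for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    # Pairwise squared Euclidean distances via broadcasting, shape (n, m)
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between samples x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    # Exclude the diagonal k(x_i, x_i) terms from the within-sample averages
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    term_xy = k_xy.mean()
    return term_xx + term_yy - 2.0 * term_xy


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y_same = rng.normal(0.0, 1.0, size=(200, 2))
y_shifted = rng.normal(1.0, 1.0, size=(200, 2))
print(mmd2_unbiased(x, y_same))     # near zero; may be slightly negative
print(mmd2_unbiased(x, y_shifted))  # clearly positive
```

Because this estimator is unbiased rather than a squared norm, it can dip below zero when the two samples come from the same distribution.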

A mere placeholder. For a thorough treatment see the canonical references (Gretton et al. 2008; Gretton, Borgwardt, et al. 2012).
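The references above develop MMD as a two-sample test of $H_0: P = Q$. One simple way to calibrate the statistic is a permutation test: under $H_0$ the sample labels are exchangeable, so shuffling them simulates the null distribution. The sketch below assumes a Gaussian kernel with a fixed bandwidth and uses the biased (V-statistic) MMD²; Gretton, Borgwardt, et al. (2012) also discuss other ways to approximate the null distribution, which are not reproduced here. Function names are hypothetical.

```python
import numpy as np


def mmd2_biased(k, n):
    """Biased (V-statistic) MMD^2 from a pooled kernel matrix whose first n rows/cols are sample x."""
    k_xx = k[:n, :n]
    k_yy = k[n:, n:]
    k_xy = k[:n, n:]
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


def mmd_permutation_test(x, y, bandwidth=1.0, n_permutations=500, seed=0):
    """Permutation two-sample test; returns (observed MMD^2, p-value)."""
    rng = np.random.default_rng(seed)
    z = np.concatenate([x, y], axis=0)
    n = len(x)
    # Pooled Gaussian kernel matrix, computed once and then re-indexed per permutation
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    k = np.exp(-sq_dists / (2.0 * bandwidth**2))
    observed = mmd2_biased(k, n)
    null_stats = np.empty(n_permutations)
    for b in range(n_permutations):
        # Randomly relabel which pooled points count as "x" and which as "y"
        perm = rng.permutation(len(z))
        null_stats[b] = mmd2_biased(k[np.ix_(perm, perm)], n)
    # Count the observed statistic as one of the draws so the p-value is never exactly zero
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_permutations)
    return observed, p_value


rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(100, 1))
y = rng.normal(0.8, 1.0, size=(100, 1))
stat, p = mmd_permutation_test(x, y)
print(stat, p)  # a mean shift of 0.8 should give a small p-value
```

Precomputing the pooled kernel matrix means each permutation only re-indexes it, so the cost per permutation is a few matrix means rather than fresh kernel evaluations.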

## 1 Tutorial

Arthur Gretton, Dougal Sutherland and Wittawat Jitkrittum's presentation: Interpretable Comparison of Distributions and Models.

Danica Sutherland’s explanation is IMO magnificent.

Pierre Alquier’s post Universal estimation with Maximum Mean Discrepancy (MMD) shows how to use MMD in a robust nonparametric estimator.
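To give a flavour of minimum-MMD estimation, here is a toy example of our own, not code from the post. It assumes the model $N(\theta, 1)$ for 1-D data and a Gaussian kernel with bandwidth $\gamma$, and picks $\theta$ by grid search. Because everything is Gaussian, the model expectations have closed forms, and only the data-model cross term depends on $\theta$. The data contain 10% gross outliers to show the robustness claim.

```python
import numpy as np


def min_mmd_location(x, gamma=1.0, n_grid=2001):
    """Minimum-MMD estimate of theta in the toy model N(theta, 1) for 1-D data x,
    using the kernel k(u, v) = exp(-(u - v)^2 / (2 gamma^2)) and a grid search."""
    grid = np.linspace(x.min(), x.max(), n_grid)
    # MMD^2(empirical, N(theta, 1)) = [data-data term] - 2 * cross(theta) + [model-model term].
    # The data-data and model-model terms do not depend on theta. The cross term is
    # cross(theta) = mean_i E_{Y ~ N(theta, 1)} k(x_i, Y), which has a Gaussian closed form:
    s2 = gamma**2 + 1.0
    cross = (gamma / np.sqrt(s2)) * np.exp(
        -((x[None, :] - grid[:, None]) ** 2) / (2.0 * s2)
    ).mean(axis=1)
    # Minimising MMD^2 over theta is therefore the same as maximising the cross term
    return grid[np.argmax(cross)]


rng = np.random.default_rng(0)
# 900 clean draws from N(0, 1) plus 100 gross outliers sitting at 50
x = np.concatenate([rng.normal(0.0, 1.0, size=900), np.full(100, 50.0)])
theta_hat = min_mmd_location(x)
print(x.mean())   # dragged towards the outliers (roughly 5)
print(theta_hat)  # stays near the clean location 0
```

The robustness comes from the kernel: each observation's influence on the objective decays like $\exp(-(x_i-\theta)^2 / (2(\gamma^2+1)))$, so points far from $\theta$ contribute almost nothing, unlike in the sample mean.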

Gaël Varoquaux's introduction, *Comparing distributions: Kernels estimate good representations, l1 distances give good tests*, is friendly and illustrated, and is based on Scetbon and Varoquaux (2019).

## 2 Connection to Optimal transport losses

Husain (2020) connects IPMs to transport metrics, regularisation theory and classification.

Feydy et al. (2019) connect MMD to optimal transport losses.

Arbel et al. (2019) also looks pertinent and has some connections to Wasserstein gradient flows.
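As we understand Feydy et al. (2019), the Sinkhorn divergence with ground cost $C(x,y)=\|x-y\|$ interpolates between OT (small blur) and an MMD whose "kernel" is $-C$ (large blur). That MMD endpoint is the energy distance $2\,\mathbb{E}\|X-Y\| - \mathbb{E}\|X-X'\| - \mathbb{E}\|Y-Y'\|$. The sketch below computes only this endpoint from samples, using V-statistics. It does not implement Sinkhorn iterations, and the function names are hypothetical.

```python
import numpy as np


def pairwise_dists(a, b):
    """Euclidean distance matrix between the rows of a (n, d) and b (m, d)."""
    return np.sqrt(np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1))


def energy_distance_sq(x, y):
    """Squared MMD with the conditionally positive definite kernel k(u, v) = -||u - v||,
    i.e. 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||, estimated with V-statistics."""
    return (
        2.0 * pairwise_dists(x, y).mean()
        - pairwise_dists(x, x).mean()
        - pairwise_dists(y, y).mean()
    )


same = np.array([[0.0], [1.0]])
print(energy_distance_sq(same, same))  # identical samples give 0
print(energy_distance_sq(np.array([[0.0]]), np.array([[3.0]])))  # two point masses 3 apart give 6
```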

## 3 Choice of kernel

Hmm. See Gretton, Sriperumbudur, et al. (2012).
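Gretton, Sriperumbudur, et al. (2012) choose the kernel to maximise test power, which is not reproduced here. A cheap and widely used baseline is the "median heuristic": set the Gaussian bandwidth to the median pairwise distance in the pooled sample. Conventions vary (some use squared distances or an extra scaling factor), so treat the sketch below as one variant among several.

```python
import numpy as np


def median_heuristic_bandwidth(x, y):
    """Median of the pairwise Euclidean distances in the pooled sample (x, y)."""
    z = np.concatenate([x, y], axis=0)
    dists = np.sqrt(np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1))
    # Count each unordered pair once and skip the zero self-distances on the diagonal
    iu = np.triu_indices(len(z), k=1)
    return np.median(dists[iu])


# Pooled points 0, 1, 3 have pairwise distances 1, 3, 2, so the median is 2
print(median_heuristic_bandwidth(np.array([[0.0], [1.0]]), np.array([[3.0]])))
```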

## 4 Tooling

MMD is included in the ITE toolbox (estimators).

### 4.1 GeomLoss

The GeomLoss library provides efficient GPU implementations for:

- Kernel norms (also known as Maximum Mean Discrepancies).
- Hausdorff divergences, which are positive definite generalizations of the Chamfer-ICP loss and are analogous to log-likelihoods of Gaussian Mixture Models.
- Debiased Sinkhorn divergences, which are affordable yet positive and definite approximations of Optimal Transport (Wasserstein) distances.

It is hosted on GitHub and distributed under the permissive MIT license.

GeomLoss functions are available through the custom PyTorch layers `SamplesLoss`, `ImagesLoss` and `VolumesLoss`, which allow you to work with weighted point clouds (of any dimension), density maps and volumetric segmentation masks.
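As we read the GeomLoss documentation, its kernel norms for weighted point clouds $\alpha=\sum_i a_i\delta_{x_i}$ and $\beta=\sum_j b_j\delta_{y_j}$ are $\tfrac12\|\alpha-\beta\|_k^2$. The NumPy sketch below computes that quantity directly for a Gaussian kernel, so you can see what is being evaluated. It does not call GeomLoss, the bandwidth parameter `sigma` is our own, and the library's `blur` convention may differ. With uniform weights it reduces to half the biased (V-statistic) MMD².

```python
import numpy as np


def gaussian_kernel(x, y, sigma):
    """Kernel matrix exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma**2))


def weighted_kernel_norm(a, x, b, y, sigma=1.0):
    """0.5 * ||alpha - beta||_k^2 for weighted point clouds (a, x) and (b, y)."""
    aa = a @ gaussian_kernel(x, x, sigma) @ a  # <alpha, k * alpha>
    bb = b @ gaussian_kernel(y, y, sigma) @ b  # <beta, k * beta>
    ab = a @ gaussian_kernel(x, y, sigma) @ b  # <alpha, k * beta>
    return 0.5 * (aa + bb - 2.0 * ab)


rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 3))
w = np.full(5, 1.0 / 5)
print(weighted_kernel_norm(w, pts, w, pts))  # a measure compared with itself gives ~0
# Two unit point masses at 0 and 1 with sigma = 1 give 1 - exp(-1/2)
print(weighted_kernel_norm(np.array([1.0]), np.array([[0.0]]), np.array([1.0]), np.array([[1.0]])))
```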

## 5 Incoming

## 6 References

*Proceedings of the 33rd International Conference on Neural Information Processing Systems*.

*arXiv:1704.01376 [Math]*.

*arXiv:1802.04885 [Stat]*.

*arXiv:2202.04744 [Cs, Stat]*.

*Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics*.

*The Journal of Machine Learning Research*.

*Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference*.

*Proceedings of the 25th International Conference on Neural Information Processing Systems*. NIPS’12.

*Physica D: Nonlinear Phenomena*.

*arXiv:2006.04349 [Cs, Stat]*.

*Advances in Neural Information Processing Systems*.

*Proceedings of the 32nd International Conference on Machine Learning*.

*arXiv:1405.5505 [Cs, Stat]*.

*Foundations and Trends® in Machine Learning*.

*The Journal of Machine Learning Research*.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)*.

*Stat*.

*Advances in Neural Information Processing Systems 32*.

*arXiv:1501.06794 [Cs, Stat]*.

*The Annals of Statistics*.

*Algorithmic Learning Theory*. Lecture Notes in Computer Science 4754.

*IEEE Signal Processing Magazine*.

*Proceedings of the 26th Annual International Conference on Machine Learning*. ICML ’09.

*Electronic Journal of Statistics*.

*Proceedings of the 21st Annual Conference on Learning Theory (COLT 2008)*.

*Journal of Machine Learning Research*.

*arXiv:1702.03877 [Stat]*.

*arXiv:1708.08157 [Cs, Math, Stat]*.

*Advances in Neural Information Processing Systems 29*.

*arXiv:1606.07892 [Stat]*.

*arXiv:1202.3775 [Cs, Stat]*.