The intersection of reproducing kernel methods, dependence tests and probability metrics, where you use a clever RKHS embedding to measure differences between probability distributions.

A mere placeholder for now.
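To make the embedding idea concrete: the squared maximum mean discrepancy between distributions $P$ and $Q$ under a kernel $k$ is $\mathrm{MMD}^2(P,Q) = \mathbb{E}\,k(X,X') + \mathbb{E}\,k(Y,Y') - 2\,\mathbb{E}\,k(X,Y)$ with $X,X' \sim P$ and $Y,Y' \sim Q$ independent, i.e. the squared RKHS distance between the two mean embeddings. Below is a minimal NumPy sketch of the unbiased sample estimator plus a permutation test. It assumes a Gaussian kernel with a hand-picked bandwidth (in practice one usually needs a bandwidth heuristic, such as the median pairwise distance). The function names are my own, not any library's API.

```python
import numpy as np

def gaussian_kernel(a: np.ndarray, b: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip small negative values caused by floating-point cancellation.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd2_unbiased(x: np.ndarray, y: np.ndarray, bandwidth: float = 1.0) -> float:
    """Unbiased estimate of squared MMD between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Drop the diagonal terms k(x_i, x_i) so the within-sample averages are unbiased.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    term_xy = kxy.mean()
    return float(term_xx + term_yy - 2.0 * term_xy)

def mmd_permutation_test(x, y, bandwidth=1.0, n_permutations=200, seed=0):
    """Return (observed MMD^2, permutation p-value) for H0: P == Q."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(x, y, bandwidth)
    pooled = np.concatenate([x, y])
    m = len(x)
    count = 0
    for _ in range(n_permutations):
        # Under H0 the sample labels are exchangeable, so shuffle and re-split.
        perm = rng.permutation(len(pooled))
        stat = mmd2_unbiased(pooled[perm[:m]], pooled[perm[m:]], bandwidth)
        count += int(stat >= observed)
    # The +1 terms keep the p-value valid and never exactly zero.
    return observed, (count + 1) / (n_permutations + 1)

rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, size=(200, 2))
y = rng.normal(0.5, 1.0, size=(200, 2))  # mean-shifted sample
stat, p = mmd_permutation_test(x, y)
print(f"MMD^2 = {stat:.4f}, p = {p:.3f}")
```

The unbiased estimator can come out slightly negative when $P = Q$; that is expected, since only its mean is zero.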

This abstract by Zoltán Szabó might serve to highlight some keywords.

> Maximum mean discrepancy (MMD) and Hilbert-Schmidt independence criterion (HSIC) are among the most popular and successful approaches in applied mathematics to measure the difference and the independence of random variables, respectively. Thanks to their kernel-based foundations, MMD and HSIC are applicable on a large variety of domains such as documents, images, trees, graphs, time series, dynamical systems, sets or permutations. Despite their tremendous practical success, quite little is known about when HSIC characterizes independence and MMD with tensor kernel can discriminate probability distributions, in terms of the contributing kernel components. In this talk, I am going to provide a complete answer to this question, with conditions which are often easy to verify in practice.
>
> Joint work with Bharath K. Sriperumbudur (PSU).
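For the HSIC side, a common biased (V-statistic) estimator from $n$ paired samples is $\widehat{\mathrm{HSIC}} = \frac{1}{n^2}\operatorname{tr}(KHLH)$, where $K$ and $L$ are the Gram matrices of the $x$ and $y$ samples and $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$ is the centring matrix. The sketch below assumes Gaussian kernels on each variable with fixed bandwidths and calibrates the test by permuting the $y$ samples; names and defaults are hypothetical, chosen for illustration.

```python
import numpy as np

def gaussian_gram(z: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian-kernel Gram matrix of the rows of z (shape (n, d))."""
    sq = np.sum(z**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth**2))

def hsic_biased(x, y, bw_x=1.0, bw_y=1.0) -> float:
    """Biased HSIC estimate (1/n^2) tr(K H L H)."""
    n = len(x)
    h = np.eye(n) - np.full((n, n), 1.0 / n)  # centring matrix H
    k = gaussian_gram(x, bw_x)
    l = gaussian_gram(y, bw_y)
    return float(np.trace(k @ h @ l @ h) / n**2)

def hsic_permutation_test(x, y, bw_x=1.0, bw_y=1.0, n_permutations=200, seed=0):
    """Return (observed HSIC, permutation p-value) for H0: x independent of y."""
    rng = np.random.default_rng(seed)
    n = len(x)
    h = np.eye(n) - np.full((n, n), 1.0 / n)
    kc = h @ gaussian_gram(x, bw_x) @ h  # centred Gram matrix HKH, computed once
    l = gaussian_gram(y, bw_y)
    # tr(K H L H) = tr(HKH L), which equals the elementwise sum since both are symmetric.
    observed = float(np.sum(kc * l) / n**2)
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(n)
        # Permuting y's samples breaks any dependence on x but keeps both marginals.
        count += int(np.sum(kc * l[np.ix_(perm, perm)]) / n**2 >= observed)
    return observed, (count + 1) / (n_permutations + 1)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_dep = x**2 + 0.1 * rng.normal(size=(200, 1))  # dependent on x, yet uncorrelated with it
y_ind = rng.normal(size=(200, 1))
print(hsic_permutation_test(x, y_dep))
print(hsic_permutation_test(x, y_ind))
```

The `y_dep` case is the point of using kernels here: a linear correlation test would miss that dependence, while HSIC with these Gaussian kernels picks it up.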

Gaël Varoquaux has a friendly illustrated introduction, *Comparing distributions: Kernels estimate good representations, ℓ1 distances give good tests*, based on (Scetbon and Varoquaux 2019).
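A rough way to see the "ℓ1 distances" idea: instead of the full RKHS norm, evaluate the difference of the two empirical mean embeddings at a handful of test locations and sum the absolute values. This is a simplified sketch of that idea only, not the exact statistic or null calibration of Scetbon and Varoquaux; it assumes a Gaussian kernel, draws the locations from a Gaussian fitted to the pooled data (a hypothetical choice), and uses a permutation test. Because the locations depend only on the pooled sample, they are unchanged by relabelling, so the permutation calibration stays valid.

```python
import numpy as np

def mean_embedding_features(z, locations, bandwidth):
    """Gaussian kernel between each sample (rows of z) and each test location.
    Averaging over rows gives the empirical mean embedding at those locations."""
    d2 = np.sum((z[:, None, :] - locations[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth**2))

def l1_me_statistic(x, y, locations, bandwidth=1.0) -> float:
    """l1 norm of the mean-embedding difference evaluated at the test locations."""
    diff = (mean_embedding_features(x, locations, bandwidth).mean(axis=0)
            - mean_embedding_features(y, locations, bandwidth).mean(axis=0))
    return float(np.abs(diff).sum())

def l1_me_permutation_test(x, y, n_locations=5, bandwidth=1.0,
                           n_permutations=200, seed=0):
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    # Hypothetical location scheme: Gaussian fitted to the pooled (label-free) data.
    locations = rng.normal(pooled.mean(axis=0), pooled.std(axis=0),
                           size=(n_locations, x.shape[1]))
    observed = l1_me_statistic(x, y, locations, bandwidth)
    m = len(x)
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        stat = l1_me_statistic(pooled[perm[:m]], pooled[perm[m:]], locations, bandwidth)
        count += int(stat >= observed)
    return observed, (count + 1) / (n_permutations + 1)

rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, size=(300, 2))
y = rng.normal(0.0, 1.3, size=(300, 2))  # same mean, different scale
print(l1_me_permutation_test(x, y))
```

Evaluating at a few locations makes the statistic linear in the sample size, as opposed to the quadratic cost of the Gram matrices in the full MMD.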

See the ITE (Information Theoretical Estimators) toolbox for estimators.

## References

*Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference*. Cambridge, MA: MIT Press. http://eprints.pascal-network.org/archive/00004335/.

*Foundations and Trends® in Machine Learning* 10 (1–2): 1–141. https://doi.org/10.1561/2200000060.

*Journal of Machine Learning Research* 12: 731–817. http://www.jmlr.org/papers/v12/reid11a.html.

*Advances in Neural Information Processing Systems 32*, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, 12306–16. Curran Associates, Inc. http://papers.nips.cc/paper/9398-comparing-distributions-ell_1-geometry-improves-kernel-two-sample-testing.pdf.

*The Annals of Statistics* 41 (5): 2263–91. https://doi.org/10.1214/13-AOS1140.

*Algorithmic Learning Theory*, edited by Marcus Hutter, Rocco A. Servedio, and Eiji Takimoto, 13–31. Lecture Notes in Computer Science 4754. Springer Berlin Heidelberg. http://link.springer.com/chapter/10.1007/978-3-540-75225-7_5.

*Proceedings of the 26th Annual International Conference on Machine Learning*, 961–68. ICML ’09. New York, NY, USA: ACM. https://doi.org/10.1145/1553374.1553497.

*Proceedings of the 21st Annual Conference on Learning Theory (COLT 2008)*. http://eprints.pascal-network.org/archive/00004340/.

*Electronic Journal of Statistics* 6: 1550–99. https://doi.org/10.1214/12-EJS722.

*Journal of Machine Learning Research* 11 (April): 1517–61. http://jmlr.csail.mit.edu/papers/v11/sriperumbudur10a.html.

*Advances in Neural Information Processing Systems 29*, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1930–38. Curran Associates, Inc. http://papers.nips.cc/paper/6483-minimax-estimation-of-maximum-mean-discrepancy-with-radial-kernels.pdf.