The intersection of reproducing kernel methods, dependence tests and probability metrics; where you use a clever RKHS embedding to measure differences between probability distributions.
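To make the idea concrete: the kernel mean embedding of a distribution $P$ is $\mu_P = \mathbb{E}_{X\sim P}\, k(X,\cdot)$, and the maximum mean discrepancy is the RKHS distance $\mathrm{MMD}(P,Q) = \lVert \mu_P - \mu_Q \rVert_{\mathcal{H}}$, whose square expands to $\mathbb{E}\,k(X,X') + \mathbb{E}\,k(Y,Y') - 2\,\mathbb{E}\,k(X,Y)$. The following is a minimal sketch of the standard unbiased estimator of $\mathrm{MMD}^2$ from two samples. It assumes a Gaussian kernel with a fixed bandwidth of 1; in practice the kernel and bandwidth are choices you would tune (e.g. by the median heuristic).

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Exclude the diagonal k(x_i, x_i) so the within-sample averages are unbiased.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    term_xy = kxy.mean()
    return term_xx + term_yy - 2.0 * term_xy

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y_same = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution as x
y_shift = rng.normal(1.0, 1.0, size=(200, 2))  # mean shifted by 1 in each coordinate
print(mmd2_unbiased(x, y_same))   # near zero; the unbiased estimate can be slightly negative
print(mmd2_unbiased(x, y_shift))  # clearly positive
```

Because the estimator is unbiased rather than a squared norm of an estimate, it can dip below zero when the two samples come from the same distribution; that is expected, not a bug.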

A mere placeholder for now.

This abstract by Zoltán Szabó might serve to highlight some keywords.

Maximum mean discrepancy (MMD) and Hilbert-Schmidt independence criterion (HSIC) are among the most popular and successful approaches in applied mathematics to measure the difference and the independence of random variables, respectively. Thanks to their kernel-based foundations, MMD and HSIC are applicable on a large variety of domains such as documents, images, trees, graphs, time series, dynamical systems, sets or permutations. Despite their tremendous practical success, quite little is known about when HSIC characterizes independence and MMD with tensor kernel can discriminate probability distributions, in terms of the contributing kernel components. In this talk, I am going to provide a complete answer to this question, with conditions which are often easy to verify in practice.

Joint work with Bharath K. Sriperumbudur (PSU).
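HSIC, mentioned in the abstract above, is the squared RKHS norm of the cross-covariance operator; a common biased empirical form is $\mathrm{HSIC}_b = n^{-2}\,\mathrm{tr}(KHLH)$, where $K$ and $L$ are Gram matrices on the $x$ and $y$ samples and $H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$ centres them. The following is a minimal sketch of an independence test built on it. Gaussian kernels with bandwidth 1 and a permutation calibration are our assumptions here; Gretton et al. (2008) also discuss a gamma approximation to the null distribution, which this sketch does not implement.

```python
import numpy as np

def rbf_gram(z, bandwidth=1.0):
    """Gaussian kernel Gram matrix for the rows of z (shape (n, d))."""
    sq = np.sum(z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * z @ z.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth**2))

def hsic_from_grams(k, l):
    """Biased HSIC estimate tr(K H L H) / n^2, with H the centering matrix."""
    n = k.shape[0]
    h = np.eye(n) - 1.0 / n  # I - (1/n) * ones((n, n))
    return np.trace(k @ h @ l @ h) / n**2

def hsic_permutation_test(x, y, num_permutations=200, seed=0):
    """Observed HSIC and a permutation p-value obtained by shuffling y against x."""
    rng = np.random.default_rng(seed)
    k, l = rbf_gram(x), rbf_gram(y)
    observed = hsic_from_grams(k, l)
    null = np.empty(num_permutations)
    for b in range(num_permutations):
        perm = rng.permutation(len(y))
        # Permuting rows and columns of L equals recomputing it on shuffled y.
        null[b] = hsic_from_grams(k, l[np.ix_(perm, perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + num_permutations)
    return observed, p_value

rng = np.random.default_rng(1)
x = rng.normal(size=(150, 1))
y_dep = x**2 + 0.1 * rng.normal(size=(150, 1))  # dependent on x, yet uncorrelated with it in population
y_ind = rng.normal(size=(150, 1))                # independent of x
hsic_dep, p_dep = hsic_permutation_test(x, y_dep)
hsic_ind, p_ind = hsic_permutation_test(x, y_ind)
print(hsic_dep, p_dep)
print(hsic_ind, p_ind)
```

The $y = x^2$ example is a case where a correlation test would see nothing, while a characteristic kernel lets HSIC detect the dependence.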

Gaël Varoquaux has a friendly illustrated introduction, “Comparing distributions: Kernels estimate good representations, l1 distances give good tests”, based on (Scetbon and Varoquaux 2019).
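A rough sketch of the "ℓ1 distances" idea: evaluate the difference of the two empirical mean embeddings at a handful of test locations $T_1,\dots,T_J$, whiten it, and take its ℓ1 norm instead of the usual ℓ2 norm. Our reading is that the statistic is roughly $\sqrt{n}\,\lVert \Sigma_n^{-1/2} \bar{z} \rVert_1$ with $z_i = \big(k(x_i,T_j) - k(y_i,T_j)\big)_{j}$, but treat this as an illustration in that spirit, not a reimplementation of the paper. The Gaussian kernel with bandwidth 1, drawing locations from a Gaussian fitted to the pooled sample, the small ridge term, and the permutation calibration are all our own simplifying assumptions.

```python
import numpy as np

def rbf_features(samples, locations, bandwidth=1.0):
    """k(samples_i, locations_j) for a Gaussian kernel; shape (n, J)."""
    d2 = ((samples[:, None, :] - locations[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth**2))

def l1_mean_embedding_stat(x, y, locations, bandwidth=1.0, ridge=1e-5):
    """sqrt(n) * ||Sigma^{-1/2} mean(z)||_1, z_i = k(x_i, T) - k(y_i, T); needs len(x) == len(y)."""
    n = len(x)
    z = rbf_features(x, locations, bandwidth) - rbf_features(y, locations, bandwidth)
    mean = z.mean(axis=0)
    # Ridge keeps the covariance invertible when a location sees almost no mass.
    cov = np.cov(z, rowvar=False) + ridge * np.eye(z.shape[1])
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    return np.sqrt(n) * np.abs(inv_sqrt @ mean).sum()

def l1_permutation_test(x, y, num_locations=5, bandwidth=1.0, num_permutations=200, seed=0):
    """Statistic and permutation p-value; locations depend only on the pooled sample."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    # Hypothetical choice of test locations: a Gaussian fitted to the pooled data.
    locations = rng.normal(pooled.mean(axis=0), pooled.std(axis=0),
                           size=(num_locations, x.shape[1]))
    observed = l1_mean_embedding_stat(x, y, locations, bandwidth)
    n = len(x)
    null = np.empty(num_permutations)
    for b in range(num_permutations):
        idx = rng.permutation(len(pooled))
        null[b] = l1_mean_embedding_stat(pooled[idx[:n]], pooled[idx[n:]], locations, bandwidth)
    p_value = (1 + np.sum(null >= observed)) / (1 + num_permutations)
    return observed, p_value

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=(300, 2))
y_same = rng.normal(0.0, 1.0, size=(300, 2))
y_shift = rng.normal(1.0, 1.0, size=(300, 2))
stat_same, p_same = l1_permutation_test(x, y_same)
stat_shift, p_shift = l1_permutation_test(x, y_shift)
print(stat_same, p_same)
print(stat_shift, p_shift)
```

Since the test locations are a function of the pooled sample only, they are unchanged by relabelling, which is what keeps the permutation calibration valid here.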

See the ITE toolbox (Information Theoretical Estimators).

Gretton, Arthur, Kenji Fukumizu, Choon Hui Teo, Le Song, Bernhard Schölkopf, and Alexander J Smola. 2008. “A Kernel Statistical Test of Independence.” In *Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference*. Cambridge, MA: MIT Press. http://eprints.pascal-network.org/archive/00004335/.

Muandet, Krikamol, Kenji Fukumizu, Bharath Sriperumbudur, Arthur Gretton, and Bernhard Schölkopf. 2014. “Kernel Mean Shrinkage Estimators,” May. http://arxiv.org/abs/1405.5505.

Muandet, Krikamol, Kenji Fukumizu, Bharath Sriperumbudur, and Bernhard Schölkopf. 2017. “Kernel Mean Embedding of Distributions: A Review and Beyond.” *Foundations and Trends® in Machine Learning* 10 (1-2): 1–141. https://doi.org/10.1561/2200000060.

Reid, Mark D., and Robert C. Williamson. 2011. “Information, Divergence and Risk for Binary Experiments.” *Journal of Machine Learning Research* 12 (Mar): 731–817. http://www.jmlr.org/papers/v12/reid11a.html.

———. 2009. “Generalised Pinsker Inequalities.” http://arxiv.org/abs/0906.1244.

Rustamov, Raif M. 2019. “Closed-Form Expressions for Maximum Mean Discrepancy with Applications to Wasserstein Auto-Encoders,” January. http://arxiv.org/abs/1901.03227.

Scetbon, Meyer, and Gaël Varoquaux. 2019. “Comparing Distributions: $\ell_1$ Geometry Improves Kernel Two-Sample Testing.” In *Advances in Neural Information Processing Systems 32*, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, 12306–16. Curran Associates, Inc. http://papers.nips.cc/paper/9398-comparing-distributions-ell_1-geometry-improves-kernel-two-sample-testing.pdf.

Schölkopf, Bernhard, Krikamol Muandet, Kenji Fukumizu, and Jonas Peters. 2015. “Computing Functions of Random Variables via Reproducing Kernel Hilbert Space Representations,” January. http://arxiv.org/abs/1501.06794.

Sejdinovic, Dino, Bharath Sriperumbudur, Arthur Gretton, and Kenji Fukumizu. 2013. “Equivalence of Distance-Based and RKHS-Based Statistics in Hypothesis Testing.” *The Annals of Statistics* 41 (5): 2263–91. https://doi.org/10.1214/13-AOS1140.

Smola, Alex, Arthur Gretton, Le Song, and Bernhard Schölkopf. 2007. “A Hilbert Space Embedding for Distributions.” In *Algorithmic Learning Theory*, edited by Marcus Hutter, Rocco A. Servedio, and Eiji Takimoto, 13–31. Lecture Notes in Computer Science 4754. Springer Berlin Heidelberg. http://link.springer.com/chapter/10.1007/978-3-540-75225-7_5.

Song, Le, Jonathan Huang, Alex Smola, and Kenji Fukumizu. 2009. “Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems.” In *Proceedings of the 26th Annual International Conference on Machine Learning*, 961–68. ICML ’09. New York, NY, USA: ACM. https://doi.org/10.1145/1553374.1553497.

Sriperumbudur, Bharath K., Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, and Gert R. G. Lanckriet. 2012. “On the Empirical Estimation of Integral Probability Metrics.” *Electronic Journal of Statistics* 6: 1550–99. https://doi.org/10.1214/12-EJS722.

Sriperumbudur, Bharath K., Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, and Gert R. G. Lanckriet. 2010. “Hilbert Space Embeddings and Metrics on Probability Measures.” *Journal of Machine Learning Research* 11 (April): 1517–61. http://jmlr.csail.mit.edu/papers/v11/sriperumbudur10a.html.

Sriperumbudur, B. K., A. Gretton, K. Fukumizu, G. Lanckriet, and B. Schölkopf. 2008. “Injective Hilbert Space Embeddings of Probability Measures.” In *Proceedings of the 21st Annual Conference on Learning Theory (COLT 2008)*. http://eprints.pascal-network.org/archive/00004340/.

Strobl, Eric V., Kun Zhang, and Shyam Visweswaran. 2017. “Approximate Kernel-Based Conditional Independence Tests for Fast Non-Parametric Causal Discovery,” February. http://arxiv.org/abs/1702.03877.

Szabó, Zoltán, and Bharath K. Sriperumbudur. 2017. “Characteristic and Universal Tensor Product Kernels,” August. http://arxiv.org/abs/1708.08157.

Tolstikhin, Ilya O., Bharath K. Sriperumbudur, and Bernhard Schölkopf. 2016. “Minimax Estimation of Maximum Mean Discrepancy with Radial Kernels.” In *Advances in Neural Information Processing Systems 29*, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1930–8. Curran Associates, Inc. http://papers.nips.cc/paper/6483-minimax-estimation-of-maximum-mean-discrepancy-with-radial-kernels.pdf.

Zhang, Kun, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2012. “Kernel-Based Conditional Independence Test and Application in Causal Discovery,” February. http://arxiv.org/abs/1202.3775.

Zhang, Qinyi, Sarah Filippi, Arthur Gretton, and Dino Sejdinovic. 2016. “Large-Scale Kernel Methods for Independence Testing,” June. http://arxiv.org/abs/1606.07892.