(Reproducing) kernel tricks

WARNING: This is very old. If I were to write it now, I would write it differently, and specifically more pedagogically.

Kernel in the sense of the “kernel trick”. Not to be confused with smoothing-type convolution kernels, nor the dozens of related-but-slightly-different clashing definitions of kernel; those can have their own respective pages. Corollary: If you do not know what to name something, call it a kernel.

We are concerned with a particular flavour of kernel in Hilbert spaces, specifically reproducing or Mercer kernels. The associated function space is a reproducing kernel Hilbert space, hereafter an RKHS.

Kernel tricks comprise the application of Mercer kernels in machine learning. The “trick” part is that many machine learning algorithms operate on inner products, or can be rewritten to work that way. Such algorithms permit one to swap out a boring classic Euclidean definition of that inner product in favour of a fancy RKHS one. The classic machine learning pitch for trying such a stunt is something like “upgrade your old boring linear algebra on finite (usually low-) dimensional spaces to sexy algebra on potentially-infinite-dimensional feature spaces, which still has a low-dimensional representation.” Or, if you’d like, “apply statistical learning methods based on things with an obvious finite vector space representation ($$\mathbb{R}^n$$) to things without one (sentences, piano-rolls, $$\mathcal{C}^d_\ell$$).”
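To make the inner-product swap concrete, here is a minimal sketch using the homogeneous degree-2 polynomial kernel, which I picked purely for illustration because its feature map is finite and easy to write down. The function names `poly2_features` and `poly2_kernel` are hypothetical. The point is that the kernel evaluation gives the same number as the inner product in the bigger feature space, without ever constructing that space.

```python
import numpy as np

def poly2_features(x):
    """Explicit feature map for k(x, y) = (x . y)^2: all pairwise products x_i * x_j."""
    return np.outer(x, x).ravel()

def poly2_kernel(x, y):
    """The same inner product, computed directly in the input space."""
    return np.dot(x, y) ** 2

rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)

# Inner product taken in the explicit 25-dimensional feature space.
explicit = np.dot(poly2_features(x), poly2_features(y))
# Kernel evaluation using only 5-dimensional arithmetic.
implicit = poly2_kernel(x, y)

agree = bool(np.isclose(explicit, implicit))
print(agree)
```

For kernels such as the Gaussian one the explicit feature space is infinite-dimensional, so only the second route is available; the algorithm never needs to know the difference.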

Mini history: The oft-cited origins of all the reproducing kernel stuff are . It took a while to percolate into random function theory as covariance functions. Thence the idea arrived in statistical inference and signal processing , and now it is ubiquitous.

Practically, kernel methods scale poorly to large data sets. To apply such a method naively you need to keep a full Gram matrix of inner products between every pair of data points, which means knowing, for $$N$$ data points, the $$N(N+1)/2$$ distinct entries of a symmetric matrix (the $$N(N-1)/2$$ off-diagonal pairs plus the $$N$$ diagonal terms). If you need to invert that matrix the cost is $$\mathcal{O}(N^3)$$, which means you need fancy tricks to handle large $$N$$. Fancy tricks depend on what the actual model is, but include sparse GPs, random-projection inversions, Markov approximations and presumably many more.
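As a rough illustration of where the cost comes from, the sketch below builds a dense Gram matrix and solves a regularised linear system against it, in the style of kernel ridge regression. The Gaussian (RBF) kernel, the toy sine data, the regulariser `lam` and the helper name `rbf_gram` are all assumptions chosen for the example, not anything prescribed above.

```python
import numpy as np

def rbf_gram(X, Z, lengthscale=1.0):
    """Gram matrix of the Gaussian kernel between the rows of X and the rows of Z."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

rng = np.random.default_rng(1)
N = 200
X = rng.uniform(-3, 3, size=(N, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)  # noisy toy targets

K = rbf_gram(X, X)  # N x N symmetric matrix: O(N^2) memory
lam = 1e-2          # ridge regulariser (illustrative value)

# The Cholesky factorisation is the O(N^3) step.
L = np.linalg.cholesky(K + lam * np.eye(N))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

# Predictions only need kernel evaluations against the training points.
X_test = np.linspace(-3, 3, 50)[:, None]
y_pred = rbf_gram(X_test, X) @ alpha
```

Doubling `N` quadruples the storage for `K` and multiplies the factorisation time by roughly eight, which is why the approximations listed above exist.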

I’m especially interested in the application of such tricks in

1. kernel regression
2. wide random NNs
3. Nonparametric kernel independence tests
4. Efficient kernel pre-image approximation
5. Connection between kernel PCA and clustering

Turns out not all those applications are interesting to me.

Introductions

Feature space

There are many primers on Mercer kernels and their connection to ML. Kenneth Tay’s intro is punchy. See , which grinds out many connections with learning theory, or , which is more narrowly focussed on just the Mercer-kernel part which emphasises topological and geometric properties of the spaces, or for an approximation-theory perspective which does not especially concern itself with stochastic processes. I also seem to have bookmarked the following introductions .

Alex Smola (who, with Bernhard Schölkopf, has his name on an intimidating proportion of publications in this area) also has all his publications online.

Non-scalar-valued “kernels”

Extending the usual inner-product framing, operator-valued kernels generalise to $$k:\mathcal{X}\times \mathcal{X}\to \mathcal{L}(H_Y)$$, where $$\mathcal{L}(H_Y)$$ denotes the bounded linear operators on an output Hilbert space $$H_Y$$, as seen in multi-task learning.
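One simple instance, sketched here under assumptions of my own choosing, is the separable kernel $$K(x,x') = k(x,x')\,B$$, with a scalar kernel $$k$$ and a positive semi-definite matrix $$B$$ coupling $$T$$ output tasks, so that $$H_Y=\mathbb{R}^T$$. The names `rbf` and `separable_kernel`, the Gaussian choice of $$k$$ and the random $$B$$ are hypothetical illustrations.

```python
import numpy as np

def rbf(x, z, lengthscale=1.0):
    """Scalar Gaussian kernel between two input vectors."""
    return np.exp(-0.5 * np.sum((x - z) ** 2) / lengthscale**2)

def separable_kernel(x, z, B):
    """Matrix-valued kernel K(x, z) = k(x, z) * B; each evaluation is a T x T operator."""
    return rbf(x, z) * B

rng = np.random.default_rng(2)
T = 3                        # number of output tasks
A = rng.normal(size=(T, T))
B = A @ A.T                  # positive semi-definite task-coupling matrix
X = rng.normal(size=(4, 2))  # four inputs in R^2

# The Gram "matrix" is now a 4 x 4 grid of T x T blocks.
G = np.block([[separable_kernel(xi, xj, B) for xj in X] for xi in X])

# For a separable kernel this is the Kronecker product of the scalar Gram matrix with B.
K_scalar = np.array([[rbf(xi, xj) for xj in X] for xi in X])
same = bool(np.allclose(G, np.kron(K_scalar, B)))

# The block Gram matrix stays positive semi-definite, up to round-off.
min_eig = np.linalg.eigvalsh(G).min()
```

The block Gram matrix has $$NT$$ rows, so the scaling problems of scalar kernels get worse in the multi-output setting unless structure such as this Kronecker form is exploited.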

