Learning covariance functions

Learning a family of covariances at once

This is usually in the context of Gaussian processes where everything can work out nicely if you are lucky, but other kernel machines are OK too. The goal for most of these is to maximise the marginal posterior likelihood, a.k.a. model evidence, as is conventional in Bayesian ML.

Learning kernel hyperparameters


Learning kernel composition

Automating kernel design by some composition of simpler atomic kernels. AFAICT this started from summaries like (Genton 2001) and went via Duvenaud’s aforementioned notes to became a small industry (Lloyd et al. 2014; D. K. Duvenaud, Nickisch, and Rasmussen 2011; D. Duvenaud et al. 2013; Grosse et al. 2012). A prominent example was the Automated statistician project by David Duvenaud, James Robert Lloyd, Roger Grosse and colleagues, which works by greedy combinatorial search over possible compositions.

More fashionable, presumably, are the differentiable search methods. For example, the AutoGP system (Krauth et al. 2016; Bonilla, Krauth, and Dezfouli 2019) incorporates tricks like these to use gradient descent to design kernels for Gaussian processes. (Sun et al. 2018) construct deep networks of composed kernels. I imagine the Deep Gaussian Process literature is also of this kind, but have not read it.


Kernels on kernels, for kernel learning kernels 🏗 (Ong, Smola, and Williamson 2005, 2002; Ong and Smola 2003; Kondor and Jebara 2006).


Álvarez, Mauricio A., Lorenzo Rosasco, and Neil D. Lawrence. 2012. “Kernels for Vector-Valued Functions: A Review.” Foundations and Trends® in Machine Learning 4 (3, 3): 195–266. https://doi.org/10.1561/2200000036.
Bach, Francis. 2008. “Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning.” In Proceedings of the 21st International Conference on Neural Information Processing Systems, 105–12. NIPS’08. USA: Curran Associates Inc. http://papers.nips.cc/paper/3418-exploring-large-feature-spaces-with-hierarchical-multiple-kernel-learning.pdf.
Balog, Matej, Balaji Lakshminarayanan, Zoubin Ghahramani, Daniel M. Roy, and Yee Whye Teh. 2016. “The Mondrian Kernel.” June 16, 2016. http://arxiv.org/abs/1606.05241.
Bohn, Bastian, Michael Griebel, and Christian Rieger. 2018. “A Representer Theorem for Deep Kernel Learning.” June 7, 2018. http://arxiv.org/abs/1709.10441.
Bonilla, Edwin V., Karl Krauth, and Amir Dezfouli. 2019. “Generic Inference in Latent Gaussian Process Models.” Journal of Machine Learning Research 20 (117): 1–63. http://arxiv.org/abs/1609.00577.
Christoudias, Mario, Raquel Urtasun, and Trevor Darrell. 2009. “Bayesian Localized Multiple Kernel Learning.” UCB/EECS-2009-96. EECS Department, University of California, Berkeley. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-96.html.
Cortes, Corinna, Patrick Haffner, and Mehryar Mohri. 2004. “Rational Kernels: Theory and Algorithms.” Journal of Machine Learning Research 5 (December): 1035–62. http://dl.acm.org/citation.cfm?id=1005332.1016793.
Duvenaud, David K., Hannes Nickisch, and Carl E. Rasmussen. 2011. “Additive Gaussian Processes.” In Advances in Neural Information Processing Systems, 226–34. http://papers.nips.cc/paper/4221-additive-gaussian-processes.pdf.
Duvenaud, David, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Ghahramani Zoubin. 2013. “Structure Discovery in Nonparametric Regression Through Compositional Kernel Search.” In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 1166–74. http://machinelearning.wustl.edu/mlpapers/papers/icml2013_duvenaud13.
Genton, Marc G. 2001. “Classes of Kernels for Machine Learning: A Statistics Perspective.” Journal of Machine Learning Research 2 (December): 299–312. http://jmlr.org/papers/volume2/genton01a/genton01a.pdf.
Girolami, Mark, and Simon Rogers. 2005. “Hierarchic Bayesian Models for Kernel Learning.” In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05, 241–48. Bonn, Germany: ACM Press. https://doi.org/10.1145/1102351.1102382.
Grosse, Roger, Ruslan R. Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. 2012. “Exploiting Compositionality to Explore a Large Space of Model Structures.” In Proceedings of the Conference on Uncertainty in Artificial Intelligence. http://arxiv.org/abs/1210.4856.
Hartikainen, J., and S. Särkkä. 2010. “Kalman Filtering and Smoothing Solutions to Temporal Gaussian Process Regression Models.” In 2010 IEEE International Workshop on Machine Learning for Signal Processing, 379–84. Kittila, Finland: IEEE. https://doi.org/10.1109/MLSP.2010.5589113.
Hofmann, Thomas, Bernhard Schölkopf, and Alexander J. Smola. 2008. “Kernel Methods in Machine Learning.” The Annals of Statistics 36 (3, 3): 1171–1220. https://doi.org/10.1214/009053607000000677.
Kom Samo, Yves-Laurent, and Stephen Roberts. 2015. “Generalized Spectral Kernels.” June 7, 2015. http://arxiv.org/abs/1506.02236.
Kondor, Risi, and Tony Jebara. 2006. “Gaussian and Wishart Hyperkernels.” In Proceedings of the 19th International Conference on Neural Information Processing Systems, 729–36. NIPS’06. Cambridge, MA, USA: MIT Press. http://dl.acm.org/citation.cfm?id=2976456.2976548.
Krauth, Karl, Edwin V. Bonilla, Kurt Cutajar, and Maurizio Filippone. 2016. AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models.” In Uai17. http://arxiv.org/abs/1610.05392.
Lawrence, Neil. 2005. “Probabilistic Non-Linear Principal Component Analysis with Gaussian Process Latent Variable Models.” Journal of Machine Learning Research 6 (Nov, Nov): 1783–1816. http://www.jmlr.org/papers/v6/lawrence05a.html.
Lloyd, James Robert, David Duvenaud, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. 2014. “Automatic Construction and Natural-Language Description of Nonparametric Regression Models.” In Twenty-Eighth AAAI Conference on Artificial Intelligence. http://arxiv.org/abs/1402.4304.
Micchelli, Charles A., and Massimiliano Pontil. 2005a. “Learning the Kernel Function via Regularization.” Journal of Machine Learning Research 6 (Jul, Jul): 1099–1125. http://www.jmlr.org/papers/v6/micchelli05a.html.
———. 2005b. “On Learning Vector-Valued Functions.” Neural Computation 17 (1, 1): 177–204. https://doi.org/10.1162/0899766052530802.
Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. 1 edition. Adaptive Computation and Machine Learning Series. Cambridge, MA: MIT Press.
Murray-Smith, Roderick, and Barak A. Pearlmutter. 2005. “Transformations of Gaussian Process Priors.” In Deterministic and Statistical Methods in Machine Learning, edited by Joab Winkler, Mahesan Niranjan, and Neil Lawrence, 110–23. Lecture Notes in Computer Science. Springer Berlin Heidelberg. http://bcl.hamilton.ie/ barak/papers/MLW-Jul-2005.pdf.
O’Callaghan, Simon Timothy, and Fabio T. Ramos. 2011. “Continuous Occupancy Mapping with Integral Kernels.” In Twenty-Fifth AAAI Conference on Artificial Intelligence. https://www.aaai.org/ocs/index.php/AAAI/AAAI11/paper/view/3784.
Ong, Cheng Soon, Xavier Mary, Stéphane Canu, and Alexander J. Smola. 2004. “Learning with Non-Positive Kernels.” In Twenty-First International Conference on Machine Learning - ICML ’04, 81. Banff, Alberta, Canada: ACM Press. https://doi.org/10.1145/1015330.1015443.
Ong, Cheng Soon, and Alexander J. Smola. 2003. “Machine Learning Using Hyperkernels.” In Proceedings of the Twentieth International Conference on International Conference on Machine Learning, 568–75. ICML’03. Washington, DC, USA: AAAI Press. http://dl.acm.org/citation.cfm?id=3041838.3041910.
Ong, Cheng Soon, Alexander J. Smola, and Robert C. Williamson. 2002. “Hyperkernels.” In Proceedings of the 15th International Conference on Neural Information Processing Systems, 495–502. NIPS’02. Cambridge, MA, USA: MIT Press. http://dl.acm.org/citation.cfm?id=2968618.2968680.
———. 2005. “Learning the Kernel with Hyperkernels.” Journal of Machine Learning Research 6 (Jul, Jul): 1043–71. http://www.jmlr.org/papers/v6/ong05a.html.
Rakotomamonjy, Alain, Francis R. Bach, Stéphane Canu, and Yves Grandvalet. 2008. SimpleMKL.” Journal of Machine Learning Research 9: 2491–521. http://www.jmlr.org/papers/v9/rakotomamonjy08a.html.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. Cambridge, Mass: Max-Planck-Gesellschaft; MIT Press. http://www.gaussianprocess.org/gpml/.
Remes, Sami, Markus Heinonen, and Samuel Kaski. 2018. “Neural Non-Stationary Spectral Kernel.” November 27, 2018. http://arxiv.org/abs/1811.10978.
Saha, Akash, and Palaniappan Balamurugan. 2020. “Learning with Operator-Valued Kernels in Reproducing Kernel Krein Spaces.” In Advances in Neural Information Processing Systems. Vol. 33. https://proceedings.neurips.cc//paper_files/paper/2020/hash/9f319422ca17b1082ea49820353f14ab-Abstract.html.
Särkkä, Simo, A. Solin, and J. Hartikainen. 2013. “Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering.” IEEE Signal Processing Magazine 30 (4, 4): 51–61. https://doi.org/10.1109/MSP.2013.2246292.
Schölkopf, Bernhard, Ralf Herbrich, and Alex J. Smola. 2001. “A Generalized Representer Theorem.” In Computational Learning Theory, edited by David Helmbold and Bob Williamson, 416–26. Lecture Notes in Computer Science. Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-44581-1.
Schölkopf, Bernhard, and Alexander J. Smola. 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.
———. 2003. “A Short Introduction to Learning with Kernels.” In Advanced Lectures on Machine Learning, edited by Shahar Mendelson and Alexander J. Smola, 41–64. Lecture Notes in Computer Science 2600. Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-36434-X_2.
Sinha, Aman, and John C Duchi. 2016. “Learning Kernels with Random Features.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1298–1306. Curran Associates, Inc. http://papers.nips.cc/paper/6180-learning-kernels-with-random-features.pdf.
Sun, Shengyang, Guodong Zhang, Chaoqi Wang, Wenyuan Zeng, Jiaman Li, and Roger Grosse. 2018. “Differentiable Compositional Kernel Learning for Gaussian Processes.” 2018. http://arxiv.org/abs/1806.04326.
Uziel, Guy. 2020. “Nonparametric Sequential Prediction While Deep Learning the Kernel.” In International Conference on Artificial Intelligence and Statistics, 111–21. PMLR. http://proceedings.mlr.press/v108/uziel20b.html.
Vert, Jean-Philippe, Koji Tsuda, and Bernhard Schölkopf. 2004. “A Primer on Kernel Methods.” In Kernel Methods in Computational Biology. MIT Press. http://kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/pdfs/pdf2549.pdf.
Vishwanathan, S. V. N., Nicol N. Schraudolph, Risi Kondor, and Karsten M. Borgwardt. 2010. “Graph Kernels.” Journal of Machine Learning Research 11 (August): 1201–42. http://authors.library.caltech.edu/20528/1/Vishwanathan2010p11646J_Mach_Learn_Res.pdf.
Wilson, Andrew Gordon, and Ryan Prescott Adams. 2013. “Gaussian Process Kernels for Pattern Discovery and Extrapolation.” In International Conference on Machine Learning. http://arxiv.org/abs/1302.4245.
Wilson, Andrew Gordon, Christoph Dann, Christopher G. Lucas, and Eric P. Xing. 2015. “The Human Kernel.” October 26, 2015. http://arxiv.org/abs/1510.07389.
Wilson, Andrew Gordon, and Zoubin Ghahramani. 2012. “Modelling Input Varying Correlations Between Multiple Responses.” In Machine Learning and Knowledge Discovery in Databases, edited by Peter A. Flach, Tijl De Bie, and Nello Cristianini, 858–61. Lecture Notes in Computer Science. Springer Berlin Heidelberg.
Wilson, Andrew Gordon, Zhiting Hu, Ruslan Salakhutdinov, and Eric P. Xing. 2016. “Deep Kernel Learning.” In Artificial Intelligence and Statistics, 370–78. PMLR. http://proceedings.mlr.press/v51/wilson16.html.
Yu, Yaoliang, Hao Cheng, Dale Schuurmans, and Csaba Szepesvári. 2013. “Characterizing the Representer Theorem.” In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 570–78. http://www.jmlr.org/proceedings/papers/v28/yu13.pdf.
Zhang, Aonan, and John Paisley. 2019. “Random Function Priors for Correlation Modeling.” In International Conference on Machine Learning, 7424–33. http://proceedings.mlr.press/v97/zhang19k.html.

No comments yet. Why not leave one?

GitHub-flavored Markdown & a sane subset of HTML is supported.