Kernel warping

Kernel warping is a nonlinear way of turning stationary kernels into non-stationary ones by transforming their inputs (Sampson and Guttorp 1992; Genton 2001; Genton and Perrin 2004; Perrin and Senoussi 1999, 2000).

This is of interest in two directions: composing kernels with known desirable properties out of known transforms, and learning (somewhat) arbitrary transforms under which a non-stationary process becomes stationary.

Stationary reducible kernels

The main idea is to find a new feature space where stationarity (Sampson and Guttorp 1992) or local stationarity (Perrin and Senoussi 1999, 2000; Genton and Perrin 2004) can be achieved.

Genton (2001) summarises:

We say that a nonstationary kernel \(K(\mathbf{x}, \mathbf{z})\) is stationary reducible if there exists a bijective deformation \(\mathbf{\Phi}\) such that: \[ K(\mathbf{x}, \mathbf{z})=K_{S}^{*}(\mathbf{\Phi}(\mathbf{x})-\mathbf{\Phi}(\mathbf{z})) \] where \(K_{S}^{*}\) is a stationary kernel.
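To make the definition concrete, here is a minimal sketch in NumPy. Everything specific in it is my own assumption for illustration rather than anything taken from the cited papers: the inputs are one-dimensional and positive, \(K_{S}^{*}\) is a squared-exponential kernel, and the deformation is \(\Phi(x)=\log x\), which is a bijection from \((0,\infty)\) onto \(\mathbb{R}\). The names `k_stationary`, `phi` and `k_warped` are hypothetical.

```python
import numpy as np


def k_stationary(tau, lengthscale=1.0):
    """Squared-exponential kernel written as a function of the lag tau only."""
    return np.exp(-0.5 * (tau / lengthscale) ** 2)


def phi(x):
    """Illustrative deformation (an assumption): log maps (0, inf) onto R bijectively."""
    return np.log(x)


def k_warped(x, z, lengthscale=1.0):
    """K(x, z) = K_S^*(Phi(x) - Phi(z)), evaluated on all pairs of 1-d inputs."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    lags = phi(x)[:, None] - phi(z)[None, :]
    return k_stationary(lags, lengthscale)


# Geometrically spaced inputs become equally spaced after warping.
x = np.array([0.5, 1.0, 2.0, 4.0])
K = k_warped(x, x)

# Stationary in the warped coordinate: neighbouring points have equal covariance.
print(np.isclose(K[0, 1], K[1, 2]))

# Non-stationary in the original coordinate: both pairs have lag 1,
# but the covariances differ.
print(k_warped([1.0], [2.0])[0, 0], k_warped([3.0], [4.0])[0, 0])
```

With this choice the warped kernel depends on \(x\) and \(z\) only through the ratio \(x/z\), so it is not a function of \(x-z\). It is still a valid kernel, because the Gram matrix of `k_warped` at points \(x_i\) is exactly the Gram matrix of the stationary kernel at the points \(\Phi(x_i)\).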

Classic deformations

MacKay warping

Learning transforms


Belkin, Mikhail, Siyuan Ma, and Soumik Mandal. 2018. “To Understand Deep Learning We Need to Understand Kernel Learning.” In International Conference on Machine Learning, 541–49.
Bohn, Bastian, Michael Griebel, and Christian Rieger. 2018. “A Representer Theorem for Deep Kernel Learning.” June 7, 2018.
Damian, Doris, Paul D. Sampson, and Peter Guttorp. 2001. “Bayesian Estimation of Semi-Parametric Non-Stationary Spatial Covariance Structures.” Environmetrics 12 (2): 161–78.
Feragen, Aasa, and Søren Hauberg. n.d. “Open Problem: Kernel Methods on Manifolds and Metric Spaces,” 4.
Genton, Marc G. 2001. “Classes of Kernels for Machine Learning: A Statistics Perspective.” Journal of Machine Learning Research 2 (December): 299–312.
Genton, Marc G., and Olivier Perrin. 2004. “On a Time Deformation Reducing Nonstationary Stochastic Processes to Local Stationarity.” Journal of Applied Probability 41 (1): 236–49.
Hinton, Geoffrey E, and Ruslan R Salakhutdinov. 2008. “Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes.” In Advances in Neural Information Processing Systems 20, edited by J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, 1249–56. Curran Associates, Inc.
Perrin, Olivier, and Rachid Senoussi. 1999. “Reducing Non-Stationary Stochastic Processes to Stationarity by a Time Deformation.” Statistics & Probability Letters 43 (4): 393–97.
———. 2000. “Reducing Non-Stationary Random Fields to Stationarity and Isotropy Using a Space Deformation.” Statistics & Probability Letters 48 (1): 23–32.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. Cambridge, Mass: Max-Planck-Gesellschaft; MIT Press.
Sampson, Paul D., and Peter Guttorp. 1992. “Nonparametric Estimation of Nonstationary Spatial Covariance Structure.” Journal of the American Statistical Association 87 (417): 108–19.
Schmidt, Alexandra M., and Anthony O’Hagan. 2003. “Bayesian Inference for Non-Stationary Spatial Covariance Structure via Spatial Deformations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65 (3): 743–58.
Shimotsu, Katsumi, and Peter C. B. Phillips. 2004. “Local Whittle Estimation in Nonstationary and Unit Root Cases.” The Annals of Statistics 32 (2): 656–92.
Snoek, Jasper, Kevin Swersky, Rich Zemel, and Ryan Adams. 2014. “Input Warping for Bayesian Optimization of Non-Stationary Functions.” In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 1674–82.
Tompkins, Anthony, and Fabio Ramos. 2018. “Fourier Feature Approximations for Periodic Kernels in Time-Series Modelling.” Proceedings of the AAAI Conference on Artificial Intelligence 32 (1).
Wilson, Andrew Gordon, Zhiting Hu, Ruslan Salakhutdinov, and Eric P. Xing. 2016. “Deep Kernel Learning.” In Artificial Intelligence and Statistics, 370–78. PMLR.