GP inducing features

October 16, 2020 — January 21, 2025

graphical models
Hilbert space
kernel tricks
machine learning


Figure 1

I’m short of time, so I’ll quote a summary from Tiao, Dutordoir, and Picheny (2023), which is not pithy but has the information.

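the joint distribution of the model augmented by inducing variables $\mathbf{u}$ is $p(\mathbf{y}, \mathbf{f}, \mathbf{u}) = p(\mathbf{y} \mid \mathbf{f})\, p(\mathbf{f}, \mathbf{u})$ where $p(\mathbf{f}, \mathbf{u}) = p(\mathbf{f} \mid \mathbf{u})\, p(\mathbf{u})$ for prior $p(\mathbf{u}) = \mathcal{N}(\mathbf{0}, \mathbf{K}_{\mathbf{uu}})$ and conditional

$$p(\mathbf{f} \mid \mathbf{u}) = \mathcal{N}(\mathbf{f} \mid \mathbf{Q}_{\mathbf{fu}} \mathbf{u},\; \mathbf{K}_{\mathbf{ff}} - \mathbf{Q}_{\mathbf{ff}})$$

where $\mathbf{Q}_{\mathbf{ff}} \triangleq \mathbf{Q}_{\mathbf{fu}} \mathbf{K}_{\mathbf{uu}} \mathbf{Q}_{\mathbf{uf}}$ and $\mathbf{Q}_{\mathbf{fu}} \triangleq \mathbf{K}_{\mathbf{fu}} \mathbf{K}_{\mathbf{uu}}^{-1}$. The joint variational distribution is defined as $q(\mathbf{f}, \mathbf{u}) \triangleq p(\mathbf{f} \mid \mathbf{u})\, q(\mathbf{u})$, where $q(\mathbf{u}) \triangleq \mathcal{N}(\mathbf{m}_{\mathbf{u}}, \mathbf{C}_{\mathbf{u}})$ for variational parameters $\mathbf{m}_{\mathbf{u}} \in \mathbb{R}^{M}$ and $\mathbf{C}_{\mathbf{u}} \in \mathbb{R}^{M \times M}$ s.t. $\mathbf{C}_{\mathbf{u}} \succeq 0$. Integrating out $\mathbf{u}$ yields the posterior predictive

$$q(\mathbf{f}) = \mathcal{N}\big(\mathbf{Q}_{\mathbf{fu}} \mathbf{m}_{\mathbf{u}},\; \mathbf{K}_{\mathbf{ff}} - \mathbf{Q}_{\mathbf{fu}} (\mathbf{K}_{\mathbf{uu}} - \mathbf{C}_{\mathbf{u}}) \mathbf{Q}_{\mathbf{uf}}\big)$$

where parameters $\mathbf{m}_{\mathbf{u}}$ and $\mathbf{C}_{\mathbf{u}}$ are learned by minimising the Kullback-Leibler (KL) divergence between the approximate and exact posterior, $\mathrm{KL}[q(\mathbf{f}) \,\|\, p(\mathbf{f} \mid \mathbf{y})]$. Thus seen, SVGP has time complexity $\mathcal{O}(M^3)$ at prediction time and $\mathcal{O}(M^3 + M^2 N)$ during training. In the reproducing kernel Hilbert space (RKHS) associated with $k$, the predictive has a dual representation in which the mean and covariance share the same basis determined by $\mathbf{u}$ (Cheng & Boots, 2017; Salimbeni et al., 2018). More specifically, the basis function is effectively the vector-valued function $\mathbf{k}_{\mathbf{u}} : \mathcal{X} \to \mathbb{R}^{M}$ whose $m$-th component is defined as

$$[\mathbf{k}_{\mathbf{u}}(\mathbf{x})]_m \triangleq \mathrm{Cov}(f(\mathbf{x}), u_m).$$

In the standard definition of inducing points, $[\mathbf{k}_{\mathbf{u}}(\mathbf{x})]_m = k(\mathbf{z}_m, \mathbf{x})$, so the basis function is solely determined by $k$ and the local influence of pseudo-input $\mathbf{z}_m$. Inter-domain inducing features are a generalisation of standard inducing variables in which each variable $u_m \triangleq \mathcal{L}_m[f]$ for some linear operator $\mathcal{L}_m : \mathbb{R}^{\mathcal{X}} \to \mathbb{R}$. A particularly useful operator is the integral transform, $\mathcal{L}_m[f] \triangleq \int_{\mathcal{X}} f(\mathbf{x})\, \phi_m(\mathbf{x})\, \mathrm{d}\mathbf{x}$, which was originally employed by Lázaro-Gredilla & Figueiras-Vidal (2009). Refer to the manuscript of van der Wilk et al. (2020) for a more thorough and contemporary treatment. A closely related form is the scalar projection of $f$ onto some $\phi_m$ in the RKHS $\mathcal{H}$,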

$$\mathcal{L}_m[f] \triangleq \langle f, \phi_m \rangle_{\mathcal{H}}$$

which leads to $[\mathbf{k}_{\mathbf{u}}(\mathbf{x})]_m = \phi_m(\mathbf{x})$ by the reproducing property of the RKHS. This, in effect, equips the GP approximation with basis functions $\phi_m$ that are not solely determined by the kernel, and suitable choices can lead to sparser representations and considerable computational benefits (Hensman et al., 2018; Burt et al., 2020; Dutordoir et al., 2020; Sun et al., 2021).
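To make the quoted predictive concrete for myself: integrating $\mathbf{u}$ out of $p(\mathbf{f} \mid \mathbf{u})\, q(\mathbf{u})$ gives a Gaussian with mean $\mathbf{Q}_{\mathbf{fu}} \mathbf{m}_{\mathbf{u}}$ and covariance $\mathbf{K}_{\mathbf{ff}} - \mathbf{Q}_{\mathbf{fu}} (\mathbf{K}_{\mathbf{uu}} - \mathbf{C}_{\mathbf{u}}) \mathbf{Q}_{\mathbf{uf}}$. Below is a minimal NumPy sketch of that formula. It is not from the paper. I am assuming 1-D inputs, a squared-exponential kernel, standard inducing points, and made-up values for $\mathbf{m}_{\mathbf{u}}$ and $\mathbf{C}_{\mathbf{u}}$ rather than ones learned by minimising the KL divergence. The names `rbf` and `svgp_predict` are mine.

```python
import numpy as np


def rbf(a, b, ell=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)


def svgp_predict(x, z, m_u, C_u, ell=1.0, jitter=1e-8):
    """Mean and covariance of q(f) at inputs x, given inducing inputs z."""
    K_uu = rbf(z, z, ell) + jitter * np.eye(len(z))
    K_fu = rbf(x, z, ell)  # row n is k_u(x_n)^T, with [k_u(x)]_m = k(z_m, x)
    K_ff = rbf(x, x, ell)
    # Q_fu = K_fu K_uu^{-1} through a Cholesky factor; this is the O(M^3) step.
    L = np.linalg.cholesky(K_uu)
    Q_fu = np.linalg.solve(L.T, np.linalg.solve(L, K_fu.T)).T
    mean = Q_fu @ m_u
    cov = K_ff - Q_fu @ (K_uu - C_u) @ Q_fu.T
    return mean, cov


z = np.linspace(-2.0, 2.0, 5)   # M = 5 pseudo-inputs (illustrative)
x = np.linspace(-3.0, 3.0, 7)   # test inputs (illustrative)
m_u = np.sin(z)                 # hypothetical variational mean
C_u = 0.01 * np.eye(len(z))     # hypothetical variational covariance
mean, cov = svgp_predict(x, z, m_u, C_u)
print(mean.shape, cov.shape)
```

A useful sanity check is that setting $q(\mathbf{u}) = p(\mathbf{u})$, i.e. $\mathbf{m}_{\mathbf{u}} = \mathbf{0}$ and $\mathbf{C}_{\mathbf{u}} = \mathbf{K}_{\mathbf{uu}}$, collapses the predictive back to the prior. Swapping in inter-domain features would only change how `K_fu` and `K_uu` are computed; the rest of the function is untouched.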

Figure 2: Pruning the basis features.
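For the integral-transform flavour of inter-domain features, the basis function is $[\mathbf{k}_{\mathbf{u}}(x)]_m = \mathrm{Cov}(f(x), u_m) = \int k(x, t)\, \phi_m(t)\, \mathrm{d}t$. Here is a small stdlib-only check of that identity in one case where I know the closed form. The assumptions are mine, chosen for illustration: a 1-D squared-exponential kernel with lengthscale $\ell$ and a Gaussian window $\phi_m(t) = \mathcal{N}(t \mid z_m, s^2)$, for which the integral is $\sqrt{\ell^2 / (\ell^2 + s^2)}\, \exp\!\big(-(x - z_m)^2 / (2(\ell^2 + s^2))\big)$. All function names are hypothetical.

```python
import math


def rbf(x, y, ell=1.0):
    """Squared-exponential kernel on scalars."""
    return math.exp(-0.5 * (x - y) ** 2 / ell**2)


def gaussian_window(t, z, s):
    """Normalised Gaussian density phi(t) = N(t | z, s^2)."""
    return math.exp(-0.5 * (t - z) ** 2 / s**2) / (s * math.sqrt(2.0 * math.pi))


def cross_cov_quadrature(x, z, s, ell=1.0, half_width=12.0, n=20001):
    """Trapezoid-rule estimate of the integral of k(x, t) phi(t) over t."""
    lo = z - half_width * s
    h = 2.0 * half_width * s / (n - 1)
    total = 0.0
    for i in range(n):
        t = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * rbf(x, t, ell) * gaussian_window(t, z, s)
    return total * h


def cross_cov_closed_form(x, z, s, ell=1.0):
    """Closed form of the same integral for the Gaussian window."""
    v = ell**2 + s**2
    return math.sqrt(ell**2 / v) * math.exp(-0.5 * (x - z) ** 2 / v)


x, z, s, ell = 0.3, -0.5, 0.7, 1.2
numeric = cross_cov_quadrature(x, z, s, ell)
exact = cross_cov_closed_form(x, z, s, ell)
print(numeric, exact)
```

As the window width $s$ shrinks to zero the closed form tends to $k(z_m, x)$, which recovers the standard inducing-point basis as a limiting case.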

1 Incoming

Spherical Inducing Features for Orthogonally-Decoupled Gaussian Processes | Louis Tiao

2 References

Dutordoir, Durrande, and Hensman. 2020. “Sparse Gaussian Processes with Spherical Harmonic Features.” In Proceedings of the 37th International Conference on Machine Learning. ICML’20.
Dutordoir, Hensman, van der Wilk, et al. 2021. “Deep Neural Networks as Point Estimates for Deep Gaussian Processes.” In arXiv:2105.04504 [Cs, Stat].
Lázaro-Gredilla, and Figueiras-Vidal. 2009. “Inter-Domain Gaussian Processes for Sparse Inference Using Inducing Features.” In Advances in Neural Information Processing Systems.
Rossi, Heinonen, Bonilla, et al. 2021. “Sparse Gaussian Processes Revisited: Bayesian Approaches to Inducing-Variable Approximations.” In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics.
Shi, Titsias, and Mnih. 2020. “Sparse Orthogonal Variational Inference for Gaussian Processes.” In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics.
Tiao, Dutordoir, and Picheny. 2023. “Spherical Inducing Features for Orthogonally-Decoupled Gaussian Processes.” In.
van der Wilk, Dutordoir, John, et al. 2020. “A Framework for Interdomain and Multioutput Gaussian Processes.”