TBD
I’m short of time, so I’ll quote a summary from Tiao, Dutordoir, and Picheny (2023), which is not pithy but has the information.
The joint distribution of the model augmented by inducing variables $\mathbf{u} = [u_1 \,\cdots\, u_M]^\top \in \mathbb{R}^M$ is

$$
p(f(\mathbf{x}), \mathbf{u}) = p(f(\mathbf{x}) \mid \mathbf{u}) \, p(\mathbf{u}),
$$

with prior $p(\mathbf{u}) = \mathcal{N}(\mathbf{u} \mid \mathbf{0}, \mathbf{K}_{\mathbf{u}\mathbf{u}})$ and conditional

$$
p(f(\mathbf{x}) \mid \mathbf{u}) = \mathcal{N}\big(f(\mathbf{x}) \mid \mathbf{k}_{\mathbf{u}}(\mathbf{x})^\top \mathbf{K}_{\mathbf{u}\mathbf{u}}^{-1} \mathbf{u}, \; k(\mathbf{x}, \mathbf{x}') - \mathbf{k}_{\mathbf{u}}(\mathbf{x})^\top \mathbf{K}_{\mathbf{u}\mathbf{u}}^{-1} \mathbf{k}_{\mathbf{u}}(\mathbf{x}')\big),
$$

where $\mathbf{K}_{\mathbf{u}\mathbf{u}}$ denotes the covariance matrix of $\mathbf{u}$ and $\mathbf{k}_{\mathbf{u}}(\mathbf{x})$ the vector of cross-covariances between $\mathbf{u}$ and $f(\mathbf{x})$. The joint variational distribution is defined as $q(f(\mathbf{x}), \mathbf{u}) = p(f(\mathbf{x}) \mid \mathbf{u}) \, q(\mathbf{u})$, where $q(\mathbf{u}) = \mathcal{N}(\mathbf{u} \mid \mathbf{m}, \mathbf{S})$ for variational parameters $\mathbf{m} \in \mathbb{R}^M$ and $\mathbf{S} \in \mathbb{R}^{M \times M}$ s.t. $\mathbf{S} \succeq 0$. Integrating out $\mathbf{u}$ yields the posterior predictive

$$
q(f(\mathbf{x})) = \mathcal{N}\big(f(\mathbf{x}) \mid \mathbf{k}_{\mathbf{u}}(\mathbf{x})^\top \mathbf{K}_{\mathbf{u}\mathbf{u}}^{-1} \mathbf{m}, \; k(\mathbf{x}, \mathbf{x}') - \mathbf{k}_{\mathbf{u}}(\mathbf{x})^\top \mathbf{K}_{\mathbf{u}\mathbf{u}}^{-1} (\mathbf{K}_{\mathbf{u}\mathbf{u}} - \mathbf{S}) \mathbf{K}_{\mathbf{u}\mathbf{u}}^{-1} \mathbf{k}_{\mathbf{u}}(\mathbf{x}')\big),
$$

where parameters $\mathbf{m}$ and $\mathbf{S}$ are learned by minimising the Kullback-Leibler (KL) divergence between the approximate and exact posterior, $\mathrm{KL}\big[q(f(\mathbf{x}), \mathbf{u}) \,\|\, p(f(\mathbf{x}), \mathbf{u} \mid \mathbf{y})\big]$. Thus seen, sVGP has time complexity $\mathcal{O}(M^3)$ at prediction time and $\mathcal{O}(NM^2 + M^3)$ during training. In the reproducing kernel Hilbert space (RKHS) $\mathcal{H}_k$ associated with $k$, the predictive has a dual representation in which the mean and covariance share the same basis determined by the inducing variables $\mathbf{u}$ (Cheng & Boots, 2017; Salimbeni et al., 2018). More specifically, the basis function is effectively the vector-valued function $\mathbf{k}_{\mathbf{u}} : \mathcal{X} \to \mathbb{R}^M$ whose $m$-th component is defined as

$$
[\mathbf{k}_{\mathbf{u}}(\mathbf{x})]_m = \mathrm{Cov}[u_m, f(\mathbf{x})].
$$
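Before moving on to inter-domain features, here is a minimal NumPy sketch of the predictive equations above, just to make the shapes and the $\mathcal{O}(M^3)$ cost concrete. Everything in it (the RBF kernel, the inducing inputs `Z`, and the variational parameters `m` and `S`) is a placeholder of my own choosing, not something prescribed by the paper.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) evaluated on all pairs of rows."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def svgp_predict(X_new, Z, m, S, kernel=rbf_kernel):
    """Posterior predictive mean and covariance of the sparse variational GP.

    mu(x)      = k_u(x)^T K_uu^{-1} m
    k_q(x, x') = k(x, x') - k_u(x)^T K_uu^{-1} (K_uu - S) K_uu^{-1} k_u(x')
    """
    K_uu = kernel(Z, Z) + 1e-6 * np.eye(len(Z))   # M x M, jitter for stability
    K_uf = kernel(Z, X_new)                       # M x N*, cross-covariances k_u(x)
    K_ff = kernel(X_new, X_new)                   # N* x N*

    # Basis functions K_uu^{-1} k_u(x), shared by the mean and the covariance.
    A = np.linalg.solve(K_uu, K_uf)               # M x N*, O(M^3) factorisation

    mean = A.T @ m                                # N*
    cov = K_ff - A.T @ (K_uu - S) @ A             # N* x N*
    return mean, cov

# Toy usage with arbitrary numbers (M = 5 inducing points in 1D).
rng = np.random.default_rng(0)
Z = rng.uniform(-3, 3, size=(5, 1))
m = rng.normal(size=5)
L = rng.normal(size=(5, 5)) * 0.1
S = L @ L.T                                       # any PSD matrix will do here
X_new = np.linspace(-3, 3, 7)[:, None]
mu, cov = svgp_predict(X_new, Z, m, S)
print(mu.shape, cov.shape)                        # (7,) (7, 7)
```

The single solve against $\mathbf{K}_{\mathbf{u}\mathbf{u}}$ is where the cubic-in-$M$ cost shows up; everything else is matrix products whose size is controlled by $M$ rather than $N$.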
In the standard definition of inducing points,

$$
u_m = f(\mathbf{z}_m),
$$

so the basis function $[\mathbf{k}_{\mathbf{u}}(\mathbf{x})]_m = k(\mathbf{z}_m, \mathbf{x})$ is solely determined by $k$ and the local influence of pseudo-input $\mathbf{z}_m$. Inter-domain inducing features are a generalisation of standard inducing variables in which each variable is $u_m = \mathcal{L}_m[f]$ for some linear operator $\mathcal{L}_m$. A particularly useful operator is the integral transform, $\mathcal{L}_m[f] = \int f(\mathbf{x}) \, g_m(\mathbf{x}) \, \mathrm{d}\mathbf{x}$, which was originally employed by Lázaro-Gredilla & Figueiras-Vidal (2009). Refer to the manuscript of van der Wilk et al. (2020) for a more thorough and contemporary treatment. A closely related form is the scalar projection of $f$ onto some $\phi_m$ in the RKHS $\mathcal{H}_k$,

$$
u_m = \langle \phi_m, f \rangle_{\mathcal{H}_k},
$$
which leads to

$$
[\mathbf{k}_{\mathbf{u}}(\mathbf{x})]_m = \langle \phi_m, k(\mathbf{x}, \cdot) \rangle_{\mathcal{H}_k} = \phi_m(\mathbf{x})
$$

by the reproducing property of the RKHS. This, in effect, equips the GP approximation with basis functions that are not solely determined by the kernel, and suitable choices can lead to sparser representations and considerable computational benefits (Hensman et al., 2018; Burt et al., 2020; Dutordoir et al., 2020; Sun et al., 2021).
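To see what this buys, here is a small sketch contrasting the two kinds of basis functions. I use random Fourier features as the projections $\phi_m$ purely for illustration; that choice is my own stand-in and not the spherical feature construction that Tiao et al. (2023) go on to develop.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def make_rff(n_features, input_dim, lengthscale=1.0, seed=0):
    """phi(x): R^d -> R^M, random Fourier features approximating an RBF kernel."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / lengthscale, size=(n_features, input_dim))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return lambda X: np.sqrt(2.0 / n_features) * np.cos(X @ W.T + b)

M = 8
X_new = np.linspace(-3, 3, 5)[:, None]

# Standard inducing points: u_m = f(z_m), so the m-th basis function is
# k(z_m, x), a kernel bump centred at the pseudo-input z_m.
Z = np.linspace(-3, 3, M)[:, None]
basis_standard = rbf_kernel(Z, X_new)        # M x N*

# Inter-domain features via RKHS projection: u_m = <phi_m, f>_H, so by the
# reproducing property the m-th basis function is phi_m(x) itself.
phi = make_rff(M, input_dim=1)
basis_interdomain = phi(X_new).T             # M x N*

print(basis_standard.shape, basis_interdomain.shape)   # (8, 5) (8, 5)
```

The point is only that the second basis is no longer a set of kernel bumps centred at pseudo-inputs; its shape is whatever the chosen $\phi_m$ dictate, which is exactly the freedom that the inter-domain construction exploits.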