Neural process regression
December 3, 2019 — November 28, 2023
Gaussian process regression, but with neural nets standing in for the kernel/predictive machinery (roughly?) and some other tricky bits, I think. This has been pitched to me specifically as a meta-learning technique.
The trick is that we learn encoder and decoder networks such that they recover interesting/useful distributions over random functions (a concrete sketch follows the quoted passage below).
Jha et al. (2022):
The uncertainty-aware Neural Process Family (NPF) (Garnelo, Rosenbaum, et al. 2018; Garnelo, Schwarz, et al. 2018) aims to address the aforementioned limitations of the Bayesian paradigm by exploiting the function approximation capabilities of deep neural networks to learn a family of real-world data-generating processes, a.k.a., stochastic Gaussian processes (GPs) (Rasmussen and Williams 2006). Neural processes (NPs) define uncertainties in predictions in terms of a conditional distribution over functions given the context (observations) \(C\) drawn from a distribution of functions. Here, each function \(f\) is parameterized using neural networks and can be thought of capturing an underlying data generating stochastic process.
To model the variability of \(f\) based on the variability of the generated data, NPs concurrently train and test their learned parameters on multiple datasets. This endows them with the capability to meta learn their predictive distributions over functions. The meta-learning setup makes NPs fundamentally distinguished from other non-Bayesian uncertainty-aware learning frameworks like stochastic GPs. NPF members thus combine the best of meta learners, GPs and neural networks. Like GPs, NPs learn a distribution of functions, quickly adapt to new observations, and provide uncertainty measures given test time observations. Like neural networks, NPs learn function approximation from data directly besides being efficient at inference. To learn \(f\), NPs incorporate the encoder-decoder architecture that comprises a functional encoding of each observation point followed by the learning of a decoder function whose parameters are capable of unraveling the unobserved function realizations to approximate the outputs of \(f\)….

Despite their resemblance to NPs, the vanilla encoder-decoder networks traditionally based on CNNs, RNNs, and Transformers operate merely on pointwise inputs and clearly lack the incentive to meta learn representations for dynamically changing functions (imagine \(f\) changing over a continuum such as time) and their families. The NPF members not only improve upon these architectures to model functional input spaces and provide uncertainty-aware estimates but also offer natural benefits to a number of challenging real-world tasks. Our study brings into light the potential of NPF models for several such tasks including but not limited to the handling of missing data, handling off-the-grid data, allowing continual and active learning out-of-the-box, superior interpretation capabilities all the while leveraging a diverse range of task-specific inductive biases.
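To make the encoder-decoder trick concrete, here is a minimal sketch of the simplest family member, a conditional neural process in the spirit of Garnelo, Rosenbaum, et al. (2018), written in PyTorch. The layer widths, the softplus floor on the predictive scale, and the toy sine task are arbitrary choices of mine, not anything taken from the papers cited here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalNeuralProcess(nn.Module):
    """Minimal CNP sketch: encode (x, y) context pairs, mean-pool, decode at target x."""

    def __init__(self, x_dim=1, y_dim=1, r_dim=64, h_dim=64):
        super().__init__()
        # Encoder: maps each (x_c, y_c) pair to a per-point representation r_i.
        self.encoder = nn.Sequential(
            nn.Linear(x_dim + y_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, r_dim),
        )
        # Decoder: maps (pooled representation, x_t) to a predictive mean and scale.
        self.decoder = nn.Sequential(
            nn.Linear(r_dim + x_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, 2 * y_dim),
        )

    def forward(self, x_context, y_context, x_target):
        # Per-point encodings, then a permutation-invariant mean over the context set.
        r_i = self.encoder(torch.cat([x_context, y_context], dim=-1))   # (n_c, r_dim)
        r = r_i.mean(dim=0, keepdim=True)                               # (1, r_dim)
        # Broadcast the set-level representation to every target input.
        r_rep = r.expand(x_target.shape[0], -1)                         # (n_t, r_dim)
        out = self.decoder(torch.cat([r_rep, x_target], dim=-1))
        mean, raw_scale = out.chunk(2, dim=-1)
        sigma = 0.01 + F.softplus(raw_scale)                            # keep the scale positive
        return mean, sigma

# One training "episode": a function draw, split into context and target points.
x = torch.linspace(-2.0, 2.0, 50).unsqueeze(-1)
y = torch.sin(3.0 * x) + 0.1 * torch.randn_like(x)
idx = torch.randperm(50)
cnp = ConditionalNeuralProcess()
mean, sigma = cnp(x[idx[:10]], y[idx[:10]], x)   # condition on 10 points, predict everywhere
loss = -torch.distributions.Normal(mean, sigma).log_prob(y).mean()
```

Training is episodic, which is where the meta-learning comes in: sample a function (a dataset), split it into context and target points, and maximise the Gaussian log-likelihood of the targets given the context, which is what the last line computes for a single episode.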
Cool connection: deep sets arise in neural processes, since the predictive conditionals are required to be invariant under permutations of the (exchangeable) context set.
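To spell that out: the context set enters the model only through a sum or mean of per-point encodings, so nothing downstream can depend on the ordering of the context points. A throwaway check of that invariance (all names here invented for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encode = nn.Linear(2, 8)                         # stand-in per-point encoder for (x, y) pairs
context = torch.randn(5, 2)                      # five (x, y) context points
r = encode(context).mean(dim=0)                  # deep-sets style aggregation
r_shuffled = encode(context[torch.randperm(5)]).mean(dim=0)
print(torch.allclose(r, r_shuffled, atol=1e-6))  # True: the conditional cannot see the ordering
```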
1 Functional process
Not sure if this is different again? (Louizos et al. 2019)
2 General stochastic process regression
A different way of relaxing the assumptions of GP regression is to assume non-Gaussian joints. See stochastic process regression.