Neural Bayes posteriors

Training a network to directly estimate a posterior quantity, meta-learning Bayes

2022-11-24 — 2025-07-10

Wherein transformers are trained as Prior-Data Fitted Networks to approximate Bayesian posteriors in-context, are shown to mimic Gaussian processes and are reported to yield over two‑hundredfold speedups for tabular tasks.

Bayes
convolution
density
how do science
likelihood free
machine learning
neural nets
nonparametric
sparser than thou
statistics
uncertainty

We explicitly train NNs to predict posteriors from input data, i.e. in-context learning, which gives us an explicit (approximation to the) Bayesian result. There’s a close connection between that and the implicit Bayesian inference that transformers seem to do. It’s interesting to pair this with predictive Bayes.

1 Neural point estimators

NeuralEstimators facilitates the user-friendly development of neural point estimators, which are neural networks that transform data into parameter point estimates. They are likelihood-free, substantially faster than classical methods, and can be designed to be approximate Bayes estimators. The package caters for any model for which simulation is feasible.

Permutation-invariant neural estimators (Sainsbury-Dale, Zammit-Mangion, and Huser 2022, 2024) lean on deep sets, which guarantee that the estimate does not depend on the ordering of the (exchangeable) observations.
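
To make that concrete, here is a minimal sketch of such an estimator in PyTorch. This is not the NeuralEstimators API; the toy model (i.i.d. Gaussian observations with a simple prior over mean and log-scale), the architecture widths, and the training loop are all illustrative assumptions. Because the loss is squared error, the Bayes estimator being approximated is the posterior mean.

```python
# Sketch of a permutation-invariant neural Bayes point estimator, in the spirit
# of Sainsbury-Dale et al., but NOT the NeuralEstimators package.
# Assumed toy model: Z_1, ..., Z_m ~iid N(mu, sigma^2), with prior
# mu ~ N(0, 1), log sigma ~ N(0, 0.5^2); the network maps the sample to an
# estimate of theta = (mu, log sigma).
import torch
import torch.nn as nn

class DeepSetEstimator(nn.Module):
    """phi(mean_i psi(z_i)): invariant to permutations of the observations."""
    def __init__(self, d_in=1, d_hidden=64, d_out=2):
        super().__init__()
        self.psi = nn.Sequential(            # applied to each observation
            nn.Linear(d_in, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU())
        self.phi = nn.Sequential(            # applied to the pooled summary
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_out))

    def forward(self, z):                    # z: (batch, m, d_in)
        return self.phi(self.psi(z).mean(dim=1))

def simulate(batch, m):
    """Simulate (theta, data) pairs from the assumed prior and likelihood."""
    mu = torch.randn(batch, 1)
    log_sigma = 0.5 * torch.randn(batch, 1)
    z = mu[:, None, :] + log_sigma.exp()[:, None, :] * torch.randn(batch, m, 1)
    return z, torch.cat([mu, log_sigma], dim=-1)

estimator = DeepSetEstimator()
opt = torch.optim.Adam(estimator.parameters(), lr=1e-3)
for step in range(5_000):
    z, theta = simulate(batch=256, m=50)
    loss = ((estimator(z) - theta) ** 2).mean()   # squared error => posterior mean
    opt.zero_grad()
    loss.backward()
    opt.step()
```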

Note that deep sets are a specialization of the attention architecture, and that leads us to wonder whether transformers can be trained to do Bayesian inference even better. Yes — read on.

2 Train a transformer to estimate a posterior in-context

The PFN architecture (Müller et al. 2021) has many popular variants (Hollmann et al. 2023; Dooley et al. 2023), and there is an elegant argument that PFNs perform approximate Bayesian inference: the training objective is minimized exactly when the network outputs the posterior predictive distribution under the prior used to simulate the training data.

Müller et al. (2021):

We present Prior-Data Fitted Networks (PFNs). PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors. The only requirement for PFNs to work is the ability to sample from a prior distribution over supervised learning tasks (or functions). Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points. When presented with a set of samples from a new supervised learning task, PFNs make probabilistic predictions for arbitrary data points in a single forward pass, because they’ve learned to approximate Bayesian inference. We demonstrate that PFNs can nearly perfectly mimic Gaussian processes and enable efficient Bayesian inference for intractable problems, achieving over 200-fold speedups in several setups compared to current methods.
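
The quoted recipe is simple enough to sketch. Below is a toy PFN-style training loop in PyTorch, not the authors’ implementation: the prior over functions is a random-Fourier-feature surrogate for a GP (an assumption chosen for brevity), the output head is a simple Gaussian rather than the richer discretised predictive distribution used in the paper, and the attention masking that keeps query points from seeing each other is omitted. What it does show is the core prior-data fitting loop: sample a function, sample a dataset, hold out some labels, and train the transformer to predict them from the rest.

```python
# Toy PFN-style prior-data fitting loop -- a sketch, not the authors' code.
# Assumptions for illustration: random-Fourier-feature prior over functions,
# Gaussian predictive head, no masking between query tokens.
import math
import torch
import torch.nn as nn

class TinyPFN(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=3):
        super().__init__()
        self.embed = nn.Linear(2, d_model)        # token = (x, y); queries get y = 0
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 2)         # predictive mean and log-variance

    def forward(self, x_ctx, y_ctx, x_qry):
        ctx = self.embed(torch.cat([x_ctx, y_ctx], dim=-1))
        qry = self.embed(torch.cat([x_qry, torch.zeros_like(x_qry)], dim=-1))
        h = self.encoder(torch.cat([ctx, qry], dim=1))
        return self.head(h[:, ctx.shape[1]:])     # predictions at the query points

def sample_prior_dataset(batch, n_points, n_feat=64):
    """Draw f ~ (approximate) GP prior via random Fourier features, then a dataset."""
    x = torch.rand(batch, n_points, 1) * 2 - 1
    w = torch.randn(batch, n_feat, 1)             # random frequencies
    b = 2 * math.pi * torch.rand(batch, n_feat, 1)
    a = torch.randn(batch, n_feat, 1) / math.sqrt(n_feat)
    f = torch.cos(x @ w.transpose(1, 2) + b.transpose(1, 2)) @ a
    return x, f + 0.1 * torch.randn_like(f)       # add observation noise

pfn = TinyPFN()
opt = torch.optim.Adam(pfn.parameters(), lr=1e-4)
for step in range(10_000):
    x, y = sample_prior_dataset(batch=64, n_points=40)
    x_ctx, y_ctx = x[:, :30], y[:, :30]           # visible context
    x_qry, y_qry = x[:, 30:], y[:, 30:]           # held-out targets
    mean, log_var = pfn(x_ctx, y_ctx, x_qry).chunk(2, dim=-1)
    nll = 0.5 * (log_var + (y_qry - mean) ** 2 / log_var.exp()).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()
```

At test time no gradient steps are needed: the trained network conditions on a new context set and emits predictive distributions for arbitrary query points in a single forward pass, which is where the reported speedups over conventional posterior computation come from.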

They’ve had particular success on tabular data (Hollmann et al. 2023, 2025).

3 Neural processes

Neural processes are a closely related family of meta-learned predictive models; Transformer Neural Processes (Nguyen and Grover 2022) make the connection to the transformer-based approach above explicit.

4 References

Bai, Chen, Wang, et al. 2023. “Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection.” In Advances in Neural Information Processing Systems.
Binz, Dasgupta, Jagadish, et al. 2024. “Meta-Learned Models of Cognition.” Behavioral and Brain Sciences.
Dooley, Khurana, Mohapatra, et al. 2023. “ForecastPFN: Synthetically-Trained Zero-Shot Forecasting.” In Advances in Neural Information Processing Systems.
Hollmann, Müller, Eggensperger, et al. 2023. “TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second.”
Hollmann, Müller, Purucker, et al. 2025. “Accurate Predictions on Small Data with a Tabular Foundation Model.” Nature.
Müller, Feurer, Hollmann, et al. 2023. “PFNs4BO: In-Context Learning for Bayesian Optimization.” arXiv Preprint arXiv:2305.17535.
Müller, Hollmann, Arango, et al. 2021. “Transformers Can Do Bayesian Inference.”
Nguyen, and Grover. 2022. “Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling.”
Richards, Sainsbury-Dale, Zammit-Mangion, et al. 2024. “Neural Bayes Estimators for Censored Inference with Peaks-over-Threshold Models.”
Sainsbury-Dale, Zammit-Mangion, and Huser. 2022. “Fast Optimal Estimation with Intractable Models Using Permutation-Invariant Neural Networks.”
———. 2024. “Likelihood-Free Parameter Estimation with Neural Bayes Estimators.” The American Statistician.
Zammit-Mangion, Sainsbury-Dale, and Huser. 2024. “Neural Methods for Amortized Inference.”