Neural Bayes posteriors
Training a network to directly estimate a posterior quantity, meta-learning Bayes
2022-11-24 — 2025-07-10
Wherein transformers are trained as Prior-Data Fitted Networks to approximate Bayesian posteriors in-context, are shown to mimic Gaussian processes, are reported to yield over two-hundredfold speedups over conventional inference, and have found particular success on tabular tasks.
We explicitly train NNs to predict posteriors from input data, i.e. to do in-context learning that gives us an explicit (approximation to the) Bayesian result. There’s a close connection between this and the implicit Bayesian inference that transformers seem to do. It’s interesting to pair this with predictive Bayes.
1 Neural point estimators
NeuralEstimators facilitates the user-friendly development of neural point estimators, which are neural networks that transform data into parameter point estimates. They are likelihood-free, substantially faster than classical methods, and can be designed to be approximate Bayes estimators. The package caters for any model for which simulation is feasible.
Permutation-invariant neural estimators (Sainsbury-Dale, Zammit-Mangion, and Huser 2022, 2024) lean on the deep sets architecture.
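To make the idea concrete, here is a minimal sketch of a deep-sets point estimator in PyTorch. It is not the NeuralEstimators API; the toy Gaussian model, the prior, and the network sizes are all illustrative assumptions. Training against draws from the prior with squared-error loss pushes the network towards the posterior mean, which is one sense in which such estimators can be approximate Bayes estimators.

```python
# Minimal sketch of a permutation-invariant (deep sets) neural point estimator.
# NOT the NeuralEstimators package API; model, prior, and sizes are illustrative.
import torch
import torch.nn as nn

class DeepSetEstimator(nn.Module):
    """phi encodes each replicate; mean pooling gives a permutation-invariant
    summary; psi maps the summary to a parameter point estimate."""
    def __init__(self, x_dim=1, hidden=64, theta_dim=2):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.psi = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, theta_dim))

    def forward(self, x):                      # x: (batch, n_replicates, x_dim)
        pooled = self.phi(x).mean(dim=1)       # invariant to replicate order
        return self.psi(pooled)

def simulate_batch(batch=256, n=100):
    """Toy model: x_i ~ Normal(mu, sigma), with (mu, log sigma) drawn from a
    simple prior. Any simulable model could be substituted here."""
    mu = torch.randn(batch, 1)
    log_sigma = 0.5 * torch.randn(batch, 1)
    theta = torch.cat([mu, log_sigma], dim=1)
    x = mu[:, None, :] + log_sigma.exp()[:, None, :] * torch.randn(batch, n, 1)
    return x, theta

estimator = DeepSetEstimator()
opt = torch.optim.Adam(estimator.parameters(), lr=1e-3)
for step in range(2000):
    x, theta = simulate_batch()
    loss = nn.functional.mse_loss(estimator(x), theta)  # squared loss targets
    opt.zero_grad(); loss.backward(); opt.step()        # the posterior mean
```

Because the summary is a mean over encoded replicates, the same trained network applies to data sets of any size, which is the appeal of the deep sets construction here.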
Note that deep sets are a specialization of the attention architecture, and that leads us to wonder whether transformers can be trained to do Bayesian inference even better. Yes — read on.
2 Train a transformer to estimate a posterior in-context
The PFN architecture (Müller et al. 2021) has spawned many popular variants (Hollmann et al. 2023; Dooley et al. 2023), and there is an elegant argument that these models perform (approximate) Bayesian inference.
Müller et al. (2021):
We present Prior-Data Fitted Networks (PFNs). PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors. The only requirement for PFNs to work is the ability to sample from a prior distribution over supervised learning tasks (or functions). Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points. When presented with a set of samples from a new supervised learning task, PFNs make probabilistic predictions for arbitrary data points in a single forward pass, because they’ve learned to approximate Bayesian inference. We demonstrate that PFNs can nearly perfectly mimic Gaussian processes and enable efficient Bayesian inference for intractable problems, achieving over 200-fold speedups in several setups compared to current methods.
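As a concrete, much-simplified illustration of that training recipe, here is a toy PFN-style sketch in PyTorch: tasks are drawn from a prior (a random linear-regression prior standing in for the paper's GP and BNN priors), the labels of held-out points are masked, and a transformer is trained by cross-entropy over a binned output distribution to predict them from the rest of the set. Everything here (the prior, the sizes, the unmasked attention pattern) is an illustrative assumption rather than the authors' code.

```python
# Toy PFN-style training sketch: posterior approximation as supervised
# classification over a set-valued input. Illustrative only.
import torch
import torch.nn as nn

N_BINS = 64                                  # discretise y -> classification
BINS = torch.linspace(-3, 3, N_BINS + 1)

def sample_tasks(batch=32, n_ctx=20, n_query=10):
    """Draw tasks from the prior: here y = a*x + b + noise with random a, b."""
    n = n_ctx + n_query
    a, b = torch.randn(batch, 1), torch.randn(batch, 1)
    x = torch.rand(batch, n, 1) * 2 - 1
    y = a[:, :, None] * x + b[:, :, None] + 0.1 * torch.randn(batch, n, 1)
    return x, y.clamp(-2.99, 2.99)

class TinyPFN(nn.Module):
    def __init__(self, d_model=128):
        super().__init__()
        self.embed_ctx = nn.Linear(2, d_model)      # (x, y) for observed points
        self.embed_qry = nn.Linear(1, d_model)      # x only for masked points
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.head = nn.Linear(d_model, N_BINS)      # predictive dist. over y bins

    def forward(self, x_ctx, y_ctx, x_qry):
        ctx = self.embed_ctx(torch.cat([x_ctx, y_ctx], dim=-1))
        qry = self.embed_qry(x_qry)
        h = self.encoder(torch.cat([ctx, qry], dim=1))
        # NB: this sketch uses full attention; the actual PFN masks attention so
        # that held-out points attend only to the observed (context) points.
        return self.head(h[:, x_ctx.shape[1]:])

model = TinyPFN()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(5000):
    x, y = sample_tasks()
    x_ctx, y_ctx, x_qry, y_qry = x[:, :20], y[:, :20], x[:, 20:], y[:, 20:]
    logits = model(x_ctx, y_ctx, x_qry)                      # (B, n_qry, N_BINS)
    target = torch.bucketize(y_qry.squeeze(-1), BINS[1:-1])  # bin index of y
    loss = nn.functional.cross_entropy(logits.reshape(-1, N_BINS),
                                       target.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```

At deployment, a single forward pass over a new context set plus query inputs yields predictive distributions for the queries, which is the in-context "Bayesian inference" the abstract describes.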
PFNs have had particular success on tabular data (Hollmann et al. 2023).
3 Neural processes
Neural processes are a closely related family: they meta-learn a map from observed context sets to predictive distributions over new targets, so they can also be read as amortised approximate Bayesian prediction.
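For flavour, a minimal conditional-neural-process-style sketch in PyTorch: a deep-set encoder summarises the context points, and a decoder maps the summary plus a target input to a Gaussian predictive distribution, trained by maximum likelihood over tasks drawn from a toy prior. All particulars here are illustrative assumptions, not any specific published model.

```python
# Minimal conditional-neural-process-style sketch. Illustrative only.
import torch
import torch.nn as nn

class CNP(nn.Module):
    """Encode context (x, y) pairs into a mean-pooled representation, then
    decode (representation, target x) into a Gaussian predictive."""
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden))
        self.decoder = nn.Sequential(nn.Linear(hidden + 1, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 2))  # (mean, raw scale)

    def forward(self, x_ctx, y_ctx, x_tgt):
        r = self.encoder(torch.cat([x_ctx, y_ctx], dim=-1)).mean(1, keepdim=True)
        r = r.expand(-1, x_tgt.shape[1], -1)
        mu, raw_scale = self.decoder(torch.cat([r, x_tgt], dim=-1)).chunk(2, -1)
        return mu, nn.functional.softplus(raw_scale) + 1e-3

def sample_tasks(batch=32, n=30):
    """Toy prior over tasks: random sinusoids with observation noise."""
    amp, phase = torch.rand(batch, 1, 1) * 2, torch.rand(batch, 1, 1) * 3
    x = torch.rand(batch, n, 1) * 4 - 2
    return x, amp * torch.sin(x + phase) + 0.1 * torch.randn(batch, n, 1)

model = CNP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(3000):
    x, y = sample_tasks()
    mu, sigma = model(x[:, :20], y[:, :20], x[:, 20:])       # condition on 20 pts
    loss = -torch.distributions.Normal(mu, sigma).log_prob(y[:, 20:]).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```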