Bayes inference in an open world

Realizability, infrabayesianism, M-open, M-closed, mis-specification

2016-05-30 — 2026-01-14

Wherein misspecified Bayes is treated as M-open, and predictive mixtures are formed by LOO cross-validation with PSIS, while likelihoods are tempered by η to restrain overconfidence.

Bayes
how do science
statistics

It turns out all models are wrong. We’re used to that, but we rarely account for it in Bayesian inference. If my model is “wrong” in the sense that the ground truth isn’t in the hypothesis class, can I treat it “as if” it were correct and still recover something like “nearly good” inference? Does it really matter if I didn’t account for my model being misspecified?

The answer: it depends. M-open is one of a family of terms (alongside M-closed and M-complete) describing different possible relationships between our hypothesis class and reality, and it helps us reason about how bad it is that our simplifications aren’t perfect. Infrabayesianism is a set of strategies for doing principled reasoning with oversimplified models.

Here are my notes on this.

For now, consider Christian Robert’s brief intro:

A few thoughts (and many links to my blog entries!) about that meme that all models are wrong:

  1. While the hypothetical model is indeed almost invariably and irremediably wrong, it still makes sense to act in an efficient or coherent manner with respect to this model if this is the best one can do. The resulting inference produces an evaluation of the formal model that is the “closest” to the actual data-generating model (if any);
  2. There exist Bayesian approaches that can do without the model, a most recent example being the papers by Bissiri et al. (with my comments) and by Watson and Holmes (which I discussed with Judith Rousseau);
  3. In a connected way, there exists a whole branch of Bayesian statistics dealing with M-open inference;
  4. And yet another direction I like a lot is the SafeBayes approach of Peter Grünwald, who takes into account model misspecification to replace the likelihood with a downgraded version expressed as a power of the original likelihood.
  5. The very recent Read Paper by Gelman and Hennig addresses this issue, albeit in a convoluted manner (and I added some comments on my blog). I presume you could gather material for a discussion from the entries about your question.
  6. In a sense, Bayesians should be the least concerned among statisticians and modellers about this aspect since the sampling model is to be taken as one of several prior assumptions and the outcome is conditional or relative to all those prior assumptions.

1 M-open Bayes

Fancy folks write M-open as \(\mathcal{M}\)-open, but life’s too short for fancy typography.

Le and Clarke (2017) summarise:

For the sake of completeness, we recall that Bernardo and Smith (Bernardo and Smith 2000) define M-closed problems as those for which a true model can be identified and written down but is one amongst finitely many models from which an analyst has to choose. By contrast, M-complete problems are those in which a true model (sometimes called a belief model) exists but is inaccessible in the sense that even though it can be conceptualised it cannot be written down or at least cannot be used directly. Effectively this means that other surrogate models must be identified and used for inferential purposes. M-open problems according to Bernardo and Smith (2000) are those problems where a true model exists but cannot be specified at all.

They also mention Clyde and Iversen (2013) as a useful resource.

My understanding is as follows: In statistical modelling, we often operate under a convenient fiction: that somewhere within our set of candidate models lies the “true” process that generated our data. This is known as the M-closed setting. There’s also M-complete in the mix, but it doesn’t seem to be a popular category in practice, so I won’t disambiguate it here. But what happens when we acknowledge that our models are, at best, useful approximations of a reality far more complex than they can capture?

This brings us to the M-open setting, where we accept that the true data-generating process lies fundamentally outside our model class. This is, of course, the state of the world for most complex, real-world systems, because the map is not the territory, although it is surprising how rarely that is acknowledged.

If no model is “true,” the goal of inference is no longer to identify that true model. Instead, we focus on predictive performance and robust decision-making under (unavoidable) model misspecification. The archetypal question changes from “Which model is right?” to “Which model is most useful, and how can we mitigate the risks of it being wrong?”

Related: likelihood principle, decision-theory, black swans, …

Bayesians have developed a pragmatic set of tools for navigating the M-open world, emphasising practical performance over theoretical purity. The next few sections explore popular alternatives.

2 Ignore misspecification

The default, and very popular in practice.

3 Stacking

The most common approach in applied M-open work is Bayesian model stacking.

If we can’t trust any single model, why not combine them in such a way that the models make up for each other’s deficiencies? Instead of traditional Bayesian model averaging, M-open practice favours stacking. Stacking uses cross-validation to find the optimal weights for combining multiple models into a single predictive distribution that performs best on out-of-sample data.

Specifically, we use leave-one-out cross-validation (LOO), usually via its efficient approximation PSIS-LOO (Pareto smoothed importance sampling). These methods provide a robust estimate of a model’s out-of-sample predictive accuracy, helping practitioners choose and combine models in a way that is explicitly geared towards performance in the face of misspecification. The focus is on building a better predictive engine (Le and Clarke 2017), not on finding some imaginary ground truth in the model set.
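
To make the mechanics concrete, here is a minimal sketch of the stacking optimization in the spirit of Yao et al. (2018), assuming we already have a matrix of pointwise leave-one-out log predictive densities (the hypothetical `lpd` array below; in practice something like PSIS-LOO would supply it). The weights live on the simplex and are chosen to maximize the log score of the mixture of predictive distributions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp, softmax

def stacking_weights(lpd):
    """Stacking weights for combining predictive distributions.

    lpd: array of shape (n_points, n_models) holding pointwise
    leave-one-out log predictive densities (e.g. from PSIS-LOO).
    Returns simplex weights maximizing the summed log score of the
    weighted mixture of predictive densities.
    """
    n_points, n_models = lpd.shape

    def neg_log_score(z):
        # Unconstrained parameterization of the simplex via softmax,
        # with the last weight's logit pinned to zero.
        w = softmax(np.append(z, 0.0))
        # log sum_k w_k * p_k(y_i) at each held-out point i
        return -np.sum(logsumexp(lpd + np.log(w), axis=1))

    res = minimize(neg_log_score, np.zeros(n_models - 1), method="BFGS")
    return softmax(np.append(res.x, 0.0))

# Toy usage with made-up log predictive densities for two models.
rng = np.random.default_rng(0)
lpd = np.column_stack([
    rng.normal(-1.0, 0.2, size=200),
    rng.normal(-1.3, 0.2, size=200),
])
print(stacking_weights(lpd))  # most weight goes to the better-scoring model
```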

4 Generalized and Gibbs Posteriors

Standard Bayesian updating can behave poorly when the model is misspecified, sometimes becoming over-confidently wrong as more data comes in. Generalized Bayesian posteriors somewhat address this by “tempering” the likelihood with a learning rate parameter (η). This down-weights the influence of the likelihood, preventing the model from becoming too concentrated on a flawed representation of the world.

This approach, also known as a Gibbs posterior, helps repair some of the statistical inconsistencies that arise under misspecification. Some methods, like SafeBayes (Grünwald and van Ommen 2017), even learn the optimal tempering rate from the data itself, offering a more adaptive way to handle the mismatch between model and reality.
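
To see what tempering does, here is a toy conjugate sketch of my own (not taken from the cited papers): with a \(\operatorname{Beta}(a, b)\) prior and a Bernoulli likelihood raised to the power \(\eta\), the tempered posterior is \(\operatorname{Beta}(a + \eta k,\, b + \eta(n-k))\), so shrinking \(\eta\) simply discounts the effective sample size and keeps the posterior wider.

```python
import numpy as np
from scipy import stats

def tempered_beta_posterior(k, n, a=1.0, b=1.0, eta=1.0):
    """Generalized (tempered) posterior for a Bernoulli parameter.

    The likelihood is raised to the power eta before Bayes' rule is
    applied, so eta = 1 recovers the standard posterior and eta < 1
    keeps the posterior wider (less confident).
    """
    return stats.beta(a + eta * k, b + eta * (n - k))

# 80 successes out of 100 trials; smaller eta gives wider intervals.
for eta in (1.0, 0.5, 0.1):
    post = tempered_beta_posterior(k=80, n=100, eta=eta)
    lo, hi = post.interval(0.95)
    print(f"eta={eta:.1f}: 95% interval ({lo:.3f}, {hi:.3f})")
```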

My read on this approach is that it’s a pragmatic robustification procedure, but not terribly theoretically satisfying.

5 Alternative Bayes foundations

Infrabayesianism, maximin expected utility, and other approaches rebuild the foundations of reasoning to handle misspecification from the ground up. See Imprecise Bayesianism.
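
As a cartoon of the maximin idea (my own toy illustration, not infrabayesianism proper): instead of averaging expected utility over a single posterior we do not trust, keep a small set of candidate models we refuse to choose between, score each action by its worst-case expected utility over that set, and pick the best worst case.

```python
import numpy as np

# Rows: candidate models we refuse to choose between (a crude credal set).
# Columns: probability each model assigns to the outcomes (rain, no rain).
models = np.array([
    [0.6, 0.4],
    [0.2, 0.8],
])

# utility[action, outcome]: take the umbrella / leave it, vs rain / no rain.
utility = np.array([
    [ 1.0, 0.5],   # umbrella: mildly inconvenient either way
    [-2.0, 1.0],   # no umbrella: bad if it rains
])

# Expected utility of each action under each model: shape (n_actions, n_models).
expected = utility @ models.T

# Maximin: best worst-case expected utility across the model set.
worst_case = expected.min(axis=1)
best_action = int(np.argmax(worst_case))
print(worst_case, "-> choose action", best_action)
```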

6 Gibbs posteriors

Relatedly, Gibbs posteriors, if we squint, look like an attempt to address the M-open problem by removing the need for a valid likelihood. AFAICT they always imply a valid likelihood, though, so they don’t directly address the problem.
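
For reference, the generic Gibbs/generalized posterior (as in Bissiri, Holmes, and Walker 2016) swaps the log-likelihood for a loss \(\ell\) and a learning rate \(\eta\):

\[
\pi_\eta(\theta \mid x_{1:n}) \;\propto\; \pi(\theta)\,\exp\!\Big(-\eta \sum_{i=1}^{n} \ell(\theta, x_i)\Big).
\]

Taking \(\ell(\theta, x) = -\log p(x \mid \theta)\) and \(\eta = 1\) recovers ordinary Bayes; for any other loss, \(\exp(-\eta\sum_i \ell)\) still sits exactly where a likelihood would, which is the sense in which the construction smuggles a pseudo-likelihood back in.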


7 References

Alquier. 2024. “User-Friendly Introduction to PAC-Bayes Bounds.” Foundations and Trends in Machine Learning.
Baek, Aquino, and Mukherjee. 2023. “Generalized Bayes Approach to Inverse Problems with Model Misspecification.” Inverse Problems.
Berger, and Wolpert. 1988. The Likelihood Principle.
Bernardo, and Smith. 2000. Bayesian Theory.
Bissiri, Holmes, and Walker. 2016. “A General Framework for Updating Belief Distributions.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Bochkina. 2023. “Bernstein–von Mises Theorem and Misspecified Models: A Review.” In Foundations of Modern Statistics. Springer Proceedings in Mathematics & Statistics.
Briol, Barp, Duncan, et al. 2019. “Statistical Inference for Generative Models with Maximum Mean Discrepancy.”
Catoni. 2007. PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning.
Cherief-Abdellatif, and Alquier. 2020. “MMD-Bayes: Robust Bayesian Estimation via Maximum Mean Discrepancy.” In Proceedings of The 2nd Symposium on Advances in Approximate Bayesian Inference.
Chérief-Abdellatif, Alquier, and Khan. 2019. “A Generalization Bound for Online Variational Inference.”
Clarke. 2003. “Comparing Bayes Model Averaging and Stacking When Model Approximation Error Cannot Be Ignored.” The Journal of Machine Learning Research.
Clyde, and Iversen. 2013. “Bayesian Model Averaging in the M-Open Framework.” In Bayesian Theory and Applications.
Dellaporta, Knoblauch, Damoulas, et al. 2022. “Robust Bayesian Inference for Simulator-Based Models via the MMD Posterior Bootstrap.” arXiv:2202.04744 [Cs, Stat].
Farmer, Nakamura, and Steinsson. 2021. “Learning About the Long Run.” Working Paper. Working Paper Series.
Gill, and King. 2004. “What to Do When Your Hessian Is Not Invertible: Alternatives to Model Respecification in Nonlinear Estimation.” Sociological Methods & Research.
Grendár, and Judge. 2012. “Not All Empirical Divergence Minimizing Statistical Methods Are Created Equal?” AIP Conference Proceedings.
Grünwald, and van Ommen. 2017. “Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It.” Bayesian Analysis.
Haddouche, and Guedj. 2022. “Online PAC-Bayes Learning.”
Jansen. 2013. “Robust Bayesian Inference Under Model Misspecification.”
Kelter. 2021. “Bayesian Model Selection in the M-Open Setting — Approximate Posterior Inference and Subsampling for Efficient Large-Scale Leave-One-Out Cross-Validation via the Difference Estimator.” Journal of Mathematical Psychology.
Kleijn, and van der Vaart. 2006. “Misspecification in Infinite-Dimensional Bayesian Statistics.” The Annals of Statistics.
Kleijn, and van der Vaart. 2012. “The Bernstein-Von-Mises Theorem Under Misspecification.” Electronic Journal of Statistics.
Knoblauch, Jewson, and Damoulas. 2019. “Generalized Variational Inference: Three Arguments for Deriving New Posteriors.”
———. 2022. “An Optimization-Centric View on Bayes’ Rule: Reviewing and Generalizing Variational Inference.” Journal of Machine Learning Research.
Le, and Clarke. 2017. “A Bayes Interpretation of Stacking for M-Complete and M-Open Settings.” Bayesian Analysis.
Leike. 2016. “Nonparametric General Reinforcement Learning.”
Loecher. 2021. “The Perils of Misspecified Priors and Optional Stopping in Multi-Armed Bandits.” Frontiers in Artificial Intelligence.
Lv, and Liu. 2014. “Model Selection Principles in Misspecified Models.” Journal of the Royal Statistical Society Series B: Statistical Methodology.
Lyddon, Walker, and Holmes. 2018. “Nonparametric Learning from Bayesian Models with Randomized Objective Functions.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
Masegosa. 2020. “Learning Under Model Misspecification: Applications to Variational and Ensemble Methods.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS’20.
Matsubara, Knoblauch, Briol, et al. 2022. “Robust Generalised Bayesian Inference for Intractable Likelihoods.” Journal of the Royal Statistical Society Series B: Statistical Methodology.
McAllester. 1998. “Some PAC-Bayesian Theorems.” In Proceedings of the Eleventh Annual Conference on Computational Learning Theory. COLT’ 98.
———. 1999. “PAC-Bayesian Model Averaging.” In Proceedings of the Twelfth Annual Conference on Computational Learning Theory.
Medina, Olea, Rush, et al. 2021. “On the Robustness to Misspecification of \(\alpha\)-Posteriors and Their Variational Approximations.”
Minka. 2002. “Bayesian Model Averaging Is Not Model Combination.”
Müller. 2013. “Risk of Bayesian Inference in Misspecified Models, and the Sandwich Covariance Matrix.” Econometrica.
Nott, Drovandi, and Frazier. 2023. “Bayesian Inference for Misspecified Generative Models.”
Pacchiardi, and Dutta. 2022. “Generalized Bayesian Likelihood-Free Inference Using Scoring Rules Estimators.” arXiv:2104.03889 [Stat].
Pati, Bhattacharya, Pillai, et al. 2014. “Posterior Contraction in Sparse Bayesian Factor Models for Massive Covariance Matrices.” The Annals of Statistics.
Rivasplata, Kuzborskij, Szepesvari, et al. 2020. “PAC-Bayes Analysis Beyond the Usual Bounds.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20.
Rodríguez-Gálvez, Thobaben, and Skoglund. 2024. “More PAC-Bayes Bounds: From Bounded Losses, to Losses with General Tail Behaviors, to Anytime Validity.” Journal of Machine Learning Research.
Schmon, Cannon, and Knoblauch. 2021. “Generalized Posteriors in Approximate Bayesian Computation.” arXiv:2011.08644 [Stat].
Schwartz. 1965. “On Bayes Procedures.” Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete.
Shalizi. 2009. “Dynamics of Bayesian Updating with Dependent Data and Misspecified Models.” Electronic Journal of Statistics.
Shirvaikar, Walker, and Holmes. 2024. “A General Framework for Probabilistic Model Uncertainty.”
Sucker, and Ochs. 2023. “PAC-Bayesian Learning of Optimization Algorithms.” In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics.
Thiemann, Igel, Wintenberger, et al. 2017. “A Strongly Quasiconvex PAC-Bayesian Bound.” In Proceedings of the 28th International Conference on Algorithmic Learning Theory.
Thomas, and Corander. 2019. “Diagnosing Model Misspecification and Performing Generalized Bayes’ Updates via Probabilistic Classifiers.”
Vansteelandt, Bekaert, and Claeskens. 2012. “On Model Selection and Model Misspecification in Causal Inference.” Statistical Methods in Medical Research.
Walker. 2013. “Bayesian Inference with Misspecified Models.” Journal of Statistical Planning and Inference.
Wang, and Blei. 2019. “Variational Bayes Under Model Misspecification.” In Advances in Neural Information Processing Systems.
Yao, Vehtari, Simpson, et al. 2018. “Using Stacking to Average Bayesian Predictive Distributions.” Bayesian Analysis.