
I came across the M-open concept and found it useful to take a few notes on, though I haven’t had time to really unpack it. M-open, M-complete, and M-closed describe different relations between our hypothesis class and reality. Basically, do we have the true model in our hypothesis class? (Spoiler: no, we usually don’t, except with synthetic data.) If not, what does our estimation procedure provide us?

1 M-open Bayes

Fancy folks write M-open as \(\mathcal{M}\)-open, but life is too short for indulgent typography.

Le and Clarke (2017) summarise:

For the sake of completeness, we recall that Bernardo and Smith (Bernardo and Smith 2000) define M-closed problems as those for which a true model can be identified and written down but is one amongst finitely many models from which an analyst has to choose. By contrast, M-complete problems are those in which a true model (sometimes called a belief model) exists but is inaccessible in the sense that even though it can be conceptualised it cannot be written down or at least cannot be used directly. Effectively this means that other surrogate models must be identified and used for inferential purposes. M-open problems according to Bernardo and Smith (2000) are those problems where a true model exists but cannot be specified at all.

They also mention Clyde and Iversen (2013) as a useful resource.

My understanding is as follows: In statistical modelling, we often operate under a convenient fiction: that somewhere within our set of candidate models lies the “true” process that generated our data. This is known as the M-closed setting. There is also M-complete somewhere in there, but it does not seem to be a popular category in practice, so I will not disambiguate here. But what happens when we acknowledge that our models are, at best, useful approximations of a reality far more complex than they can capture?

This brings us to the M-open setting, where we accept that the true data-generating process lies fundamentally outside our model class. This is, of course, the state of the world for most complex, real-world systems, because the map is not the territory. This post explores how to reason and act when our models are guaranteed to be wrong, starting with the pragmatic tools of M-open Bayesianism and building to the re-foundational framework of Infrabayesianism.

If no model is “true,” the goal of inference is no longer to identify that true model. Instead, we focus on predictive performance and robust decision-making under unavoidable model misspecification. The archetypal question changes from “Which model is right?” to “Which model is most useful, and how can we mitigate the risks of it being wrong?”

Related: likelihood principle, decision-theory, black swans, …

Bayesians have developed a pragmatic set of tools for navigating the M-open world, emphasising practical performance over theoretical purity. The next few sections explore popular alternatives.

2 Ignore misspecification

The default, and very popular in practice.

3 Stacking and LOO Cross-Validation

The most common approach in early M-open applications is Bayesian model stacking.

If we can’t trust any single model, why not combine them in such a way that the models make up for each other’s deficiencies? Instead of traditional Bayesian model averaging, M-open practice favours stacking. Stacking uses cross-validation to find the optimal weights for combining multiple models into a single predictive distribution that performs best on out-of-sample data.

Specifically, we use leave-one-out cross-validation (LOO) and its efficient approximation, Pareto-smoothed importance sampling (PSIS). These methods provide a robust estimate of a model’s predictive accuracy, helping practitioners choose and combine models in a way that is explicitly geared for performance in the face of misspecification. The focus is on building a better predictive engine (Le and Clarke 2017), not on finding an imaginary ground truth.
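To make this concrete, here is a minimal sketch of the stacking optimisation, assuming we already have a matrix of pointwise LOO log predictive densities (in practice these would come from something like PSIS-LOO; the toy numbers below are made up). It maximises the log score of the weighted mixture over simplex weights, which is the stacking objective of Yao et al. (2018):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical input: lpd[n, k] = LOO log predictive density of model k
# at held-out point n (e.g. from PSIS-LOO). Toy numbers stand in here.
rng = np.random.default_rng(0)
lpd = rng.normal(loc=-1.0, scale=0.3, size=(200, 3))

def neg_stacked_lpd(z):
    # Softmax parametrisation keeps the weights on the simplex.
    w = np.exp(z - z.max())
    w /= w.sum()
    # Log of the weighted mixture of predictive densities, summed over points.
    stacked = np.logaddexp.reduce(np.log(w) + lpd, axis=1)
    return -stacked.sum()

res = minimize(neg_stacked_lpd, x0=np.zeros(lpd.shape[1]), method="BFGS")
w = np.exp(res.x - res.x.max())
w /= w.sum()
print("stacking weights:", np.round(w, 3))
```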

4 Generalized and Gibbs Posteriors

Standard Bayesian updating can behave poorly when the model is misspecified, sometimes becoming over-confidently wrong as more data comes in. Generalized Bayesian posteriors somewhat address this by “tempering” the likelihood with a learning rate parameter (η). This down-weights the influence of the likelihood, preventing the model from becoming too concentrated on a flawed representation of the world.
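To see tempering at work, here is a minimal sketch in a conjugate normal model (my toy example, not from any of the cited papers). Raising the likelihood to the power η makes it contribute η·n/σ² rather than n/σ² to the posterior precision, so η < 1 yields a wider, less overconfident posterior:

```python
import numpy as np

# Tempered (generalized) posterior in a conjugate normal model:
# y_i ~ N(theta, sigma^2) likelihood, N(mu0, tau0^2) prior on theta.
def tempered_posterior(y, sigma=1.0, mu0=0.0, tau0=10.0, eta=1.0):
    n = len(y)
    prec = 1.0 / tau0**2 + eta * n / sigma**2      # posterior precision
    mean = (mu0 / tau0**2 + eta * y.sum() / sigma**2) / prec
    return mean, np.sqrt(1.0 / prec)               # posterior mean, sd

y = np.random.default_rng(1).normal(2.0, 1.0, size=100)
for eta in (1.0, 0.5, 0.1):
    m, s = tempered_posterior(y, eta=eta)
    print(f"eta={eta}: posterior mean {m:.2f}, sd {s:.3f}")
```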

This tempered approach, also known as a Gibbs posterior, helps repair some of the statistical inconsistencies that arise under misspecification. Some methods even learn the optimal tempering rate from the data itself; SafeBayes is the classic example, and Thomas and Corander (2019) do something similar via probabilistic classifiers, offering a more adaptive way to handle the mismatch between model and reality.
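And a very crude sketch of learning η from data, loosely in the spirit of SafeBayes (this grid search over one-step-ahead prequential log score is my simplification, not the actual algorithm). It reuses `tempered_posterior` and `y` from the previous block:

```python
# Pick the eta whose sequentially-updated tempered posterior gives the
# best one-step-ahead (prequential) log predictive density.
def prequential_lpd(y, eta, sigma=1.0, mu0=0.0, tau0=10.0):
    total = 0.0
    for i in range(1, len(y)):
        m, s = tempered_posterior(y[:i], sigma, mu0, tau0, eta)
        pred_var = s**2 + sigma**2            # posterior predictive variance
        total += -0.5 * (np.log(2 * np.pi * pred_var)
                         + (y[i] - m)**2 / pred_var)
    return total

etas = np.linspace(0.05, 1.0, 20)
best = max(etas, key=lambda e: prequential_lpd(y, e))
print("selected eta:", round(best, 2))
```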

5 Infrabayesianism

Infrabayesianism, in contrast, rebuilds the foundations of reasoning to handle misspecification from the ground up.

It starts with the same core assumption as M-open Bayes: the true state of the world is likely outside my hypothesis space. This is argued to be especially critical for embedded agents—agents that are part of an environment vastly more complex than they are. An AI cannot model every atom in its server room, so its world model is necessarily incomplete.

I confess I don’t follow that particular emphasis of the reasoning myself — I also cannot model every atom in anything that I study, but I am permitted to get away without infrabayesian reasoning. I should re-listen to Vanessa Kosoy on this theme. Infrabayesianism is nonetheless motivated as a framework for the future of AI systems that must navigate a world of deep and unavoidable uncertainty.

Where Bayesianism uses a single probability distribution to represent belief, Infrabayesianism uses infradistributions—which are convex sets of probability distributions.

Instead of saying, “The probability of rain is 40%,” an infrabayesian agent might say, “The probability of rain is somewhere between 30% and 60%.” This set-valued belief state, drawn from the theory of imprecise probability, directly captures the agent’s uncertainty and acknowledges the limitations of its model. This allows for more robust reasoning by considering a range of plausible worlds rather than committing to a single, likely-wrong one.
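A toy illustration of a set-valued belief (my construction, not taken from the Infrabayesian literature): represent a credal set by the extreme points of its convex hull. Since the probability of an event is linear in the distribution, its lower and upper values over a convex set are attained at extreme points:

```python
import numpy as np

# A convex set of distributions over (rain, no rain), given by its
# extreme points; lower/upper probabilities are min/max over the set.
extreme_points = np.array([
    [0.30, 0.70],   # each row is one distribution over (rain, no rain)
    [0.60, 0.40],
    [0.45, 0.55],
])
p_rain_lower = extreme_points[:, 0].min()   # 0.30
p_rain_upper = extreme_points[:, 0].max()   # 0.60
print(f"P(rain) in [{p_rain_lower}, {p_rain_upper}]")
```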

The framework apparently provides update rules and decision-making procedures (like minimax or upper-expectation reasoning) that are philosophically robust, ensuring the agent doesn’t discard useful information and can handle deep uncertainty. I have not yet used any of these in practice, so I will refrain from too much further opinion.
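Continuing the toy credal set above, here is what maximin (worst-case expected utility) decision-making looks like, again as a hedged sketch rather than the actual Infrabayesian machinery:

```python
# Pick the action whose minimum expected utility across the credal set
# is largest (maximin over the set of plausible distributions).
utility = np.array([
    [-1.0,  0.0],   # take umbrella: payoff under (rain, no rain)
    [-5.0,  1.0],   # no umbrella
])
expected = extreme_points @ utility.T        # shape: (distribution, action)
worst_case = expected.min(axis=0)            # worst case for each action
best_action = worst_case.argmax()
print("maximin action:", ["umbrella", "no umbrella"][best_action])
```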

6 References

“Bayesian Model Selection in the M-Open Setting — Approximate Posterior Inference and Subsampling for Efficient Large-Scale Leave-One-Out Cross-Validation via the Difference Estimator.” 2021. Journal of Mathematical Psychology.
Berger, and Wolpert. 1988. The Likelihood Principle.
Bernardo, and Smith. 2000. Bayesian Theory.
Bissiri, Holmes, and Walker. 2016. “A General Framework for Updating Belief Distributions.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Briol, Barp, Duncan, et al. 2019. “Statistical Inference for Generative Models with Maximum Mean Discrepancy.”
Cherief-Abdellatif, and Alquier. 2020. “MMD-Bayes: Robust Bayesian Estimation via Maximum Mean Discrepancy.” In Proceedings of The 2nd Symposium on Advances in Approximate Bayesian Inference.
Chérief-Abdellatif, Alquier, and Khan. 2019. “A Generalization Bound for Online Variational Inference.”
Clarke. 2003. “Comparing Bayes Model Averaging and Stacking When Model Approximation Error Cannot Be Ignored.” The Journal of Machine Learning Research.
Clyde, and Iversen. 2013. “Bayesian Model Averaging in the M-Open Framework.” In Bayesian Theory and Applications.
Dellaporta, Knoblauch, Damoulas, et al. 2022. “Robust Bayesian Inference for Simulator-Based Models via the MMD Posterior Bootstrap.” arXiv:2202.04744 [Cs, Stat].
Jansen. 2013. “Robust Bayesian Inference Under Model Misspecification.”
Knoblauch, Jewson, and Damoulas. 2019. “Generalized Variational Inference: Three Arguments for Deriving New Posteriors.”
———. 2022. “An Optimization-Centric View on Bayes’ Rule: Reviewing and Generalizing Variational Inference.” Journal of Machine Learning Research.
Le, and Clarke. 2017. “A Bayes Interpretation of Stacking for M-Complete and M-Open Settings.” Bayesian Analysis.
Lyddon, Walker, and Holmes. 2018. “Nonparametric Learning from Bayesian Models with Randomized Objective Functions.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
Masegosa. 2020. “Learning Under Model Misspecification: Applications to Variational and Ensemble Methods.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS’20.
Matsubara, Knoblauch, Briol, et al. 2022. “Robust Generalised Bayesian Inference for Intractable Likelihoods.” Journal of the Royal Statistical Society Series B: Statistical Methodology.
Minka. 2002. “Bayesian Model Averaging Is Not Model Combination.”
Pacchiardi, and Dutta. 2022. “Generalized Bayesian Likelihood-Free Inference Using Scoring Rules Estimators.” arXiv:2104.03889 [Stat].
Schmon, Cannon, and Knoblauch. 2021. “Generalized Posteriors in Approximate Bayesian Computation.” arXiv:2011.08644 [Stat].
Shirvaikar, Walker, and Holmes. 2024. “A General Framework for Probabilistic Model Uncertainty.”
Thomas, and Corander. 2019. “Diagnosing Model Misspecification and Performing Generalized Bayes’ Updates via Probabilistic Classifiers.”
Yao, Vehtari, Simpson, et al. 2018. “Using Stacking to Average Bayesian Predictive Distributions.” Bayesian Analysis.