Imprecise Bayesianism

2025-11-30 — 2026-01-14

Wherein imprecise Bayesianism is presented as an alternative for M‑open problems, where beliefs are represented by convex sets of distributions and PAC‑Bayes generalization bounds are invoked.

In M-open Bayesian inference, we accept that our models are simplifications that don’t contain the true data-generating process, which leads to problems with standard Bayesian updating.

What alternative foundations or extensions of Bayesianism can better handle model misspecification?

1 Maximin updates over set priors

As far as I know, this is the classic approach. Informally: we have a set of prior distributions representing our beliefs. When we see data, we update each prior using Bayes’ rule to get a set of posteriors. When making decisions, we consider the worst-case expected utility across all posteriors in this set and choose the action that maximizes this worst-case utility.
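The recipe above can be sketched on a toy problem. This is a minimal illustration under my own assumptions, not anything taken from the cited works: Bernoulli data, a hypothetical finite set of Beta priors (`PRIORS`), and a made-up two-action utility table (`UTILITY`). Each prior is updated by conjugacy, and the chosen action maximizes the worst-case posterior expected utility.

```python
# Toy maximin decision over a set of priors.
# Assumptions (all hypothetical): Bernoulli observations, a finite set of
# Beta(a, b) priors, and a utility that depends only on the next outcome.

def update_beta(prior, successes, failures):
    """Conjugate Bayes update of a Beta(a, b) prior on Bernoulli data."""
    a, b = prior
    return (a + successes, b + failures)

def predictive_prob(beta_params):
    """Posterior predictive probability that the next outcome is 1."""
    a, b = beta_params
    return a / (a + b)

def expected_utility(action, p_one, utility):
    """Expected utility of an action when the next outcome is 1 w.p. p_one."""
    return p_one * utility[action][1] + (1.0 - p_one) * utility[action][0]

def maximin_action(priors, successes, failures, utility):
    """Update every prior, then pick the action with the best worst case."""
    posteriors = [update_beta(p, successes, failures) for p in priors]
    worst_case = {
        action: min(
            expected_utility(action, predictive_prob(post), utility)
            for post in posteriors
        )
        for action in utility
    }
    best = max(worst_case, key=worst_case.get)
    return best, worst_case, posteriors

# Hypothetical prior set: one flat, one pessimistic, one optimistic.
PRIORS = [(1.0, 1.0), (2.0, 8.0), (8.0, 2.0)]
# Hypothetical utilities: UTILITY[action][outcome].
UTILITY = {"bet": {1: 1.0, 0: -1.0}, "abstain": {1: 0.0, 0: 0.0}}

best, worst_case, posteriors = maximin_action(
    PRIORS, successes=7, failures=3, utility=UTILITY
)
print(best, worst_case)
```

With 7 successes out of 10, the pessimistic prior still drags the worst-case value of betting below zero, so the maximin choice is to abstain. With ten times as much data in the same proportion, the posteriors agree closely enough that betting wins. The set of posteriors shrinks as data accumulates, but only as fast as its most stubborn member.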

Easy to say, but I haven’t really used this myself and suspect it’s annoying in practice (Camerer and Weber 1992; De Bock 2020; Giustinelli, Manski, and Molinari 2021; Hayashi 2021; Walley 1991).

2 PAC-Bayes methods

There’s a large body of work on this; I’m not an expert (Catoni 2007; Haddouche and Guedj 2022; Rivasplata et al. 2020; Rodríguez-Gálvez, Thobaben, and Skoglund 2024; Sucker and Ochs 2023; Thiemann et al. 2017).

PAC-Bayes (Probably Approximately Correct Bayesian) methods offer a theoretically grounded way to aggregate misspecified models in the M-open setting, giving high-probability generalization bounds without assuming realizability. They originate with McAllester (1998, 1999) and were sharpened by Catoni (2007). The bounds control the expected risk of a posterior over hypotheses in terms of its empirical risk and its KL divergence from a prior, so we can, in principle, do robust model selection or ensembling even when the true data-generating process lies outside the model class \(\mathcal{M}\).

I’m unclear whether all stacking bounds are in fact of PAC-Bayes type.

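PAC-Bayes seems to justify techniques like stacking by quantifying how well a data-dependent posterior generalizes. For concreteness, a McAllester-style bound (Catoni’s own bounds take a different, exponential form) states that for losses \(\ell\) bounded in \([0,1]\), a prior \(P\) fixed before seeing the data, and \(n\) i.i.d. samples, with probability at least \(1-\delta\) over the draw of the data, simultaneously for all posteriors \(Q\), \[\mathbb{E}_{h\sim Q}[\mathrm{risk}(h)] \leq \mathbb{E}_{h\sim Q}\left[\frac{1}{n} \sum_{i=1}^{n} \ell(h(x_i), y_i)\right] + \sqrt{\frac{\mathrm{KL}(Q\|P) + \log(2n/\delta)}{2n}}.\] The data-generating distribution is arbitrary and need not lie in the model class, which is the sense in which this sidesteps M-closed assumptions, but don’t ask me for details.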
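To get a feel for the trade-off in the bound above, here is a minimal sketch that evaluates its right-hand side for a finite hypothesis class. Everything concrete in it is an assumption for illustration: the four empirical risks are made up, the prior is uniform, and the function names are mine rather than any library’s API. The empirical term is the \(Q\)-average of the per-hypothesis empirical risks.

```python
import math

def kl_divergence(q, p):
    """KL(Q || P) for discrete distributions given as equal-length lists."""
    total = 0.0
    for qi, pi in zip(q, p):
        if qi > 0.0:
            if pi == 0.0:
                return math.inf  # Q puts mass where P has none
            total += qi * math.log(qi / pi)
    return total

def pac_bayes_bound(q, p, empirical_risks, n, delta):
    """Right-hand side of the square-root PAC-Bayes bound.

    q, p            : posterior and prior weights over a finite hypothesis set
    empirical_risks : average loss of each hypothesis on the n samples,
                      with losses assumed to lie in [0, 1]
    """
    q_empirical_risk = sum(qi * ri for qi, ri in zip(q, empirical_risks))
    complexity = math.sqrt(
        (kl_divergence(q, p) + math.log(2 * n / delta)) / (2 * n)
    )
    return q_empirical_risk + complexity

# Hypothetical numbers: four candidate models, uniform prior, n = 1000.
RISKS = [0.10, 0.12, 0.30, 0.45]
PRIOR = [0.25, 0.25, 0.25, 0.25]
N, DELTA = 1000, 0.05

bound_prior = pac_bayes_bound(PRIOR, PRIOR, RISKS, N, DELTA)      # Q = P, KL = 0
bound_best = pac_bayes_bound([1.0, 0.0, 0.0, 0.0], PRIOR, RISKS, N, DELTA)
bound_mix = pac_bayes_bound([0.5, 0.5, 0.0, 0.0], PRIOR, RISKS, N, DELTA)
print(bound_prior, bound_best, bound_mix)
```

Concentrating \(Q\) on low-risk hypotheses lowers the empirical term but pays a KL penalty, here at most \(\log 4\). Because the bound holds for all \(Q\) simultaneously, we may pick \(Q\) after looking at the data, as long as \(P\) was fixed beforehand.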

Tools like paccube or pacbayes.py implement optimization of these bounds for neural nets and beyond.

3 Infrabayesianism

I presume infrabayesianism differs from the standard set-prior approaches, but I don’t know it well enough to say how.

It starts from the same assumption as M-open Bayes: the true state of the world is likely outside an agent’s hypothesis space. Its proponents argue that this is especially critical for embedded agents, that is, agents that are part of an environment vastly more complex than they are. An AI can’t model every atom in its server room, let alone the universe, so its world model is necessarily incomplete.

I confess I don’t follow that emphasis myself — I also can’t model every atom in anything I study, and I get by without infrabayesian reasoning. I should re-listen to Vanessa Kosoy on this theme. Infrabayesianism is nonetheless motivated as a framework for the future of AI systems that must navigate a world of deep and unavoidable uncertainty.

Where standard Bayesianism represents belief with a single probability distribution, infrabayesianism uses infradistributions, which are (roughly) convex sets of probability distributions.

Instead of saying, “The probability of rain is 40%,” an infrabayesian agent might say, “The probability of rain is somewhere between 30% and 60%.” The interval directly captures the agent’s uncertainty and acknowledges the limitations of its model, which allows for more robust reasoning by considering a range of plausible worlds rather than committing to a single, likely-wrong one. I don’t know how this differs from the set-prior version of imprecise probability in the first section.
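The rain example can be made concrete with generic imprecise-probability arithmetic. This is a sketch of lower and upper expectations over an interval of probabilities, not of any infrabayesian-specific machinery, and the payoff numbers are invented for illustration. Because expected payoff is linear in the rain probability, its extremes over the interval are attained at the endpoints.

```python
def expected_payoff(p_rain, payoff_if_rain, payoff_if_dry):
    """Expected payoff of an action under a single rain probability."""
    return p_rain * payoff_if_rain + (1.0 - p_rain) * payoff_if_dry

def lower_upper_expectation(payoffs, p_low, p_high):
    """Worst and best expected payoff as p_rain ranges over [p_low, p_high].

    Linearity in p_rain means checking the two endpoints is enough.
    """
    values = [expected_payoff(p, *payoffs) for p in (p_low, p_high)]
    return min(values), max(values)

# Hypothetical payoffs: (payoff_if_rain, payoff_if_dry) for each action.
PAYOFFS = {"umbrella": (-1.0, -1.0), "no_umbrella": (-2.0, 0.0)}

# Imprecise agent: rain probability somewhere in [0.3, 0.6].
intervals = {
    action: lower_upper_expectation(payoffs, 0.3, 0.6)
    for action, payoffs in PAYOFFS.items()
}
maximin_choice = max(intervals, key=lambda a: intervals[a][0])

# Precise agent: rain probability exactly 0.4.
precise_choice = max(
    PAYOFFS, key=lambda a: expected_payoff(0.4, *PAYOFFS[a])
)
print(intervals, maximin_choice, precise_choice)
```

With these numbers the precise 40% agent leaves the umbrella at home, while the agent that only commits to the 30% to 60% interval and guards against the worst case takes it.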

The framework apparently provides update rules and decision-making procedures (like minimax loss, i.e. minimizing the upper expectation of the loss, which is the same rule as maximizing worst-case expected utility) that are meant to be philosophically robust, so that the agent doesn’t discard useful information and can still act under deep uncertainty. I haven’t used any of these in practice yet, so I’ll refrain from offering too much opinion.

4 References

Alquier. 2024. “User-Friendly Introduction to PAC-Bayes Bounds.” Foundations and Trends in Machine Learning.

Bissiri, Holmes, and Walker. 2016. “A General Framework for Updating Belief Distributions.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Camerer and Weber. 1992. “Recent Developments in Modeling Preferences: Uncertainty and Ambiguity.” Journal of Risk and Uncertainty.

Catoni. 2007. PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning.

Clyde and Iversen. 2013. “Bayesian Model Averaging in the M-Open Framework.” In Bayesian Theory and Applications.

Cozman. 2000. “Credal Networks.” Artificial Intelligence.

De Bock. 2020. “Archimedean Choice Functions.” Information Processing and Management of Uncertainty in Knowledge-Based Systems.

Giustinelli, Manski, and Molinari. 2021. “Precise or Imprecise Probabilities? Evidence from Survey Response Related to Late-Onset Dementia.” Journal of the European Economic Association.

Haddouche and Guedj. 2022. “Online PAC-Bayes Learning.”

Hayashi. 2021. “Collective Decision Under Ignorance.” Social Choice and Welfare.

Jansen. 2013. “Robust Bayesian Inference Under Model Misspecification.”

Kelter. 2021. “Bayesian Model Selection in the M-Open Setting — Approximate Posterior Inference and Subsampling for Efficient Large-Scale Leave-One-Out Cross-Validation via the Difference Estimator.” Journal of Mathematical Psychology.

Le and Clarke. 2017. “A Bayes Interpretation of Stacking for M-Complete and M-Open Settings.” Bayesian Analysis.

Masegosa. 2020. “Learning Under Model Misspecification: Applications to Variational and Ensemble Methods.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20.

McAllester. 1998. “Some PAC-Bayesian Theorems.” In Proceedings of the Eleventh Annual Conference on Computational Learning Theory. COLT ’98.

———. 1999. “PAC-Bayesian Model Averaging.” In Proceedings of the Twelfth Annual Conference on Computational Learning Theory.

Rivasplata, Kuzborskij, Szepesvari, et al. 2020. “PAC-Bayes Analysis Beyond the Usual Bounds.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20.

Rodríguez-Gálvez, Thobaben, and Skoglund. 2024. “More PAC-Bayes Bounds: From Bounded Losses, to Losses with General Tail Behaviors, to Anytime Validity.” Journal of Machine Learning Research.

Shirvaikar, Walker, and Holmes. 2024. “A General Framework for Probabilistic Model Uncertainty.”

Sucker and Ochs. 2023. “PAC-Bayesian Learning of Optimization Algorithms.” In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics.

Thiemann, Igel, Wintenberger, et al. 2017. “A Strongly Quasiconvex PAC-Bayesian Bound.” In Proceedings of the 28th International Conference on Algorithmic Learning Theory.

Walley. 1991. Statistical Reasoning with Imprecise Probabilities.