Imprecise Bayesianism
2025-11-30 — 2026-01-14
Wherein imprecise Bayesianism is presented as an alternative for M‑open problems, where beliefs are represented by convex sets of distributions and PAC‑Bayes generalization bounds are invoked.
In M-open Bayesian inference, we accept that our models are simplifications that don’t contain the true data-generating process, which leads to problems with standard Bayesian updating.
What alternative foundations or extensions of Bayesianism can better handle model misspecification?
1 Maximin updates over set priors
As far as I know, this is the classic approach. Informally: we have a set of prior distributions representing our beliefs. When we see data, we update each prior using Bayes’ rule to get a set of posteriors. When making decisions, we consider the worst-case expected utility across all posteriors in this set and choose the action that maximizes this worst-case utility.
Easy to say, but I haven’t really used this myself and suspect it’s annoying in practice (Camerer and Weber 1992; De Bock 2020; Giustinelli, Manski, and Molinari 2021; Hayashi 2021; Walley 1991).
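To make the recipe concrete, here is a minimal toy sketch (my own illustration, not taken from those references): a credal set of Beta priors over a coin's bias, each updated on the same data, then a Γ-maximin choice between two made-up actions.

```python
# Toy Gamma-maximin over a credal set of Beta priors for a coin's bias theta.
# Priors, data, actions and payoffs below are all made up for illustration.
priors = [(1, 1), (2, 5), (5, 2)]            # credal set of Beta(a, b) priors
heads, tails = 7, 3                          # observed coin flips

# Bayes-update each prior separately, giving a set of posteriors
posteriors = [(a + heads, b + tails) for (a, b) in priors]

def expected_utility(action, a, b):
    """Posterior expected utility under Beta(a, b): 'bet_heads' pays theta, 'pass' pays a flat 0.55."""
    post_mean = a / (a + b)                  # E[theta | data]
    return post_mean if action == "bet_heads" else 0.55

# Worst-case (over the posterior set) expected utility of each action,
# then pick the action whose worst case is best.
worst_case = {
    action: min(expected_utility(action, a, b) for (a, b) in posteriors)
    for action in ("bet_heads", "pass")
}
best_action = max(worst_case, key=worst_case.get)
print(worst_case, "->", best_action)
```

With these numbers the pessimistic posterior (the one grown from the Beta(2, 5) prior) drags the worst case for betting below the flat payoff, so the maximin rule passes even though two of the three posteriors favour betting; that kind of conservatism is part of what I suspect gets annoying in practice.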
2 PAC-Bayes methods
There’s a large body of work on this; I’m not an expert (Catoni 2007; Haddouche and Guedj 2022; Rivasplata et al. 2020; Rodríguez-Gálvez, Thobaben, and Skoglund 2024; Sucker and Ochs 2023; Thiemann et al. 2017).
PAC-Bayes (Probably Approximately Correct Bayesian) methods offer a theoretically grounded way to aggregate misspecified models in the M-open setting, giving high-probability generalization bounds without assuming realizability. They originate with McAllester (1998, 1999) and were sharpened by Catoni (2007). The bounds control the expected risk of a posterior over hypotheses via its KL divergence to a prior, so we can, in principle, do robust model selection or ensembling even when the true data-generating process lies outside the model class \(\mathcal{M}\).
I’m unclear whether all stacking bounds are in fact of PAC-Bayes type.
PAC-Bayes seems to justify techniques like stacking by quantifying how well a data-dependent posterior generalizes. For concreteness, a McAllester-style bound states that for losses \(\ell\) bounded in \([0,1]\), \[\mathbb{E}_{h\sim Q}[\mathrm{risk}(h)] \leq \mathbb{E}_{h\sim Q}\left[\frac{1}{n} \sum_{i=1}^{n} \ell(h(x_i), y_i)\right] + \sqrt{\frac{\mathrm{KL}(Q\|P) + \log(2n/\delta)}{2n}},\] holding with probability at least \(1-\delta\) over samples of size \(n\), for any prior \(P\) fixed before seeing the data, and simultaneously over all posteriors \(Q\). This sidesteps M-closed assumptions in some sense, but don't ask me for details.
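As a sanity check on what the symbols mean, here is a small numerical sketch (my own, not code from the PAC-Bayes literature): a finite set of threshold classifiers, a uniform prior, a Gibbs posterior weighted by empirical risk, and an evaluation of the bound above. Everything in it is synthetic.

```python
# Minimal numerical sketch of the bound above: finite hypothesis set of
# threshold classifiers, uniform prior P, Gibbs posterior Q, 0-1 loss.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)
y = (X > 0.1).astype(float)                     # "true" labelling rule, unknown to the models

thresholds = np.linspace(-1.0, 1.0, 21)         # hypothesis class: h_t(x) = 1[x > t]
emp_risk = np.array([np.mean((X > t) != y) for t in thresholds])   # empirical 0-1 risk in [0, 1]

prior = np.full(len(thresholds), 1 / len(thresholds))   # data-independent uniform prior P
beta = 2.0
weights = prior * np.exp(-beta * n * emp_risk)           # Gibbs posterior Q ∝ P exp(-beta * n * emp risk)
Q = weights / weights.sum()

kl = float(np.sum(Q * np.log(Q / prior)))                # KL(Q || P)
delta = 0.05
gibbs_emp_risk = float(np.sum(Q * emp_risk))             # E_Q[empirical risk]
bound = gibbs_emp_risk + np.sqrt((kl + np.log(2 * n / delta)) / (2 * n))
print(f"E_Q[empirical risk]={gibbs_emp_risk:.3f}  KL(Q||P)={kl:.3f}  risk bound={bound:.3f}")
```

The bound trades off fit (the Gibbs empirical risk) against how far \(Q\) has moved from \(P\) (the KL term); cranking `beta` up fits harder but inflates the KL penalty.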
Tools like paccube or pacbayes.py implement optimization of these bounds for neural nets and beyond.
3 Infrabayesianism
Infrabayesianism — I presume it’s different from standard set-prior approaches, but I don’t really know it well enough to say how.
It starts with the same assumption as M-open Bayes: the true state of the world is likely outside an agent's hypothesis space. Its proponents argue this is especially critical for embedded agents, i.e. agents that are part of an environment vastly more complex than they are. An AI can't model every atom in its server room, let alone the universe, so its world model is necessarily incomplete.
I confess I don’t follow that emphasis myself; I also can’t model every atom in anything I study, and I get by without infrabayesian reasoning. I should re-listen to Vanessa Kosoy on this theme. Infrabayesianism is nonetheless motivated as a framework for future AI systems that must navigate a world of deep and unavoidable uncertainty.
Where Bayesianism uses a single probability distribution to represent belief, Infrabayesianism uses infradistributions, which are (roughly) convex sets of probability distributions.
Instead of saying, “The probability of rain is 40%,” an infrabayesian agent might say, “The probability of rain is somewhere between 30% and 60%.” I don’t know how this differs from the definitions of imprecise probability above; both directly capture the agent’s uncertainty and acknowledge the limitations of its model, and both allow more robust reasoning by considering a range of plausible worlds rather than committing to a single, likely-wrong one.
The framework apparently provides update rules and decision-making procedures (like minimax or upper-expectation reasoning) that are philosophically robust, ensuring the agent doesn’t discard useful information and can handle deep uncertainty. I haven’t used any of these in practice yet, so I’ll refrain from offering too much opinion.
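For what it’s worth, here is what the rain example looks like as a calculation in the generic imprecise-probability style (plain lower/upper expectations over a credal set; this is not the actual infrabayesian machinery, which I haven’t implemented). All payoffs are made up.

```python
# Belief about rain as the interval [0.30, 0.60], i.e. the credal set of all
# Bernoulli(p) with p in that range; evaluate hypothetical payoffs by their
# lower (worst-case) and upper (best-case) expectations over the set.
import numpy as np

p_grid = np.linspace(0.30, 0.60, 31)             # credal set, discretised

def expected_payoff(p, take_umbrella):
    # Made-up payoffs: carrying an umbrella costs a little, getting soaked costs more.
    payoff_rain = -1.0 if take_umbrella else -3.0
    payoff_dry = -1.0 if take_umbrella else 0.0
    return p * payoff_rain + (1 - p) * payoff_dry

for take_umbrella in (True, False):
    eu = expected_payoff(p_grid, take_umbrella)
    print(f"umbrella={take_umbrella}: lower E[u]={eu.min():.2f}, upper E[u]={eu.max():.2f}")
```

With these numbers the worst-case (minimax) agent carries the umbrella while an agent reasoning from the upper expectation leaves it at home, which is exactly the sort of gap between decision rules that the framework has to adjudicate.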
