Let’s do some Bayesian inference! We have a parameter $\theta$ and some (i.i.d. for now) data $X = \{x_1, x_2, \dots, x_n\}$. Suppose we have a prior $\pi(\theta)$ and a likelihood $p(x \mid \theta)$; for now, we’ll assume it has a density. The update from prior to posterior is given by Bayes’ theorem:

$$\pi_n(\theta) \propto \pi(\theta) \prod_{i=1}^n p(x_i \mid \theta), \tag{1}$$

or, in the log domain,

$$\log \pi_n(\theta) = \log \pi(\theta) + \sum_{i=1}^n \log p(x_i \mid \theta) - \log p(X),$$

where $p(X) = \int \pi(\theta) \prod_{i=1}^n p(x_i \mid \theta)\,d\theta$ is the marginal likelihood.

In a Gibbs posterior approach, we decide the likelihood isn’t quite cutting the mustard but still want something like a Bayesian posterior in a more general setting. We do two things:

- Replace the negative log-likelihood $-\log p(x_i \mid \theta)$ with a different loss function $\ell(\theta, x_i)$.
- Introduce a learning rate $\omega > 0$ that lets us control how much we trust the loss function relative to the prior.

Despite the name, a Gibbs posterior is not a Gibbs sampler; it is a way of doing Bayesian inference that uses a loss function instead of a likelihood. (The name reflects its form: a Gibbs distribution over $\theta$, with the weighted loss playing the role of the energy.) Confusingly, you can still use Gibbs samplers to sample from Gibbs posteriors.

How does that look?

$$\pi_n(\theta) \propto \exp\left\{-\omega \sum_{i=1}^n \ell(\theta, x_i)\right\} \pi(\theta) = \exp\{-\omega\, n R_n(\theta)\}\, \pi(\theta), \tag{2}$$

where $R_n(\theta) = \frac{1}{n} \sum_{i=1}^n \ell(\theta, x_i)$ is simply the empirical risk. (Some authors define $R_n$ as the plain sum, without the $1/n$, in which case the factor $n$ in the exponent disappears.)
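To make Equation 2 concrete, here is a minimal sketch that evaluates the unnormalized log Gibbs posterior on a grid of $\theta$ values and normalizes it numerically. Everything specific here is an illustrative assumption rather than something from the sources: absolute-error loss $\ell(\theta, x) = |x - \theta|$ (whose risk minimizer is the median), a $N(0, 10^2)$ prior, $\omega = 1$, made-up data, and the hypothetical helper names. Grid normalization only makes sense for a low-dimensional $\theta$.

```python
import numpy as np

def log_gibbs_unnormalized(theta, data, loss, log_prior, omega):
    # -omega * sum_i loss(theta, x_i) + log prior(theta), up to an additive constant.
    # Broadcasting gives a (len(theta), len(data)) array of losses; sum over the data axis.
    # The sum equals n * R_n(theta) when R_n is the average loss (empirical risk).
    losses = loss(theta[:, None], data[None, :])
    return -omega * losses.sum(axis=1) + log_prior(theta)

def abs_loss(theta, x):
    # Absolute-error loss; its risk minimizer is the median of the data distribution
    return np.abs(x - theta)

def log_normal_prior(theta, mean=0.0, sd=10.0):
    # Log density of a N(mean, sd^2) prior, dropping the normalizing constant
    return -0.5 * ((theta - mean) / sd) ** 2

data = np.array([-1.2, 0.3, 0.5, 0.9, 1.1, 2.4, 3.0])  # made-up example data
grid = np.linspace(-5.0, 5.0, 2001)
dx = grid[1] - grid[0]

log_post = log_gibbs_unnormalized(grid, data, abs_loss, log_normal_prior, omega=1.0)
post = np.exp(log_post - log_post.max())  # subtract the max to avoid underflow
post /= post.sum() * dx                    # Riemann-sum normalization on the grid

print("posterior mode:", grid[np.argmax(post)], "sample median:", np.median(data))
```

With this loss the posterior mode sits at the sample median, which is the point of choosing a loss: it targets the quantity we care about directly instead of going through a full likelihood.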

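$\omega$ is a tempering factor, playing the role of an inverse temperature: larger $\omega$ weights the loss more heavily and concentrates the posterior, while smaller $\omega$ leans more on the prior.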
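A quick way to see $\omega$ acting as an inverse temperature: with squared-error loss $\ell(\theta, x) = (x - \theta)^2$ and a flat prior (both assumed here for illustration), $\sum_i (x_i - \theta)^2 = n(\theta - \bar{x})^2 + \text{const}$, so the Gibbs posterior is exactly Gaussian with mean $\bar{x}$ and standard deviation $1/\sqrt{2\omega n}$. The sketch below checks this numerically on a grid; the function name and data are hypothetical.

```python
import numpy as np

def gibbs_posterior_sd_sq_loss(data, omega, grid):
    # Squared-error loss with a flat prior: log density is -omega * sum_i (x_i - theta)^2 + const
    log_post = -omega * ((data[None, :] - grid[:, None]) ** 2).sum(axis=1)
    w = np.exp(log_post - log_post.max())
    w /= w.sum()                         # normalized weights on the uniform grid
    mean = np.sum(w * grid)
    return np.sqrt(np.sum(w * (grid - mean) ** 2))

data = np.array([0.2, -0.4, 1.3, 0.8, 0.1])
grid = np.linspace(-8.0, 8.0, 16001)

for omega in [0.1, 1.0, 10.0]:
    sd_grid = gibbs_posterior_sd_sq_loss(data, omega, grid)
    sd_exact = 1.0 / np.sqrt(2.0 * omega * len(data))   # Gaussian with variance 1/(2*omega*n)
    print(f"omega={omega:5.1f}  grid sd={sd_grid:.4f}  exact sd={sd_exact:.4f}")
```

Scaling $\omega$ by 100 shrinks the spread by a factor of 10, so the reported uncertainty is only as meaningful as the choice of $\omega$.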

We seem to have given up the likelihood principle, since the risk is estimated directly from the data rather than derived by integrating a cost over the posterior predictive distribution in a decision problem. Maybe this is okay if we aren’t sure about the likelihood anyway.

N. A. Syring (2018) is a thesis-length introduction. There is a compact explanation in Martin and Syring (2022).

Note that the Gibbs posterior reduces to the classical Bayesian posterior when we choose the loss to be the negative log-likelihood, $\ell(\theta, x) = -\log p(x \mid \theta)$, and set the learning rate $\omega = 1$: then $\exp\{-\omega \sum_i \ell(\theta, x_i)\} = \prod_i p(x_i \mid \theta)$, and Equation 2 becomes Equation 1.
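A numerical sanity check of this special case, under assumed settings: Gaussian observations with known noise sd $\sigma = 1$ and a $N(0, 2^2)$ prior on the mean. Plugging $\ell(\theta, x) = -\log p(x \mid \theta)$ (dropping the $\theta$-free constant) and $\omega = 1$ into a grid computation should reproduce the closed-form conjugate normal posterior.

```python
import numpy as np

sigma, mu0, tau0 = 1.0, 0.0, 2.0   # assumed known noise sd and prior N(mu0, tau0^2)
data = np.array([1.1, 0.4, 2.0, 1.5, 0.9, 1.3])
grid = np.linspace(-6.0, 8.0, 14001)

def neg_log_lik(theta, x):
    # -log N(x | theta, sigma^2), dropping the constant that does not depend on theta
    return 0.5 * ((x - theta) / sigma) ** 2

omega = 1.0
log_prior = -0.5 * ((grid - mu0) / tau0) ** 2
log_post = -omega * neg_log_lik(grid[:, None], data[None, :]).sum(axis=1) + log_prior
w = np.exp(log_post - log_post.max())
w /= w.sum()
gibbs_mean = np.sum(w * grid)
gibbs_var = np.sum(w * (grid - gibbs_mean) ** 2)

# Closed-form conjugate normal-normal posterior for comparison
post_prec = 1.0 / tau0 ** 2 + len(data) / sigma ** 2
bayes_mean = (mu0 / tau0 ** 2 + data.sum() / sigma ** 2) / post_prec
bayes_var = 1.0 / post_prec

print("Gibbs:", gibbs_mean, gibbs_var, " conjugate Bayes:", bayes_mean, bayes_var)
```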

1 A worked example

Would be a good idea. Note that Equation 2 includes a sneaky implicit integral, just like normal Bayes: the normalizing constant $\int \exp\{-\omega \sum_{i=1}^n \ell(\theta, x_i)\}\, \pi(\theta)\, d\theta$ is an integral over the whole parameter space. It’s easy to say “the solution is just the function that satisfies so-and-so”, but calculating it can be tricky. I’m not sure when it would be harder or easier than classical Bayes in practice.
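Since only ratios of the unnormalized density enter, a random-walk Metropolis sampler sidesteps the normalizing integral entirely. Here is a minimal sketch of a Gibbs posterior for the 0.9-quantile using the check (pinball) loss $\rho_\tau(u) = u(\tau - \mathbf{1}\{u < 0\})$ with $u = x - \theta$. It is in the spirit of the quantile application in Bhattacharya and Martin (2022) but is not their construction; the $N(0, 10^2)$ prior, $\omega = 1$, step size, burn-in, simulated data, and helper names are all assumptions for illustration.

```python
import numpy as np

def check_loss(theta, x, tau):
    # Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}), with u = x - theta
    u = x - theta
    return u * (tau - (u < 0).astype(float))

def log_target(theta, data, tau, omega, prior_sd=10.0):
    # Unnormalized log Gibbs posterior with an assumed N(0, prior_sd^2) prior
    return -omega * check_loss(theta, data, tau).sum() - 0.5 * (theta / prior_sd) ** 2

def rw_metropolis(data, tau=0.9, omega=1.0, n_iter=20000, step=0.3, seed=0):
    rng = np.random.default_rng(seed)
    theta = float(np.median(data))            # start somewhere sensible
    current = log_target(theta, data, tau, omega)
    samples = np.empty(n_iter)
    accepted = 0
    for t in range(n_iter):
        proposal = theta + step * rng.standard_normal()
        cand = log_target(proposal, data, tau, omega)
        # Accept with probability min(1, exp(cand - current)); the normalizer cancels
        if np.log(rng.uniform()) < cand - current:
            theta, current = proposal, cand
            accepted += 1
        samples[t] = theta
    return samples, accepted / n_iter

data = np.random.default_rng(1).standard_normal(200)   # simulated N(0, 1) data
samples, acc_rate = rw_metropolis(data)
burned = samples[2000:]                                 # discard an assumed burn-in
print("posterior mean:", burned.mean(), "empirical 0.9-quantile:", np.quantile(data, 0.9))
print("acceptance rate:", acc_rate)
```

The burn-in length and step size are untuned guesses; in practice one would check mixing diagnostics before trusting the draws.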

2 Theoretical guarantees

Not sure, but see (Martin and Syring 2022; N. Syring and Martin 2023; Luo et al. 2023).

2.1 Asymptotics

See (Alquier, Ridgway, and Chopin 2016; Buhmann et al. 2018; Miller 2021; N. Syring and Martin 2023; Winter, Melikechi, and Dunson 2023).

3 As a robust Bayesian method

Gibbs posteriors do seem to work as a robust Bayesian method, although the Gibbs posterior literature looks quite different from the robust Bayes literature I’m used to.
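A toy illustration of the robustness point, under assumptions of my own choosing: 20 evenly spaced “clean” points in $[-1, 1]$ plus 3 gross outliers at 50, a flat prior over a grid, and $\omega = 1$. The Gaussian-likelihood posterior (known $\sigma = 1$) is dragged toward the outliers, while the Gibbs posterior with absolute-error loss stays with the bulk of the data.

```python
import numpy as np

clean = np.linspace(-1.0, 1.0, 20)
data = np.concatenate([clean, np.full(3, 50.0)])   # 3 gross outliers
grid = np.linspace(-10.0, 60.0, 70001)             # flat prior over this grid (assumed)

def grid_posterior_mean(log_post):
    # Normalize on the grid and return the posterior mean
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    return np.sum(w * grid)

resid = data[None, :] - grid[:, None]
gauss_mean = grid_posterior_mean(-0.5 * (resid ** 2).sum(axis=1))   # Gaussian likelihood, sigma = 1
gibbs_mean = grid_posterior_mean(-1.0 * np.abs(resid).sum(axis=1))  # absolute loss, omega = 1

print("Gaussian-likelihood posterior mean:", gauss_mean)
print("absolute-loss Gibbs posterior mean:", gibbs_mean)
print("sample mean:", data.mean(), "sample median:", np.median(data))
```

The robustness here comes entirely from the choice of loss; a Gibbs posterior built on squared error would be just as fragile as the Gaussian likelihood.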

4 Generalized variational inference

Gibbs posteriors seem related to so-called Generalized Variational Inference (Knoblauch, Jewson, and Damoulas 2019, 2022), which builds on the general belief-updating framework of Bissiri, Holmes, and Walker (2016). Both replace the likelihood with a loss function.

See Generalized Variational Inference.
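One concrete version of the link: Bissiri, Holmes, and Walker (2016) characterize the Gibbs posterior as the distribution $q$ minimizing $\omega\, \mathbb{E}_q[\sum_i \ell(\theta, x_i)] + \mathrm{KL}(q \,\|\, \pi)$, and Generalized Variational Inference (Knoblauch, Jewson, and Damoulas 2022) generalizes this optimization problem, e.g. by swapping out the KL divergence or restricting the family of $q$. The sketch below checks the characterization on a discretized parameter space (an assumption for tractability, with made-up data, absolute-error loss, and a hypothetical `objective` helper): the Gibbs weights attain the minimum value $-\log Z$, and random perturbations do worse.

```python
import numpy as np

def objective(q, prior, total_loss, omega):
    # J(q) = omega * E_q[total loss] + KL(q || prior), for probability vectors on a grid
    mask = q > 0
    kl = np.sum(q[mask] * np.log(q[mask] / prior[mask]))
    return omega * np.dot(q, total_loss) + kl

data = np.array([0.3, -0.8, 1.7, 0.9])
grid = np.linspace(-4.0, 4.0, 801)
omega = 1.5

prior = np.exp(-0.5 * grid ** 2)
prior /= prior.sum()                                           # discretized N(0, 1) prior
total_loss = np.abs(data[None, :] - grid[:, None]).sum(axis=1)  # sum_i |x_i - theta|

log_w = np.log(prior) - omega * total_loss       # log of unnormalized Gibbs weights
m = log_w.max()
log_Z = m + np.log(np.sum(np.exp(log_w - m)))    # log-sum-exp normalizer
gibbs = np.exp(log_w - log_Z)                    # Gibbs posterior on the grid

best = objective(gibbs, prior, total_loss, omega)
rng = np.random.default_rng(0)
perturbed = []
for _ in range(20):
    q = gibbs * np.exp(0.3 * rng.standard_normal(grid.size))  # random positive perturbation
    q /= q.sum()
    perturbed.append(objective(q, prior, total_loss, omega))

print("J(Gibbs) =", best, " -log Z =", -log_Z, " best perturbed J =", min(perturbed))
```

The identity behind this is $J(q) = -\log Z + \mathrm{KL}(q \,\|\, \pi_n)$, so any $q$ other than the Gibbs posterior pays a strictly positive KL penalty.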

5 Energy-based models

The connection to energy-based models is expanded in Andy Jones’ post Gibbs posteriors and energy-based models.
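Written out in energy-based-model notation (a restatement of Equation 2, not taken from the linked post), the Gibbs posterior is a Boltzmann distribution over $\theta$ whose energy combines the weighted loss and the negative log prior:

$$\pi_n(\theta) = \frac{\exp\{-E_n(\theta)\}}{Z_n}, \qquad E_n(\theta) = \omega \sum_{i=1}^n \ell(\theta, x_i) - \log \pi(\theta), \qquad Z_n = \int \exp\{-E_n(\theta)\}\, d\theta.$$

As with most energy-based models, the hard part is $Z_n$; samplers that need only energy differences, such as Metropolis–Hastings, avoid computing it.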

6 Incoming

Andy Jones
- Gibbs posteriors
- Gibbs posteriors and energy-based models

7 References

Alquier, Ridgway, and Chopin. 2016. “On the Properties of Variational Approximations of Gibbs Posteriors.” Journal of Machine Learning Research.

Baek, Aquino, and Mukherjee. 2023. “Generalized Bayes Approach to Inverse Problems with Model Misspecification.” Inverse Problems.

Bhattacharya and Martin. 2022. “Gibbs Posterior Inference on Multivariate Quantiles.” Journal of Statistical Planning and Inference.

Bissiri, Holmes, and Walker. 2016. “A General Framework for Updating Belief Distributions.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Bochkina. 2023. “Bernstein–von Mises Theorem and Misspecified Models: A Review.” In Foundations of Modern Statistics. Springer Proceedings in Mathematics & Statistics.

Buhmann, Dumazert, Gronskiy, et al. 2018. “Posterior Agreement for Large Parameter-Rich Optimization Problems.” Theoretical Computer Science.

Catoni. 2007. “PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning.” IMS Lecture Notes Monograph Series.

Dellaporta, Knoblauch, Damoulas, et al. 2022. “Robust Bayesian Inference for Simulator-Based Models via the MMD Posterior Bootstrap.” arXiv:2202.04744 [Cs, Stat].

Grendár and Judge. 2012. “Not All Empirical Divergence Minimizing Statistical Methods Are Created Equal?” AIP Conference Proceedings.

Grünwald. 2023. “The e-Posterior.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

Knoblauch, Jewson, and Damoulas. 2019. “Generalized Variational Inference: Three Arguments for Deriving New Posteriors.”

———. 2022. “An Optimization-Centric View on Bayes’ Rule: Reviewing and Generalizing Variational Inference.” Journal of Machine Learning Research.

Li, Jiang, and Tanner. 2014a. “General Inequalities for Gibbs Posterior with Nonadditive Empirical Risk.” Econometric Theory.

———. 2014b. “General Oracle Inequalities for Gibbs Posterior with Application to Ranking.” Econometric Theory.

Luo, Stephens, Graham, et al. 2023. “Assessing the Validity of Bayesian Inference Using Loss Functions.”

Martin and Syring. 2022. “Direct Gibbs Posterior Inference on Risk Minimizers: Construction, Concentration, and Calibration.”

Masegosa. 2020. “Learning Under Model Misspecification: Applications to Variational and Ensemble Methods.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS’20.

Matsubara, Knoblauch, Briol, et al. 2022. “Robust Generalised Bayesian Inference for Intractable Likelihoods.” Journal of the Royal Statistical Society Series B: Statistical Methodology.

McGoff, Mukherjee, and Nobel. 2022. “Gibbs Posterior Convergence and the Thermodynamic Formalism.” The Annals of Applied Probability.

Miller. 2021. “Asymptotic Normality, Concentration, and Coverage of Generalized Posteriors.” Journal of Machine Learning Research.

Schmon, Cannon, and Knoblauch. 2021. “Generalized Posteriors in Approximate Bayesian Computation.” arXiv:2011.08644 [Stat].

Syring, Nicholas Aaron. 2018. “Gibbs Posterior Distributions: New Theory and Applications.”

Syring, Nicholas, and Martin. 2020. “Robust and Rate-Optimal Gibbs Posterior Inference on the Boundary of a Noisy Image.” The Annals of Statistics.

———. 2023. “Gibbs Posterior Concentration Rates Under Sub-Exponential Type Losses.” Bernoulli.

Walker. 2013. “Bayesian Inference with Misspecified Models.” Journal of Statistical Planning and Inference.

Wang, Yixin, and Blei. 2019. “Variational Bayes Under Model Misspecification.” In Advances in Neural Information Processing Systems.

Wang, Zhe, and Martin. 2021. “Gibbs Posterior Inference on a Levy Density Under Discrete Sampling.”

Watson and Holmes. 2016. “Approximate Models and Robust Decisions.” Statistical Science.

Winter, Melikechi, and Dunson. 2023. “Sequential Gibbs Posteriors with Applications to Principal Component Analysis.”