# Model mixing, model averaging for regression

Switching regression, mixture of experts

March 29, 2016 — August 27, 2024

Notes on mixtures whose target is the posterior density conditional on the predictors, obtained by likelihood-weighting each sub-model. Non-likelihood approaches are covered in the notebooks on model averaging and neural mixtures.

## 1 Bayesian Inference with Mixture Priors

When dealing with Bayesian inference where the prior is a mixture density, the resulting posterior distribution will also generally be a mixture density.

Assume the prior for \(\theta\) is a mixture of \(K\) densities \(p_k(\theta)\) with mixture weights \(\pi_k\), where \(\sum_{k=1}^K \pi_k = 1\):

\[ p(\theta) = \sum_{k=1}^K \pi_k p_k(\theta) \]

Using Bayes’ theorem, the posterior distribution of \(\theta\) given data \(x\) is:

\[ p(\theta | x) = \frac{p(x | \theta) p(\theta)}{p(x)} \]

Substituting the mixture prior into Bayes’ theorem gives:

\[ p(\theta | x) = \frac{p(x | \theta) \sum_{k=1}^K \pi_k p_k(\theta)}{p(x)} \]

Since the likelihood does not depend on \(k\), it can be distributed over the sum in the numerator:

\[ p(\theta | x) = \frac{\sum_{k=1}^K \pi_k p(x | \theta) p_k(\theta)}{p(x)} \]

The marginal likelihood \(p(x)\) is computed using the law of total probability:

\[ p(x) = \int p(x | \theta) p(\theta) d\theta = \sum_{k=1}^K \pi_k p(x | k) \]

where \(p(x | k)\) is defined as:

\[ p(x | k) = \int p(x | \theta) p_k(\theta) d\theta \]
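As a concrete illustration (an assumed conjugate setting, not part of the general derivation): if the likelihood is Gaussian, \(x \mid \theta \sim \mathcal{N}(\theta, \sigma^2)\) with \(\sigma^2\) known, and each prior component is Gaussian, \(p_k(\theta) = \mathcal{N}(\theta; \mu_k, \tau_k^2)\), then the component evidence is available in closed form:

\[ p(x | k) = \int \mathcal{N}(x; \theta, \sigma^2)\, \mathcal{N}(\theta; \mu_k, \tau_k^2)\, d\theta = \mathcal{N}(x; \mu_k, \sigma^2 + \tau_k^2) \]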

The final form of the posterior is:

\[ p(\theta | x) = \frac{\sum_{k=1}^K \pi_k p(x | \theta) p_k(\theta)}{\sum_{k=1}^K \pi_k p(x | k)} \]

This rearranges into mixture form:

\[ p(\theta | x) = \sum_{k=1}^K w_k(x) p_k(\theta | x) \]

where the posterior weights \(w_k(x)\) are:

\[ w_k(x) = \frac{\pi_k p(x | k)}{\sum_{j=1}^K \pi_j p(x | j)} \]

and \(p_k(\theta | x)\) is the component-specific posterior for \(\theta\), updated based on the \(k\)-th component of the prior:

\[ p_k(\theta | x) = \frac{p(x | \theta) p_k(\theta)}{p(x | k)} \]

The posterior distribution \(p(\theta | x)\) is a mixture of the component-specific posteriors \(p_k(\theta | x)\), with each component weighted by \(w_k(x)\). These weights are updated based on the explanatory power of each component regarding the observed data \(x\), adjusted by the original prior weights \(\pi_k\).

In Bayesian inference, using a mixture prior leads to a posterior that is also a mixture, effectively combining different models or beliefs about the parameters, each updated according to its relative contribution to explaining the new data.
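The derivation above can be sketched numerically. This is a minimal example under assumed settings (a Gaussian likelihood with known variance and a two-component Gaussian mixture prior, so every quantity has a closed form); all the numbers below are illustrative, not from the source.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Assumed setup: likelihood x ~ N(theta, sigma2) with known variance,
# prior on theta is a two-component Gaussian mixture.
sigma2 = 1.0             # known likelihood variance
pis = [0.5, 0.5]         # prior mixture weights pi_k
mus = [-2.0, 3.0]        # component prior means mu_k
taus2 = [1.0, 0.5]       # component prior variances tau_k^2

x = 2.5                  # a single observation

# Component evidences p(x | k): for conjugate Gaussian components this is
# N(x; mu_k, sigma2 + tau_k^2).
evidences = [normal_pdf(x, m, sigma2 + t2) for m, t2 in zip(mus, taus2)]

# Posterior mixture weights w_k(x) proportional to pi_k * p(x | k).
unnorm = [p * e for p, e in zip(pis, evidences)]
ws = [u / sum(unnorm) for u in unnorm]

# Component-specific posteriors p_k(theta | x): standard conjugate
# Gaussian update per component.
post_vars = [1.0 / (1.0 / t2 + 1.0 / sigma2) for t2 in taus2]
post_means = [v * (m / t2 + x / sigma2)
              for v, m, t2 in zip(post_vars, mus, taus2)]

print("posterior weights:", ws)
print("component posterior means:", post_means)
```

The observation \(x = 2.5\) sits near the second component's prior mean, so \(w_2(x)\) dominates: the data reweight the components according to how well each explains \(x\), exactly as in the weight formula above.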

## 2 Under mis-specification

See the M-open notebook for discussion of what happens to these mixtures when no sub-model is correct.
