Inference without KL divergence

2019-10-03 — 2022-04-28

Wherein alternative divergences to KL are examined, and a probability-functional descent using von Mises calculus is presented for distributional and likelihood-free Bayesian inference, with algorithmic links to SGD.

approximation

Bayes

how do science

measure

metrics

Monte Carlo

probability

statistics

Placeholder. Various links on inference by minimising a divergence other than the Kullback-Leibler divergence.

As mentioned in likelihood-free inference, this is especially interesting for Bayesian inference or, more generally, distributional inference, where complications ensue.

This notebook is a candidate for merging with generalized Bayesian computation.
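For a concrete, non-Bayesian illustration of minimising a divergence other than KL, here is a minimal sketch of minimum-MMD estimation of a Gaussian location parameter. It is in the spirit of the MMD-based entries in the references but is not any particular paper's algorithm. The function names (`mmd2`, `mmd2_location_grad`, `fit_location_mmd`), the Gaussian kernel, the bandwidth, the step size and the contaminated toy data are all assumptions made for the example. The simulator is reparameterised as `theta + noise` with the noise held fixed, so this is plain gradient descent with common random numbers; redrawing the noise at each step would give a stochastic-gradient variant.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between two 1-D sample arrays."""
    diff = x[:, None] - y[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (
        gaussian_kernel(x, x, bandwidth).mean()
        - 2.0 * gaussian_kernel(x, y, bandwidth).mean()
        + gaussian_kernel(y, y, bandwidth).mean()
    )


def mmd2_location_grad(theta, noise, data, bandwidth=1.0):
    """Gradient of mmd2(theta + noise, data) with respect to theta.

    For a pure location shift the model-model kernel term does not depend
    on theta, so only the cross term -2 * mean k(x_i, y_j) contributes.
    """
    diff = (theta + noise)[:, None] - data[None, :]
    k = np.exp(-0.5 * (diff / bandwidth) ** 2)
    return 2.0 * np.mean(k * diff) / bandwidth**2


def fit_location_mmd(data, theta0=0.0, n_sim=500, steps=300, lr=1.0,
                     bandwidth=1.0, seed=0):
    """Minimise the squared MMD between N(theta, 1) simulations and the data."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_sim)  # fixed draws: reparameterised simulator
    theta = float(theta0)
    for _ in range(steps):
        theta -= lr * mmd2_location_grad(theta, noise, data, bandwidth)
    return theta


# Hypothetical toy data: 90% from N(2, 1), 10% gross outliers at 20.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(2.0, 1.0, 180), np.full(20, 20.0)])

theta_mmd = fit_location_mmd(data)
print("sample mean (the Gaussian maximum-likelihood estimate):", data.mean())
print("minimum-MMD estimate:", theta_mmd)
```

In this toy setting the sample mean, which is what the KL-minimising (maximum likelihood) fit of a unit-variance Gaussian returns, is dragged towards the outliers, whereas the bounded kernel gives distant points almost no influence on the MMD objective.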

(Chu, Blanchet, and Glynn 2019):

in many fields, the object of interest is a probability distribution; moreover, the learning process is guided by a probability functional to be minimized, a loss function that conceptually maps a probability distribution to a real number […] Because the optimization now takes place in the infinite-dimensional space of probability measures, standard finite-dimensional algorithms like gradient descent are initially unavailable; even the proper notion for the derivative of these functionals is unclear. We call upon a body of literature known as von Mises calculus, originally developed in the field of asymptotic statistics, to make these functional derivatives precise. Remarkably, we find that once the connection is made, the resulting generalized descent algorithm, which we call probability functional descent, is intimately compatible with standard deep learning techniques such as stochastic gradient descent, the reparameterization trick, and adversarial training.
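To make the functional derivative mentioned in the quotation concrete, the following sketches the standard Gateaux (von Mises) derivative and its influence-function representation. The notation is assumed here rather than taken from the quoted text: $J$ is the probability functional, $\mu$ the current distribution, $\nu$ a candidate distribution, $\chi = \nu - \mu$ the perturbation, and $\Psi_\mu$ the influence function. The final line is an illustrative example chosen here, with kernel $k$ and target distribution $\pi$ as assumed symbols.

```latex
\begin{align}
  % Gateaux derivative of J at \mu in the direction \chi = \nu - \mu,
  % represented as an integral against the influence function \Psi_\mu
  J'_\mu(\chi)
    &= \lim_{\epsilon \downarrow 0}
       \frac{J(\mu + \epsilon \chi) - J(\mu)}{\epsilon}
     = \int \Psi_\mu(x)\, \chi(\mathrm{d}x), \\
  % first-order von Mises expansion around \mu
  J(\nu)
    &\approx J(\mu)
       + \mathbb{E}_{x \sim \nu}\!\left[\Psi_\mu(x)\right]
       - \mathbb{E}_{x \sim \mu}\!\left[\Psi_\mu(x)\right], \\
  % example: half the squared MMD to a target \pi
  J(\mu) = \tfrac{1}{2}\,\mathrm{MMD}_k^2(\mu, \pi)
    &\;\Longrightarrow\;
     \Psi_\mu(x) = \int k(x, y)\, \mu(\mathrm{d}y)
                 - \int k(x, y)\, \pi(\mathrm{d}y).
\end{align}
```

Because $\chi$ has total mass zero, $\Psi_\mu$ is determined only up to an additive constant. A descent step lowers $\mathbb{E}_{x \sim \nu}[\Psi_\mu(x)]$ over nearby $\nu$; since this is an ordinary expectation, it can be estimated from samples, which is where sample-based gradient methods become applicable.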

1 References

Alquier, and Gerber. 2024. “Universal Robust Regression via Maximum Mean Discrepancy.” Biometrika.

Ambrogioni, Güçlü, Güçlütürk, et al. 2018. “Wasserstein Variational Inference.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.

Ambrogioni, Guclu, and van Gerven. 2019. “Wasserstein Variational Gradient Descent: From Semi-Discrete Optimal Transport to Ensemble Variational Inference.”

Arjovsky, Chintala, and Bottou. 2017. “Wasserstein Generative Adversarial Networks.” In International Conference on Machine Learning.

Bach. 2023. “Information Theory With Kernel Methods.” IEEE Transactions on Information Theory.

Beran. 1977. “Minimum Hellinger Distance Estimates for Parametric Models.” The Annals of Statistics.

Bissiri, Holmes, and Walker. 2016. “A General Framework for Updating Belief Distributions.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Blanchet, Kang, and Murthy. 2016. “Robust Wasserstein Profile Inference and Applications to Machine Learning.” arXiv:1610.05627 [Math, Stat].

Blanchet, Kang, Zhang, et al. 2017. “Data-Driven Optimal Cost Selection for Distributionally Robust Optimization.” arXiv:1705.07152 [Stat].

Blanchet, Murthy, and Zhang. 2018. “Optimal Transport Based Distributionally Robust Optimization: Structural Properties and Iterative Schemes.” arXiv:1810.02403 [Math].

Block, Hoffman, Raabe, et al. 2020. “Social Network-Based Distancing Strategies to Flatten the COVID 19 Curve in a Post-Lockdown World.” arXiv:2004.07052 [Physics, q-Bio, Stat].

Campbell, and Broderick. 2017. “Automated Scalable Bayesian Inference via Hilbert Coresets.” arXiv:1710.05053 [Cs, Stat].

Chen, Dai, and Song. 2019. “Meta Particle Flow for Sequential Bayesian Inference.” arXiv:1902.00640 [Cs, Stat].

Cherief-Abdellatif, and Alquier. 2020. “MMD-Bayes: Robust Bayesian Estimation via Maximum Mean Discrepancy.” In Proceedings of The 2nd Symposium on Advances in Approximate Bayesian Inference.

Chu, Blanchet, and Glynn. 2019. “Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning.” In ICML.

Dellaporta, Knoblauch, Damoulas, et al. 2022. “Robust Bayesian Inference for Simulator-Based Models via the MMD Posterior Bootstrap.” arXiv:2202.04744 [Cs, Stat].

Fernholz. 1983. Von Mises Calculus for Statistical Functionals. Lecture Notes in Statistics 19.

———. 2014. “Statistical Functionals.” In Wiley StatsRef: Statistics Reference Online.

Fong, Lyddon, and Holmes. 2019. “Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap.” arXiv:1902.03175 [Cs, Stat].

Frogner, Zhang, Mobahi, et al. 2015. “Learning with a Wasserstein Loss.” In Advances in Neural Information Processing Systems 28.

Gao, and Kleywegt. 2022. “Distributionally Robust Stochastic Optimization with Wasserstein Distance.”

Gibbs, and Su. 2002. “On Choosing and Bounding Probability Metrics.” International Statistical Review.

Grendár, and Judge. 2012. “Not All Empirical Divergence Minimizing Statistical Methods Are Created Equal?” AIP Conference Proceedings.

Gulrajani, Ahmed, Arjovsky, et al. 2017. “Improved Training of Wasserstein GANs.” arXiv:1704.00028 [Cs, Stat].

Guo, Hong, Lin, et al. 2017. “Relaxed Wasserstein with Applications to GANs.” arXiv:1705.07164 [Cs, Stat].

Jewson, Smith, and Holmes. 2018. “Principles of Bayesian Inference Using General Divergence Criteria.” Entropy.

Knoblauch, Jewson, and Damoulas. 2019. “Generalized Variational Inference: Three Arguments for Deriving New Posteriors.”

———. 2022. “An Optimization-Centric View on Bayes’ Rule: Reviewing and Generalizing Variational Inference.” Journal of Machine Learning Research.

Liu, Huidong, Gu, and Samaras. 2018. “A Two-Step Computation of the Exact GAN Wasserstein Distance.” In International Conference on Machine Learning.

Liu, Qiang, Lee, and Jordan. 2016. “A Kernelized Stein Discrepancy for Goodness-of-Fit Tests.” In Proceedings of The 33rd International Conference on Machine Learning.

Lyddon, Walker, and Holmes. 2018. “Nonparametric Learning from Bayesian Models with Randomized Objective Functions.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.

Mahdian, Blanchet, and Glynn. 2019. “Optimal Transport Relaxations with Application to Wasserstein GANs.” arXiv:1906.03317 [Cs, Math, Stat].

Matsubara, Knoblauch, Briol, et al. 2022. “Robust Generalised Bayesian Inference for Intractable Likelihoods.” Journal of the Royal Statistical Society Series B: Statistical Methodology.

Moosmüller, and Cloninger. 2021. “Linear Optimal Transport Embedding: Provable Wasserstein Classification for Certain Rigid Transformations and Perturbations.” arXiv:2008.09165 [Cs, Math, Stat].

Nott, Drovandi, and Frazier. 2023. “Bayesian Inference for Misspecified Generative Models.”

Ostrovski, Dabney, and Munos. n.d. “Autoregressive Quantile Networks for Generative Modeling.”

Pacchiardi, and Dutta. 2022. “Generalized Bayesian Likelihood-Free Inference Using Scoring Rules Estimators.” arXiv:2104.03889 [Stat].

Panaretos, and Zemel. 2019. “Statistical Aspects of Wasserstein Distances.” Annual Review of Statistics and Its Application.

Ranganath, Tran, Altosaar, et al. 2016. “Operator Variational Inference.” In Advances in Neural Information Processing Systems 29.

Rustamov. 2021. “Closed-Form Expressions for Maximum Mean Discrepancy with Applications to Wasserstein Auto-Encoders.” Stat.

Santambrogio. 2015. Optimal Transport for Applied Mathematicians. Edited by Filippo Santambrogio. Progress in Nonlinear Differential Equations and Their Applications.

Schmon, Cannon, and Knoblauch. 2021. “Generalized Posteriors in Approximate Bayesian Computation.” arXiv:2011.08644 [Stat].

Solomon, de Goes, Peyré, et al. 2015. “Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains.” ACM Transactions on Graphics.

Tiao, Bonilla, and Ramos. 2018. “Cycle-Consistent Adversarial Learning as Approximate Bayesian Inference.”

Wang, Yifei, Chen, and Li. 2021. “Projected Wasserstein Gradient Descent for High-Dimensional Bayesian Inference.”

Wang, Prince Zizhuang, and Wang. 2019. “Riemannian Normalizing Flow on Variational Wasserstein Autoencoder for Text Modeling.” In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers).

Zhang, Walder, Bonilla, et al. 2020. “Quantile Propagation for Wasserstein-Approximate Gaussian Processes.” In Proceedings of NeurIPS 2020.