(Kernelized) Stein variational gradient descent

KSVD, SVGD



Stein’s method meets variational inference via kernels and probability measures. The result is a method of inference that maintains an ensemble of particles which, notionally, collectively sample from some target distribution. I should learn about this, as it is one of the methods I might use for low-assumption Bayesian inference.
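
Before the formalities, a concrete picture of what the particle ensemble does. Below is a minimal numpy sketch of the SVGD update from Q. Liu and Wang (2019); the RBF kernel, the median-heuristic bandwidth variant, and the `score` callback (returning \(\nabla_x \log p(x)\) row-wise) are my choices for illustration, not the only options.

```python
import numpy as np

def rbf_kernel(X, h=None):
    """RBF kernel matrix K and grad_K[j, i] = grad_{x_j} k(x_j, x_i).

    If h is None, use (a variant of) the median heuristic from the
    SVGD paper: h = median(squared distances) / log(n + 1).
    """
    diff = X[:, None, :] - X[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)
    if h is None:
        h = np.median(sq_dists) / np.log(X.shape[0] + 1.0)
    K = np.exp(-sq_dists / h)
    grad_K = -2.0 / h * diff * K[:, :, None]
    return K, grad_K

def svgd_step(X, score, eps=0.1):
    """One SVGD update x_i <- x_i + eps * phi(x_i), where
    phi(x) = (1/n) sum_j [ k(x_j, x) score(x_j) + grad_{x_j} k(x_j, x) ]."""
    K, grad_K = rbf_kernel(X)
    phi = (K @ score(X) + grad_K.sum(axis=0)) / X.shape[0]
    return X + eps * phi

# Toy usage: drive badly initialized particles toward N(0, I_2),
# whose score is score(x) = -x.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(100, 2))
for _ in range(500):
    X = svgd_step(X, lambda x: -x)
print(X.mean(axis=0), X.std(axis=0))  # roughly (0, 0) and (1, 1)
```

The first term in \(\phi\) drags particles up the target’s score; the second, the kernel gradient, repels nearby particles from one another, which is what stops the ensemble collapsing onto the mode.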

Let us examine the computable kernelized Stein discrepancy, invented in Q. Liu, Lee, and Jordan (2016), weaponized in Q. Liu and Wang (2019), and summarised in Xu and Matsuda (2021):

Let \(q\) be a smooth probability density on \(\mathbb{R}^{d}\). For a smooth function \(\mathbf{f}=\left(f_{1}, \ldots, f_{d}\right): \mathbb{R}^{d} \rightarrow \mathbb{R}^{d}\), the Stein operator \(\mathcal{T}_{q}\) is defined by \[ \mathcal{T}_{q} \mathbf{f}(x)=\sum_{i=1}^{d}\left(f_{i}(x) \frac{\partial}{\partial x^{i}} \log q(x)+\frac{\partial}{\partial x^{i}} f_{i}(x)\right) \] The key property is the Stein identity: \(\mathbb{E}_{q}\left[\mathcal{T}_{q} \mathbf{f}(x)\right]=0\) for suitably regular \(\mathbf{f}\), which is what lets the operator detect departures from \(q\).
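
A quick Monte Carlo sanity check of that identity, taking \(q\) to be a standard bivariate Gaussian (so \(\nabla_x \log q(x) = -x\)) and, arbitrarily, \(\mathbf{f}(x)=\sin(x)\) elementwise; both choices are mine, for illustration.

```python
import numpy as np

def stein_op(x, score, f, div_f):
    """T_q f(x) = f(x) . grad log q(x) + div f(x), vectorized over rows of x."""
    return np.sum(f(x) * score(x), axis=-1) + div_f(x)

# E_q[T_q f] should vanish when the score matches q. Here q = N(0, I_2),
# f(x) = sin(x) elementwise, so div f(x) = sum_i cos(x_i).
rng = np.random.default_rng(1)
x = rng.normal(size=(200_000, 2))
vals = stein_op(x, lambda x: -x, np.sin, lambda x: np.cos(x).sum(axis=-1))
print(vals.mean())  # approximately 0, up to Monte Carlo error
```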

…Let \(\mathcal{H}\) be a reproducing kernel Hilbert space (RKHS) on \(\mathbb{R}^{d}\) and \(\mathcal{H}^{d}\) be its product. Using the Stein operator, the kernel Stein discrepancy (KSD) (Gorham and Mackey 2015; Ley, Reinert, and Swan 2017) between two densities \(p\) and \(q\) is defined as \[ \operatorname{KSD}(p \| q)=\sup _{\|\mathbf{f}\|_{\mathcal{H}^{d}} \leq 1} \mathbb{E}_{p}\left[\mathcal{T}_{q} \mathbf{f}\right] \] It is shown that \(\operatorname{KSD}(p \| q) \geq 0\), and \(\operatorname{KSD}(p \| q)=0\) if and only if \(p=q\), under mild regularity conditions (Chwialkowski, Strathmann, and Gretton 2016). Thus, KSD is a proper discrepancy measure between densities. After some calculation, \(\operatorname{KSD}(p \| q)\) can be rewritten as \[ \operatorname{KSD}^{2}(p \| q)=\mathbb{E}_{x, \tilde{x} \sim p}\left[h_{q}(x, \tilde{x})\right] \] where \(h_{q}\) does not involve \(p\), so the squared discrepancy is an expectation over pairs of samples from \(p\) and can be estimated empirically.
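
For the record, with kernel \(k\) and score \(s_{q}(x)=\nabla_{x} \log q(x)\), the closed form of \(h_{q}\) given in Q. Liu, Lee, and Jordan (2016) is \[ h_{q}(x, \tilde{x})=s_{q}(x)^{\top} k(x, \tilde{x}) s_{q}(\tilde{x})+s_{q}(x)^{\top} \nabla_{\tilde{x}} k(x, \tilde{x})+\nabla_{x} k(x, \tilde{x})^{\top} s_{q}(\tilde{x})+\operatorname{tr}\left(\nabla_{x} \nabla_{\tilde{x}} k(x, \tilde{x})\right) \] which yields an unbiased U-statistic estimator of \(\operatorname{KSD}^{2}\) from samples of \(p\). A minimal numpy sketch with a fixed-bandwidth RBF kernel, where the derivative terms are the analytic ones for \(k(x, \tilde{x})=\exp(-\|x-\tilde{x}\|^{2}/h)\) and the bandwidth choice is mine:

```python
import numpy as np

def ksd_ustat(X, score, h=1.0):
    """Unbiased U-statistic estimate of KSD^2(p || q) from samples X ~ p,
    with RBF kernel k(x, x') = exp(-||x - x'||^2 / h) and score
    s_q(x) = grad log q(x) supplied row-wise by `score`."""
    n, d = X.shape
    S = score(X)                               # (n, d) array of s_q(x_i)
    diff = X[:, None, :] - X[None, :, :]       # diff[i, j] = x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)            # squared pairwise distances
    K = np.exp(-sq / h)

    term1 = (S @ S.T) * K                                       # s(x)^T k s(x')
    term2 = (2.0 / h) * np.einsum('id,ijd->ij', S, diff) * K    # s(x)^T grad_{x'} k
    term3 = (-2.0 / h) * np.einsum('jd,ijd->ij', S, diff) * K   # grad_x k^T s(x')
    term4 = (2.0 * d / h - 4.0 / h ** 2 * sq) * K               # tr(grad_x grad_{x'} k)

    H = term1 + term2 + term3 + term4
    np.fill_diagonal(H, 0.0)                   # U-statistic: drop i == j terms
    return H.sum() / (n * (n - 1))

# Sanity check: samples from q itself should give KSD^2 near zero
# (possibly slightly negative; the U-statistic is unbiased).
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2))
print(ksd_ustat(X, lambda x: -x))              # near 0 for x ~ N(0, I)
print(ksd_ustat(X + 2.0, lambda x: -x))        # clearly positive under mismatch
```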

TBD.

Moment matching interpretation

References

Alsup, Terrence, Luca Venturi, and Benjamin Peherstorfer. 2022. “Multilevel Stein Variational Gradient Descent with Applications to Bayesian Inverse Problems.” In Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, 93–117. PMLR.
Ambrogioni, Luca, Umut Güçlü, Yagmur Güçlütürk, Max Hinne, Eric Maris, and Marcel A. J. van Gerven. 2018. “Wasserstein Variational Inference.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2478–87. NIPS’18. USA: Curran Associates Inc.
Anastasiou, Andreas, Alessandro Barp, François-Xavier Briol, Bruno Ebner, Robert E. Gaunt, Fatemeh Ghaderinezhad, Jackson Gorham, et al. 2022. “Stein’s Method Meets Computational Statistics: A Review of Some Recent Developments.” arXiv.
Chen, Peng, Keyi Wu, Joshua Chen, Thomas O’Leary-Roseberry, and Omar Ghattas. 2020. “Projected Stein Variational Newton: A Fast and Scalable Bayesian Inference Method in High Dimensions.” arXiv.
Chu, Casey, Kentaro Minami, and Kenji Fukumizu. 2022. “The Equivalence Between Stein Variational Gradient Descent and Black-Box Variational Inference.”
Chwialkowski, Kacper, Heiko Strathmann, and Arthur Gretton. 2016. “A Kernel Test of Goodness of Fit.” In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, 2606–15. ICML’16. New York, NY, USA: JMLR.org.
Detommaso, Gianluca, Tiangang Cui, Alessio Spantini, Youssef Marzouk, and Robert Scheichl. 2018. “A Stein Variational Newton Method.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 9187–97. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.
Detommaso, Gianluca, Hanne Hoitzing, Tiangang Cui, and Ardavan Alamir. 2019. “Stein Variational Online Changepoint Detection with Applications to Hawkes Processes and Neural Networks.” arXiv:1901.07987 [Cs, Stat], May.
Feng, Yihao, Dilin Wang, and Qiang Liu. 2017. “Learning to Draw Samples with Amortized Stein Variational Gradient Descent.” In UAI 2017. arXiv.
Gong, Chengyue, Jian Peng, and Qiang Liu. 2019. “Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization.” In Proceedings of the 36th International Conference on Machine Learning, 2347–56. PMLR.
Gorham, Jackson, and Lester Mackey. 2015. “Measuring Sample Quality with Stein’s Method.” In Advances in Neural Information Processing Systems. Vol. 28.
Gorham, Jackson, Anant Raj, and Lester Mackey. 2020. “Stochastic Stein Discrepancies.” arXiv:2007.02857 [Cs, Math, Stat], October.
Han, Jun, and Qiang Liu. 2018. “Stein Variational Gradient Descent Without Gradient.” In Proceedings of the 35th International Conference on Machine Learning, 1900–1908. PMLR.
Huggins, Jonathan H., Trevor Campbell, Mikołaj Kasprzak, and Tamara Broderick. 2018. “Scalable Gaussian Process Inference with Finite-Data Mean and Variance Guarantees.” arXiv:1806.10234 [Cs, Stat], June.
Ley, Christophe, Gesine Reinert, and Yvik Swan. 2017. “Stein’s Method for Comparison of Univariate Distributions.” Probability Surveys 14: 1–52.
Liu, Chang, and Jun Zhu. 2018. “Riemannian Stein Variational Gradient Descent for Bayesian Inference.” Proceedings of the AAAI Conference on Artificial Intelligence 32 (1).
Liu, Qiang. 2016. “Stein Variational Gradient Descent: Theory and Applications.”
———. 2017. “Stein Variational Gradient Descent as Gradient Flow.” arXiv.
Liu, Qiang, Jason D. Lee, and Michael Jordan. 2016. “A Kernelized Stein Discrepancy for Goodness-of-Fit Tests.” In Proceedings of The 33rd International Conference on Machine Learning. PMLR.
Liu, Qiang, and Dilin Wang. 2018. “Stein Variational Gradient Descent as Moment Matching.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 31:8868–77. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.
———. 2019. “Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm.” In Advances in Neural Information Processing Systems.
Liu, Xing, Harrison Zhu, Jean-Francois Ton, George Wynne, and Andrew Duncan. 2022. “Grassmann Stein Variational Gradient Descent.” In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, 2002–21. PMLR.
Pielok, Tobias, Bernd Bischl, and David Rügamer. 2023. “Approximate Bayesian Inference with Stein Functional Variational Gradient Descent.”
Pulido, Manuel, and Peter Jan van Leeuwen. 2019. “Sequential Monte Carlo with Kernel Embedded Mappings: The Mapping Particle Filter.” Journal of Computational Physics 396 (November): 400–415.
Pulido, Manuel, Peter Jan Van Leeuwen, and Derek J. Posselt. 2019. “Kernel Embedded Nonlinear Observational Mappings in the Variational Mapping Particle Filter.” In Computational Science – ICCS 2019. Lecture Notes in Computer Science, edited by Joao M. F. Rodrigues, Pedro J. S. Cardoso, Janio Monteiro, Roberto Lam, Valeria V. Krzhizhanovskaya, Michael H. Lees, Jack J. Dongarra, and Peter M. A. Sloot, 141–55. Faro, Portugal: Springer.
Stordal, Andreas S., Rafael J. Moraes, Patrick N. Raanes, and Geir Evensen. 2021. “P-Kernel Stein Variational Gradient Descent for Data Assimilation and History Matching.” Mathematical Geosciences 53 (3): 375–93.
Tamang, Sagar K., Ardeshir Ebtehaj, Peter J. van Leeuwen, Dongmian Zou, and Gilad Lerman. 2021. “Ensemble Riemannian Data Assimilation over the Wasserstein Space.” Nonlinear Processes in Geophysics 28 (3): 295–309.
Wang, Dilin, and Qiang Liu. 2019. “Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models.” In Proceedings of the 36th International Conference on Machine Learning, 6576–85. PMLR.
Wang, Dilin, Ziang Tang, Chandrajit Bajaj, and Qiang Liu. 2019. “Stein Variational Gradient Descent with Matrix-Valued Kernels.” In Proceedings of the 33rd International Conference on Neural Information Processing Systems, 7836–46. Red Hook, NY, USA: Curran Associates Inc.
Wang, Dilin, Zhe Zeng, and Qiang Liu. 2018. “Stein Variational Message Passing for Continuous Graphical Models.” arXiv.
Wang, Ziyu, Tongzheng Ren, Jun Zhu, and Bo Zhang. 2018. “Function Space Particle Optimization for Bayesian Neural Networks.”
Wen, Linjie, and Jinglai Li. 2022. “Affine-Mapping Based Variational Ensemble Kalman Filter.” Statistics and Computing 32 (6): 97.
Xu, Wenkai, and Takeru Matsuda. 2021. “Interpretable Stein Goodness-of-Fit Tests on Riemannian Manifolds.” arXiv:2103.00895 [Stat], March.
Zhang, Jianyi, Ruiyi Zhang, Lawrence Carin, and Changyou Chen. 2020. “Stochastic Particle-Optimization Sampling and the Non-Asymptotic Convergence Theory.” In International Conference on Artificial Intelligence and Statistics, 1877–87. PMLR.
Zhuo, Jingwei, Chang Liu, Jiaxin Shi, Jun Zhu, Ning Chen, and Bo Zhang. 2018. “Message Passing Stein Variational Gradient Descent.” In Proceedings of the 35th International Conference on Machine Learning, 6018–27. PMLR.
