# (Kernelized) Stein variational gradient descent

KSVD, SVGD

November 2, 2022 — January 9, 2023

Stein’s method meets variational inference via kernels and probability measures. The result is a method of inference that maintains an ensemble of interacting particles which, notionally, collectively sample from some target distribution. I should learn about this, as it is one of the methods I might use for low-assumption Bayesian inference.

Let us examine the computable *kernelized* Stein discrepancy, invented in Q. Liu, Lee, and Jordan (2016), weaponized in Q. Liu and Wang (2016), and summarised in Xu and Matsuda (2021):

Let \(q\) be a smooth probability density on \(\mathbb{R}^{d} .\) For a smooth function \(\mathbf{f}=\) \(\left(f_{1}, \ldots, f_{d}\right): \mathbb{R}^{d} \rightarrow \mathbb{R}^{d}\), the Stein operator \(\mathcal{T}_{q}\) is defined by \[ \mathcal{T}_{q} \mathbf{f}(x)=\sum_{i=1}^{d}\left(f_{i}(x) \frac{\partial}{\partial x^{i}} \log q(x)+\frac{\partial}{\partial x^{i}} f_{i}(x)\right) \]
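The operator is built so that Stein’s identity \(\mathbb{E}_{x \sim q}\left[\mathcal{T}_{q} \mathbf{f}(x)\right]=0\) holds for suitably well-behaved \(\mathbf{f}\). A quick numerical sanity check of that identity, where the standard normal target and the elementwise \(\tanh\) test function are both my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2

# Target q = N(0, I_d), so grad log q(x) = -x.
# Test function f(x) = tanh(x) elementwise (an arbitrary illustrative choice),
# with diagonal derivatives  d f_i / d x_i = 1 - tanh(x_i)^2.
x = rng.standard_normal((20_000, d))          # samples x ~ q
tf = np.tanh(x)

# T_q f(x) = sum_i [ f_i(x) d_i log q(x) + d_i f_i(x) ], one value per sample.
stein_vals = np.sum(tf * (-x) + (1.0 - tf**2), axis=1)

# Stein's identity: the expectation under q should vanish up to MC error.
est = stein_vals.mean()
```

Sampling from anything other than \(q\) breaks the identity, which is exactly the lever the discrepancy pulls on.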

…Let \(\mathcal{H}\) be a reproducing kernel Hilbert space \((\mathrm{RKHS})\) on \(\mathbb{R}^{d}\) and \(\mathcal{H}^{d}\) its product. Using the Stein operator, the kernel Stein discrepancy (KSD) (Gorham and Mackey 2015; Ley, Reinert, and Swan 2017) between two densities \(p\) and \(q\) is defined as \[ \operatorname{KSD}(p \| q)=\sup _{\|\mathbf{f}\|_{\mathcal{H}^{d}} \leq 1} \mathbb{E}_{p}\left[\mathcal{T}_{q} \mathbf{f}\right] \] It can be shown that \(\operatorname{KSD}(p \| q) \geq 0\), with \(\operatorname{KSD}(p \| q)=0\) if and only if \(p=q\), under mild regularity conditions (Chwialkowski, Strathmann, and Gretton 2016); thus KSD is a proper discrepancy measure between densities. After some calculation, \(\operatorname{KSD}^{2}(p \| q)\) can be rewritten as \[ \operatorname{KSD}^{2}(p \| q)=\mathbb{E}_{x, \tilde{x} \sim p}\left[h_{q}(x, \tilde{x})\right] \] where \(h_{q}\) depends only on \(q\) and the kernel, not on \(p\).
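Writing \(s_{q}=\nabla \log q\), the standard closed form is \(h_{q}(x, \tilde{x})=s_{q}(x)^{\top} s_{q}(\tilde{x})\, k(x, \tilde{x})+s_{q}(x)^{\top} \nabla_{\tilde{x}} k(x, \tilde{x})+\nabla_{x} k(x, \tilde{x})^{\top} s_{q}(\tilde{x})+\operatorname{tr}\left(\nabla_{x} \nabla_{\tilde{x}} k(x, \tilde{x})\right)\), which only needs samples from \(p\) and the score of \(q\). A sketch of the resulting V-statistic estimator of \(\operatorname{KSD}^{2}\) for an RBF kernel; the Gaussian target, bandwidth, and sample size are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def ksd_sq(x, score_q, sigma=1.0):
    """V-statistic estimate of KSD^2(p || q) from samples x ~ p (rows),
    using the RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    score_q(x) must return grad log q evaluated at each row of x."""
    n, d = x.shape
    s = score_q(x)                                  # (n, d) scores s_q(x_i)
    diff = x[:, None, :] - x[None, :, :]            # (n, n, d): x_i - x_j
    sqd = np.sum(diff**2, axis=-1)                  # squared pairwise distances
    k = np.exp(-sqd / (2 * sigma**2))
    ss = s @ s.T                                    # s_q(x_i) . s_q(x_j)
    si = np.einsum('id,ijd->ij', s, diff)           # s_q(x_i) . (x_i - x_j)
    sj = np.einsum('jd,ijd->ij', s, diff)           # s_q(x_j) . (x_i - x_j)
    # h_q assembled from the closed-form RBF kernel gradients:
    # grad_y k = (x - y)/sigma^2 k,  grad_x k = -(x - y)/sigma^2 k,
    # tr(grad_x grad_y k) = (d/sigma^2 - ||x - y||^2/sigma^4) k.
    h = (ss + (si - sj) / sigma**2 + d / sigma**2 - sqd / sigma**4) * k
    return h.mean()

# Illustrative target q = N(0, I_2), so the score is -x.
score = lambda x: -x
x_match = rng.standard_normal((200, 2))             # samples from q itself
x_shift = x_match + 1.5                             # samples from a shifted p
```

Here `ksd_sq(x_match, score)` sits near zero (the V-statistic is nonnegative and slightly biased upward) while `ksd_sq(x_shift, score)` is clearly larger.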

TBD.
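In the meantime, the basic algorithm is easy to state: Q. Liu and Wang (2016) move an ensemble of particles \(\{x_{i}\}\) along \(\hat{\phi}^{*}(x)=\frac{1}{n} \sum_{j}\left[k\left(x_{j}, x\right) \nabla_{x_{j}} \log p\left(x_{j}\right)+\nabla_{x_{j}} k\left(x_{j}, x\right)\right]\), a kernel-smoothed gradient step plus a repulsion term that stops the ensemble collapsing to the mode. A minimal NumPy sketch; the Gaussian target, bandwidth, step size, and iteration count are all arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def svgd_step(x, score_p, sigma=1.0, step=0.1):
    """One SVGD update with an RBF kernel:
    x_i <- x_i + step * (1/n) sum_j [ k(x_j, x_i) grad log p(x_j)
                                      + grad_{x_j} k(x_j, x_i) ]."""
    n, _ = x.shape
    diff = x[:, None, :] - x[None, :, :]                      # (n, n, d): x_i - x_j
    k = np.exp(-np.sum(diff**2, axis=-1) / (2 * sigma**2))    # symmetric Gram matrix
    attract = k @ score_p(x)                                  # kernel-smoothed gradients
    repulse = np.sum(diff * k[..., None], axis=1) / sigma**2  # sum_j grad_{x_j} k(x_j, x_i)
    return x + step * (attract + repulse) / n

# Hypothetical target p = N(mu, I_2); its score is mu - x.
mu = np.array([2.0, -1.0])
score = lambda x: mu - x

x = rng.standard_normal((100, 2))                   # particles start at N(0, I)
for _ in range(500):
    x = svgd_step(x, score, step=0.2)
# The particle cloud drifts toward mu while repulsion keeps it spread out.
```

Note that with a single particle the repulsion term vanishes and the update degenerates to plain gradient ascent on \(\log p\), i.e. MAP estimation.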

## 1 Moment matching interpretation

## 2 Incoming

## 3 References

*Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference*.

*Proceedings of the 32nd International Conference on Neural Information Processing Systems*. NIPS’18.

*Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48*. ICML’16.

*Proceedings of the 32nd International Conference on Neural Information Processing Systems*. NIPS’18.

*arXiv:1901.07987 [Cs, Stat]*.

*UAI 2017*.

*Proceedings of the 36th International Conference on Machine Learning*.

*Advances in Neural Information Processing Systems*.

*arXiv:2007.02857 [Cs, Math, Stat]*.

*Proceedings of the 35th International Conference on Machine Learning*.

*arXiv:1806.10234 [Cs, Stat]*.

*Probability Surveys*.

*Proceedings of The 33rd International Conference on Machine Learning*.

*Proceedings of the 32nd International Conference on Neural Information Processing Systems*. NIPS’18.

*Advances In Neural Information Processing Systems*.

*Proceedings of the AAAI Conference on Artificial Intelligence*.

*Proceedings of The 25th International Conference on Artificial Intelligence and Statistics*.

*Journal of Computational Physics*.

*Computational Science – ICCS 2019. ICCS 2019. Lecture Notes in Computer Science*.

*Mathematical Geosciences*.

*Nonlinear Processes in Geophysics*.

*Proceedings of the 36th International Conference on Machine Learning*.

*Proceedings of the 33rd International Conference on Neural Information Processing Systems*.

*Statistics and Computing*.

*arXiv:2103.00895 [Stat]*.

*International Conference on Artificial Intelligence and Statistics*.

*Proceedings of the 35th International Conference on Machine Learning*.