# (Kernelized) Stein variational gradient descent

## KSD, SVGD

Stein’s method meets variational inference via kernels and probability measures. The result is a method of inference which maintains an ensemble of particles which, notionally, collectively sample from some target distribution. I should learn about this, as one of the methods I might use for low-assumption Bayesian inference.
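To make the particle-ensemble idea concrete, here is a minimal sketch of the SVGD update (not the authors’ reference implementation): each particle moves along a kernel-weighted average of the target’s score, plus a repulsive kernel-gradient term that keeps the ensemble spread out. I assume an RBF kernel with the median-heuristic bandwidth, and use a standard normal as a stand-in target whose score is simply $$-x$$.

```python
import numpy as np

def svgd_step(X, grad_log_p, stepsize=0.5):
    """One SVGD update for particles X (shape (n, d)) toward the target
    whose score function is grad_log_p.  RBF kernel, median-heuristic bandwidth."""
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]        # x_i - x_j, shape (n, n, d)
    sq = np.sum(diffs ** 2, axis=-1)             # squared pairwise distances
    # median heuristic for the bandwidth h in k(x, y) = exp(-|x - y|^2 / h)
    h = np.median(sq) / np.log(n + 1) + 1e-8
    K = np.exp(-sq / h)                          # kernel matrix
    scores = grad_log_p(X)                       # (n, d)
    # attractive term: kernel-smoothed scores; repulsive term: kernel gradients
    # grad_{x_j} k(x_j, x_i) = 2 (x_i - x_j) / h * k(x_j, x_i)
    phi = (K @ scores + (2.0 / h) * np.einsum("ij,ijd->id", K, diffs)) / n
    return X + stepsize * phi

# Toy usage: drive particles, initialised far away, toward a standard normal.
rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=0.5, size=(100, 1))
for _ in range(3000):
    X = svgd_step(X, lambda x: -x)               # score of N(0, I) is -x
```

After enough steps the ensemble mean drifts toward 0 while the repulsive term prevents the particles from collapsing onto the mode.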

Let us examine the computable kernelized Stein discrepancy, invented and weaponized in Q. Liu, Lee, and Jordan (2016) and summarised in Xu and Matsuda (2021):

Let $$q$$ be a smooth probability density on $$\mathbb{R}^{d}$$. For a smooth function $$\mathbf{f}=\left(f_{1}, \ldots, f_{d}\right): \mathbb{R}^{d} \rightarrow \mathbb{R}^{d}$$, the Stein operator $$\mathcal{T}_{q}$$ is defined by $$\mathcal{T}_{q} \mathbf{f}(x)=\sum_{i=1}^{d}\left(f_{i}(x) \frac{\partial}{\partial x^{i}} \log q(x)+\frac{\partial}{\partial x^{i}} f_{i}(x)\right)$$

…Let $$\mathcal{H}$$ be a reproducing kernel Hilbert space (RKHS) on $$\mathbb{R}^{d}$$ and $$\mathcal{H}^{d}$$ be its product. Using the Stein operator, the kernel Stein discrepancy (KSD) between two densities $$p$$ and $$q$$ is defined as $$\operatorname{KSD}(p \| q)=\sup _{\|\mathbf{f}\|_{\mathcal{H}^{d}} \leq 1} \mathbb{E}_{p}\left[\mathcal{T}_{q} \mathbf{f}\right].$$ It can be shown that $$\operatorname{KSD}(p \| q) \geq 0$$, with $$\operatorname{KSD}(p \| q)=0$$ if and only if $$p=q$$, under mild regularity conditions. Thus KSD is a proper discrepancy measure between densities. After some calculation, $$\operatorname{KSD}(p \| q)$$ can be rewritten as $$\operatorname{KSD}^{2}(p \| q)=\mathbb{E}_{x, \tilde{x} \sim p}\left[h_{q}(x, \tilde{x})\right]$$ where $$h_{q}$$ does not involve $$p$$, so the squared discrepancy can be estimated from samples of $$p$$ alone.
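For intuition, here is a minimal sketch (my own, not from the cited papers) of the U-statistic estimator of $$\operatorname{KSD}^{2}(p \| q)$$, using an RBF kernel and a standard-normal $$q$$, whose score is $$\nabla \log q(x)=-x$$. The closed form used for $$h_{q}$$ combines the score of $$q$$ with the kernel and its first and mixed second derivatives; note that only samples of $$p$$ and the score of $$q$$ are needed.

```python
import numpy as np

def ksd_squared(X, score, sigma=1.0):
    """U-statistic estimate of KSD^2(p || q) from samples X ~ p (shape (n, d)),
    given the score function of q and an RBF kernel exp(-|x-y|^2 / (2 sigma^2))."""
    n, d = X.shape
    S = score(X)                                 # score of q at each sample
    diffs = X[:, None, :] - X[None, :, :]        # x_i - x_j
    sq = np.sum(diffs ** 2, axis=-1)
    K = np.exp(-sq / (2 * sigma ** 2))
    ss = S @ S.T                                 # s(x_i) . s(x_j)
    # s(x_i) . grad_{x_j} k  =  s(x_i) . (x_i - x_j) / sigma^2 * k
    sx_gy = np.einsum("id,ijd->ij", S, diffs) / sigma ** 2
    # s(x_j) . grad_{x_i} k  = -s(x_j) . (x_i - x_j) / sigma^2 * k
    sy_gx = -np.einsum("jd,ijd->ij", S, diffs) / sigma ** 2
    # trace of the mixed second derivative of the RBF kernel
    trace = d / sigma ** 2 - sq / sigma ** 4
    H = K * (ss + sx_gy + sy_gx + trace)         # h_q(x_i, x_j)
    np.fill_diagonal(H, 0.0)                     # U-statistic: drop the diagonal
    return H.sum() / (n * (n - 1))

rng = np.random.default_rng(0)
score_q = lambda x: -x                           # q = N(0, I)
matched = ksd_squared(rng.normal(size=(300, 1)), score_q)             # p = q
shifted = ksd_squared(rng.normal(loc=3.0, size=(300, 1)), score_q)    # p != q
```

With matched samples the estimate hovers near zero (the U-statistic is unbiased at $$p=q$$), while the shifted samples give a clearly positive value.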

TBD.

## Moment matching interpretation
