Especially stochastic automatic differentiation

Taking gradients through integrals using randomness.

Generic

Could do with a decent intro. TBD.

For unifying overviews see Mohamed et al. (2020) and the Storchastic docs.

Optimising Monte Carlo

Let us say I need to differentiate through a Monte Carlo algorithm to tune its parameters while holding the PRNG fixed. See Tuning MC.
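
A minimal sketch of that setup, assuming nothing beyond plain PyTorch: the noise is drawn once from a seeded generator and held fixed, the parameters enter through a reparameterised map, and autograd differentiates the resulting Monte Carlo estimate. The objective and names (`mc_objective`, `theta`) are made up for illustration.

```python
import torch

def mc_objective(theta, u):
    """Monte Carlo estimate of E[(x - 2)^2], with x = theta[0] + theta[1] * u."""
    x = theta[0] + theta[1] * u
    return ((x - 2.0) ** 2).mean()

theta = torch.tensor([0.0, 1.0], requires_grad=True)  # (location, scale)
u = torch.randn(10_000, generator=torch.Generator().manual_seed(0))  # frozen noise

loss = mc_objective(theta, u)
loss.backward()
print(theta.grad)  # gradient of the MC estimate with the PRNG draws held fixed
```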

Parametric

I can imagine that our observed RV $${\mathsf{x}}\in \mathbb{R}$$ is generated by the inverse-CDF method: its CDF is $$F(\cdot;\theta)$$ with parameter $$\theta$$, and we set $$\mathsf{x} = F^{-1}(\mathsf{u};\theta)$$ where $$\mathsf{u}\sim\operatorname{Uniform}(0,1)$$. Each realization corresponds to an independent draw $$u_i\sim \mathsf{u}$$. How can I get the derivative of such a map with respect to $$\theta$$?
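
As a toy illustration (my choice of example, nothing canonical): for an Exponential distribution with rate $$\theta$$ the iCDF is available in closed form, $$F^{-1}(u;\theta) = -\log(1-u)/\theta$$, so autograd can differentiate a fixed-$$u$$ sample with respect to $$\theta$$ directly.

```python
import torch

theta = torch.tensor(2.0, requires_grad=True)  # rate parameter of an Exponential
u = torch.rand(5, generator=torch.Generator().manual_seed(1))  # fixed uniform draws

x = -torch.log1p(-u) / theta  # x = F^{-1}(u; theta) = -log(1 - u) / theta
x.sum().backward()            # each dx_i/dtheta = -x_i / theta
print(theta.grad)             # equals (-x / theta).sum()
```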

Maybe I generated my original variable not by the iCDF method but by simulating some variable $${\mathsf{z}}\sim F(\cdot; \theta)$$ directly. In that case I may as well have generated the $$\mathsf{u}_i$$ by taking $$\mathsf{u}_i=F(\mathsf{z}_i;\theta)$$, so conceptually I am generating my RV by fixing a realization $$z_i$$ of $$\mathsf{z}$$ and taking $$x(\tau) := F^{-1}(F(z_i;\theta);\tau)$$, where $$\tau$$ is a perturbed copy of the parameter. So to find the effect of my perturbation what I actually need is

\begin{aligned} \left.\frac{\partial}{\partial \tau} F^{-1}(F(z;\theta);\tau)\right|_{\tau=\theta}\\ \end{aligned}

Does this do what we want? Kinda. Suppose that the parameter in question is something boring, such as the location parameter of a location family, i.e. $$F(\cdot;\theta)=F(\cdot-\theta;0).$$ Then $$F^{-1}(\cdot;\theta)=F^{-1}(\cdot;0)+\theta$$ and thus

\begin{aligned} \left.\frac{\partial}{\partial \tau} F^{-1}(F(z;\theta);\tau)\right|_{\tau=\theta} &=\left.\frac{\partial}{\partial \tau}\left( F^{-1}(F(z-\theta;0);0)+\tau\right)\right|_{\tau=\theta}\\ &=\left.\frac{\partial}{\partial \tau}\left(z-\theta+\tau\right)\right|_{\tau=\theta}\\ &=1 \end{aligned}

OK grand that came out simple enough.
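A quick numerical sanity check of that calculation (my addition, not part of the original argument), using a Gaussian as the location family via `torch.distributions`:

```python
import torch
from torch.distributions import Normal

theta = torch.tensor(0.7)                 # the "true" location parameter
z = torch.tensor(-0.3)                    # a fixed realization z ~ F(.; theta)
tau = theta.clone().requires_grad_(True)  # the perturbed copy of the parameter

u = Normal(theta, 1.0).cdf(z)             # u = F(z; theta), constant w.r.t. tau
x = Normal(tau, 1.0).icdf(u)              # x = F^{-1}(u; tau)
x.backward()
print(tau.grad)                           # tensor(1.), matching the calculation above
```
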

TBC

Tooling

van Krieken, Tomczak, and ten Teije (2021) claims to supply us with a large library of PyTorch tools for this purpose, under the rubric Storchastic (source). See also DeepMind’s mc_gradients.
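
For orientation, here is the basic score-function (REINFORCE) estimator that such libraries generalise, written as a plain-PyTorch sketch rather than via the Storchastic API:

```python
import torch
from torch.distributions import Normal

theta = torch.tensor(1.5, requires_grad=True)
dist = Normal(theta, 1.0)

x = dist.sample((100_000,))                # non-differentiable draws
f = x ** 2                                 # the integrand f(x); true gradient is 2 * theta

# surrogate whose gradient w.r.t. theta is the score-function estimator
# E[f(x) * d log p(x; theta) / d theta]
surrogate = (dist.log_prob(x) * f).mean()
surrogate.backward()
print(theta.grad)                          # close to 2 * theta = 3.0
```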

References

Fu, Michael. 2005. 32.
Hyvärinen, Aapo. 2005. “Estimation of Non-Normalized Statistical Models by Score Matching.” Journal of Machine Learning Research 6: 695–709.
Krieken, Emile van, Jakub M. Tomczak, and Annette ten Teije. 2021. “Storchastic: A Framework for General Stochastic Automatic Differentiation.” arXiv:2104.00428 [Cs, Stat].
Mohamed, Shakir, Mihaela Rosca, Michael Figurnov, and Andriy Mnih. 2020. “Monte Carlo Gradient Estimation in Machine Learning.” Journal of Machine Learning Research 21 (132): 1–62.
Oktay, Deniz, Nick McGreivy, Joshua Aduol, Alex Beatson, and Ryan P. Adams. 2020. “Randomized Automatic Differentiation.” arXiv:2007.10412 [Cs, Stat], July.
Ranganath, Rajesh, Sean Gerrish, and David M. Blei. 2013. “Black Box Variational Inference.” arXiv:1401.0118 [Cs, Stat], December.
Schulman, John, Nicolas Heess, Theophane Weber, and Pieter Abbeel. 2015. “Gradient Estimation Using Stochastic Computation Graphs.” In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, 3528–36. NIPS’15. Cambridge, MA, USA: MIT Press.
Stoker, Thomas M. 1986. “Consistent Estimation of Scaled Coefficients.” Econometrica 54 (6): 1461–81.
Walder, Christian J., Paul Roussel, Richard Nock, Cheng Soon Ong, and Masashi Sugiyama. 2019. “New Tricks for Estimating Gradients of Expectations.” arXiv:1901.11311 [Cs, Stat], June.
