## Especially stochastic automatic differentiation

Taking gradients through integrals using randomness. A similarly named but distinct thing is Stochastic Gradient MCMC, which uses stochastic gradients to sample from a target posterior distribution. Some of the same tools probably pop up in both uses, however.

## Score function estimator

A.k.a. REINFORCE, all-caps, for some reason. Could do with a decent intro. TBD.

A very generic method that works on lots of things, including discrete variables; however, it has notoriously high variance if done naïvely.
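To make the variance complaint concrete, here is a minimal sketch under toy assumptions of my own choosing: the integrand `f(x) = x^2`, the sampling distribution `N(theta, 1)`, and a baseline that happens to be known in closed form. None of these come from a particular paper. It uses the identity $$\nabla_\theta \mathbb{E}[f(\mathsf{x})] = \mathbb{E}[f(\mathsf{x})\nabla_\theta \log p(\mathsf{x};\theta)]$$, and shows that subtracting a baseline leaves the mean alone while shrinking the variance.

```python
import random


def f(x):
    """Toy integrand (an assumption for illustration)."""
    return x * x


def score_function_grads(theta, n, rng, baseline=0.0):
    """Per-sample score-function estimates of d/dtheta E_{x ~ N(theta, 1)}[f(x)].

    For a unit-variance Gaussian the score is
    d/dtheta log p(x; theta) = x - theta.
    """
    grads = []
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        grads.append((f(x) - baseline) * (x - theta))
    return grads


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((g - m) ** 2 for g in values) / (len(values) - 1)
    return m, v


theta = 1.5
rng = random.Random(0)

naive = score_function_grads(theta, 100_000, rng)
# The baseline E[f(x)] = theta^2 + 1 is only available in closed form
# because this is a toy problem; in practice it would be estimated.
centred = score_function_grads(theta, 100_000, rng, baseline=theta**2 + 1.0)

naive_mean, naive_var = mean_and_variance(naive)
centred_mean, centred_var = mean_and_variance(centred)
exact = 2.0 * theta  # d/dtheta (theta^2 + 1)

print(f"exact gradient      {exact:.3f}")
print(f"naive:    mean {naive_mean:.3f}, per-sample variance {naive_var:.1f}")
print(f"baseline: mean {centred_mean:.3f}, per-sample variance {centred_var:.1f}")
```

Both estimators target the same gradient; only the per-sample variance differs, which is the whole game with this family of methods.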

For unifying overviews, see the Storchastic docs.

### Rao-Blackwellization

I like this idea, but I have a vague feeling that I saw something similar in Reuven Y. Rubinstein and Kroese (2016). TODO: follow up.

## Parametric

I can imagine that our observed rv $$\mathsf{x}\in \mathbb{R}$$ is generated via lookups from its iCDF $$F^{-1}(\cdot;\theta)$$ with parameter $$\theta$$: $$\mathsf{x} = F^{-1}(\mathsf{u};\theta)$$ where $$\mathsf{u}\sim\operatorname{Uniform}(0,1)$$. Each realization corresponds to an independent draw $$u_i\sim \mathsf{u}$$. How can I get the derivative of such a map?
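As a concrete instance of the iCDF setup, here is a minimal sketch using an exponential distribution with rate $$\theta$$. The distribution is my choice for illustration only; it is convenient because $$F^{-1}(u;\theta) = -\log(1-u)/\theta$$ is available in closed form. Holding the uniforms fixed, the map $$\theta\mapsto F^{-1}(u_i;\theta)$$ is an ordinary differentiable function, and averaging its derivative estimates $$\partial_\theta \mathbb{E}[\mathsf{x}]$$.

```python
import math
import random


def exponential_icdf(u, rate):
    """Inverse CDF of Exponential(rate): F^{-1}(u; rate) = -log(1 - u) / rate."""
    return -math.log1p(-u) / rate


def d_icdf_d_rate(u, rate):
    """Derivative of the inverse CDF in the rate parameter, holding u fixed."""
    return math.log1p(-u) / rate**2


rng = random.Random(1)
# Fixed uniforms: the same u_i would be reused for every value of theta.
us = [rng.random() for _ in range(50_000)]

theta = 2.0
xs = [exponential_icdf(u, theta) for u in us]

# Average derivative of the sampling map, an estimate of d/dtheta E[x].
grad_estimate = sum(d_icdf_d_rate(u, theta) for u in us) / len(us)
# For this toy case E[x] = 1 / theta, so the exact answer is -1 / theta^2.
exact_grad = -1.0 / theta**2

print(f"estimate {grad_estimate:.4f}, exact {exact_grad:.4f}")
```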

Maybe I generated my original variable not by the iCDF method but by simulating some variable $$\mathsf{z}\sim F(\cdot; \theta)$$. In that case I may as well have generated those $$\mathsf{u}_i$$ by taking $$\mathsf{u}_i=F(\mathsf{z}_i;\theta)$$ for some $$\mathsf{z}_i \sim F(\cdot;\theta)$$, and I am conceptually generating my RV by fixing $$z_i\sim\mathsf{z}_i$$ and taking $$\phi := F^{-1}(F(z_i;\theta);\tau)$$. So to find the effect of my perturbation, what I actually need is

$$\begin{aligned}
\left.\frac{\partial}{\partial \tau} F^{-1}(F(z;\theta);\tau)\right|_{\tau=\theta}
\end{aligned}$$

Does this do what we want? Kinda. So suppose that the parameters in question are something boring, such as the location parameter of a location-scale distribution, i.e. $$F(\cdot;\theta)=F(\cdot-\theta;0).$$ Then $$F^{-1}(\cdot;\theta)=F^{-1}(\cdot;0)+\theta$$ and thus

$$\begin{aligned}
\left.\frac{\partial}{\partial \tau} F^{-1}(F(z;\theta);\tau)\right|_{\tau=\theta}
&=\left.\frac{\partial}{\partial \tau}\left(F^{-1}(F(z-\theta;0);0)+\tau\right)\right|_{\tau=\theta}\\
&=\left.\frac{\partial}{\partial \tau}\left(z-\theta+\tau\right)\right|_{\tau=\theta}\\
&=1
\end{aligned}$$

OK grand that came out simple enough.
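A quick numerical sanity check of that calculation, as a sketch. I use the logistic distribution because its CDF and iCDF are both closed-form; that choice, the function names and the step size `h` are all my assumptions. The second function tries a scale parameter, a case not worked above, where $$F^{-1}(u;\tau)=\tau F^{-1}(u;1)$$ and so the same derivative should come out as $$z/\theta$$.

```python
import math


def logistic_cdf(x, loc=0.0, scale=1.0):
    """CDF of the logistic distribution."""
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))


def logistic_icdf(u, loc=0.0, scale=1.0):
    """Inverse CDF of the logistic distribution."""
    return loc + scale * math.log(u / (1.0 - u))


def transport_derivative_loc(z, theta, h=1e-5):
    """Central difference of tau -> F^{-1}(F(z; theta); tau) at tau = theta,
    where the parameter is the location."""
    u = logistic_cdf(z, loc=theta)
    up = logistic_icdf(u, loc=theta + h)
    down = logistic_icdf(u, loc=theta - h)
    return (up - down) / (2 * h)


def transport_derivative_scale(z, theta, h=1e-5):
    """Same check when the parameter is the scale (location fixed at 0)."""
    u = logistic_cdf(z, scale=theta)
    up = logistic_icdf(u, scale=theta + h)
    down = logistic_icdf(u, scale=theta - h)
    return (up - down) / (2 * h)


z, theta = 0.7, 1.3
d_loc = transport_derivative_loc(z, theta)
d_scale = transport_derivative_scale(z, theta)

print(f"location: {d_loc:.6f} (expected 1)")
print(f"scale:    {d_scale:.6f} (expected z / theta = {z / theta:.6f})")
```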

TBC

## Tooling

van Krieken, Tomczak, and Teije (2021) claim to supply us with a large library of PyTorch tools for stochastic gradient estimation purposes, under the rubric Storchastic. (Source.) See also DeepMind’s mc_gradients.

## Optimising Monte Carlo

Let us say I need to differentiate through a Monte Carlo algorithm to alter its parameters while holding the PRNG fixed. See Tuning MC.
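A minimal sketch of why holding the PRNG fixed matters, using a toy estimator of my own invention (the integrand, seeds and step size are illustrative assumptions). With the same seed on both sides of a finite difference, the noise cancels and the estimate is a smooth function of $$\theta$$; with different seeds the difference of independent noise is divided by a tiny step and swamps the signal.

```python
import random


def mc_estimate(theta, seed, n=2_000):
    """Monte Carlo estimate of E[(theta + eps)^2], eps ~ N(0, 1),
    driven by a PRNG that is freshly seeded on every call."""
    rng = random.Random(seed)
    return sum((theta + rng.gauss(0.0, 1.0)) ** 2 for _ in range(n)) / n


def fd_gradient(theta, seed_plus, seed_minus, h=1e-3):
    """Central finite difference of the estimator in theta."""
    up = mc_estimate(theta + h, seed_plus)
    down = mc_estimate(theta - h, seed_minus)
    return (up - down) / (2 * h)


theta = 0.5
# Same seed on both sides: common random numbers, so the noise cancels.
grad_fixed = fd_gradient(theta, seed_plus=0, seed_minus=0)
# Different seeds: independent noise, amplified by 1 / (2h).
grad_fresh = fd_gradient(theta, seed_plus=1, seed_minus=2)
exact = 2.0 * theta  # d/dtheta (theta^2 + 1)

print(f"exact        {exact:.3f}")
print(f"fixed PRNG   {grad_fixed:.3f}")
print(f"fresh PRNGs  {grad_fresh:.3f}")
```

With the seed held fixed, the finite difference coincides (up to rounding) with the derivative of the sample average itself, which is what an autodiff tool would compute through the simulation.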

## References

Ahn, Sungjin, Anoop Korattikara, and Max Welling. 2012. In Proceedings of the 29th International Conference on Machine Learning, 1771–78. ICML’12. Madison, WI, USA: Omnipress.
Arya, Gaurav, Moritz Schauer, Frank Schäfer, and Christopher Vincent Rackauckas. 2022. In.
Fu, Michael. 2005. 32.
Hyvärinen, Aapo. 2005. The Journal of Machine Learning Research 6 (December): 695–709.
Krieken, Emile van, Jakub M. Tomczak, and Annette ten Teije. 2021. In arXiv:2104.00428 [Cs, Stat].
Liu, Runjing, Jeffrey Regier, Nilesh Tripuraneni, Michael I. Jordan, and Jon McAuliffe. 2019. arXiv.
Mohamed, Shakir, Mihaela Rosca, Michael Figurnov, and Andriy Mnih. 2020. Journal of Machine Learning Research 21 (132): 1–62.
Oktay, Deniz, Nick McGreivy, Joshua Aduol, Alex Beatson, and Ryan P. Adams. 2020. arXiv:2007.10412 [Cs, Stat], July.
Ranganath, Rajesh, Sean Gerrish, and David M. Blei. 2013. arXiv:1401.0118 [Cs, Stat], December.
Richter, Lorenz, Ayman Boustati, Nikolas Nüsken, Francisco J. R. Ruiz, and Ömer Deniz Akyildiz. 2020. arXiv.
Rosca, Mihaela, Michael Figurnov, Shakir Mohamed, and Andriy Mnih. 2019. “Measure–Valued Derivatives for Approximate Bayesian Inference.” In NeurIPS Workshop on Approximate Bayesian Inference.
Rubinstein, Reuven Y., and Dirk P. Kroese. 2016. Simulation and the Monte Carlo Method. 3rd edition. Wiley Series in Probability and Statistics. Hoboken, New Jersey: Wiley.
Rubinstein, Reuven Y., and Dirk P. Kroese. 2004. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning. New York, NY: Springer New York.
Schulman, John, Nicolas Heess, Theophane Weber, and Pieter Abbeel. 2015. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, 3528–36. NIPS’15. Cambridge, MA, USA: MIT Press.
Shi, Jiaxin, Shengyang Sun, and Jun Zhu. 2018. In. arXiv.
Stoker, Thomas M. 1986. Econometrica 54 (6): 1461–81.
Walder, Christian J., Paul Roussel, Richard Nock, Cheng Soon Ong, and Masashi Sugiyama. 2019. arXiv:1901.11311 [Cs, Stat], June.
