Taking gradients through integrals/expectations using randomness, i.e. can I estimate $\nabla_\theta \mathbb{E}_{x\sim p(x;\theta)}[f(x)]$?
A similarly named but distinct concept is Stochastic Gradient MCMC, which uses stochastic gradients to sample from a target posterior distribution. Some similar tools and concepts pop up in both applications.
Score function estimator
A.k.a. REINFORCE (all-caps, for some reason?). A generic method that works on lots of things, including discrete variables; notoriously high-variance if done naïvely. Credited to (Williams 1992), but surely it must be older than that?
The key identity is the log-derivative trick,
$$\nabla_\theta \mathbb{E}_{p(x;\theta)}[f(x)] = \mathbb{E}_{p(x;\theta)}\left[f(x)\,\nabla_\theta \log p(x;\theta)\right].$$
The use of this is that there is a simple and obvious Monte Carlo estimate of the latter: choosing samples $x_i \sim p(x;\theta)$,
$$\nabla_\theta \mathbb{E}_{p(x;\theta)}[f(x)] \approx \frac{1}{N}\sum_{i=1}^{N} f(x_i)\,\nabla_\theta \log p(x_i;\theta).$$
See score function estimators for more.
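To make that concrete, here is a minimal sketch in JAX (a toy example of my own, not taken from any particular reference): take $f(x) = (x-2)^2$ and $p(x;\theta) = \mathcal{N}(\theta, 1)$, for which the true gradient is $2(\theta - 2)$.

```python
import jax
import jax.numpy as jnp

def f(x):
    return (x - 2.0) ** 2          # arbitrary integrand with a known answer

def logp(theta, x):
    return jax.scipy.stats.norm.logpdf(x, loc=theta)   # log N(x; theta, 1)

def score_function_grad(theta, key, n=10_000):
    x = theta + jax.random.normal(key, (n,))             # x_i ~ N(theta, 1)
    # score_i = grad_theta log p(x_i; theta), here computed by autodiff
    score = jax.vmap(jax.grad(logp), in_axes=(None, 0))(theta, x)
    return jnp.mean(f(x) * score)

key = jax.random.PRNGKey(0)
print(score_function_grad(2.5, key))   # noisy estimate of 2 * (2.5 - 2) = 1.0
```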
Reparameterization trick
Define some base distribution $p(\varepsilon)$ such that $x = g(\varepsilon;\theta)$ for some transform $g$. Then
$$\nabla_\theta \mathbb{E}_{p(x;\theta)}[f(x)] = \nabla_\theta \mathbb{E}_{p(\varepsilon)}\left[f(g(\varepsilon;\theta))\right] = \mathbb{E}_{p(\varepsilon)}\left[\nabla_\theta f(g(\varepsilon;\theta))\right].$$
Less general but better-behaved than the score-function/REINFORCE estimator.
See reparameterization trick for more about that.
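Same toy problem as above, reparameterized (again a sketch of my own, assuming $x = \theta + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0,1)$):

```python
import jax
import jax.numpy as jnp

def f(x):
    return (x - 2.0) ** 2

def reparam_grad(theta, key, n=10_000):
    eps = jax.random.normal(key, (n,))      # base distribution, free of theta
    def mc_estimate(theta):
        x = theta + eps                     # g(eps; theta): a location shift
        return jnp.mean(f(x))
    return jax.grad(mc_estimate)(theta)     # differentiate *through* the samples

key = jax.random.PRNGKey(0)
print(reparam_grad(2.5, key))   # estimate of 2 * (2.5 - 2) = 1.0, much lower variance
```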
Gumbel-softmax
For categorical variates’ gradients in particular. See Gumbel-softmax.
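A minimal sketch of how the relaxation gets used (a toy example of my own; the temperature, logits and payoff vector are all arbitrary): perturb the logits with Gumbel noise and replace the argmax with a softmax, so the "sample" is differentiable in the logits.

```python
import jax
import jax.numpy as jnp

def gumbel_softmax(logits, key, temperature=0.5):
    g = jax.random.gumbel(key, logits.shape)            # Gumbel(0, 1) noise
    return jax.nn.softmax((logits + g) / temperature)   # relaxed one-hot sample

def expected_payoff(logits, key, payoffs):
    y = gumbel_softmax(logits, key)
    return jnp.dot(y, payoffs)   # differentiable surrogate for "pick a category, collect its payoff"

key = jax.random.PRNGKey(0)
logits = jnp.array([0.1, 0.5, -0.2])
payoffs = jnp.array([1.0, 3.0, 0.5])
print(jax.grad(expected_payoff)(logits, key, payoffs))  # gradient w.r.t. the logits
```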
Parametric
I can imagine that our observed rv $x$ is generated via lookups from its iCDF $F(\cdot;\theta)$ with parameter $\theta$: $x_i = F(u_i;\theta)$ where $u_i \sim \operatorname{Uniform}(0,1)$. Each realization $x_i$ corresponds to a choice of $u_i$ independently. How can I get the derivative of such a map?
Maybe I generated my original variable not by the iCDF method but by simulating some variable $x_i \sim p(x;\theta)$. In which case I may as well have generated those by taking $x_i = F(u_i;\theta)$ for some $u_i \sim \operatorname{Uniform}(0,1)$, and I am conceptually generating my RV by fixing $u_i$ and taking $x_i(\theta) = F(u_i;\theta)$. So to find the effect of my perturbation of $\theta$, what I actually need is
$$\frac{\partial x_i}{\partial \theta} = \frac{\partial F(u_i;\theta)}{\partial \theta}.$$
Does this do what we want? Kinda. So suppose that the parameter in question is something boring, such as the location parameter $\mu$ of a location-scale distribution, i.e. $F(u;\mu) = \mu + F(u;0)$. Then $\frac{\partial F(u;\mu)}{\partial \mu} = 1$, and thus
$$\frac{\partial}{\partial \mu}\,\frac{1}{N}\sum_{i=1}^{N} f\bigl(F(u_i;\mu)\bigr) = \frac{1}{N}\sum_{i=1}^{N} f'\bigl(F(u_i;\mu)\bigr).$$
OK grand that came out simple enough.
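For concreteness, here is that calculation done by autodiff (a toy sketch assuming a unit-variance Gaussian location family, so $F(u;\mu) = \mu + \Phi^{-1}(u)$): fix the $u_i$ and differentiate the map $\mu \mapsto \frac{1}{N}\sum_i f(F(u_i;\mu))$.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import ndtri   # inverse CDF of the standard normal

def f(x):
    return (x - 2.0) ** 2

def icdf(u, mu):
    return mu + ndtri(u)   # F(u; mu) for the unit-variance Gaussian location family

key = jax.random.PRNGKey(0)
u = jax.random.uniform(key, (10_000,))   # the fixed u_i; these never change as mu moves

def mc_estimate(mu):
    return jnp.mean(f(icdf(u, mu)))

print(jax.grad(mc_estimate)(2.5))   # since dF/dmu = 1, this is just the mean of f'(x_i)
```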
TBC
Optimising Monte Carlo
Let us say I need to differentiate through a Monte Carlo algorithm to alter its parameters while holding the PRNG fixed. See Tuning MC.
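Mechanically it looks something like the following sketch (a toy importance-sampling setup of my own devising, not from the Tuning MC notebook): holding the PRNG key fixed makes the Monte Carlo estimate a deterministic, differentiable function of the sampler's parameters, so we can e.g. take gradients of the empirical variance of its terms with respect to a proposal scale.

```python
import jax
import jax.numpy as jnp

# Toy problem (assumed): importance sampling for E_{x ~ N(0,1)}[x^4]
# using a N(0, s) proposal; we differentiate the empirical variance of the
# per-sample terms w.r.t. the proposal scale s, with the PRNG key held fixed.
def is_terms(s, key, n=5_000):
    x = s * jax.random.normal(key, (n,))   # proposal draws; same key => same underlying noise
    logw = jax.scipy.stats.norm.logpdf(x) - jax.scipy.stats.norm.logpdf(x, scale=s)
    return jnp.exp(logw) * x ** 4          # importance-weighted contributions to the estimate

def estimator_variance(s, key):
    return jnp.var(is_terms(s, key))       # proportional to the variance of the mean estimator

key = jax.random.PRNGKey(0)
print(jax.grad(estimator_variance)(1.5, key))  # usable inside a gradient-descent loop over s
```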