# Quasi-gradients of discrete parameters

December 20, 2022 — May 17, 2024

Notes on taking gradients through functions that look like they have no gradients because their arguments are discrete. TBC.

See also Polya-Gamma…

## 1 Stochastic gradients via REINFORCE

The classic, generic REINFORCE/score-function method for estimating gradients of expectations of functions of random variables covers discrete random variables as a special case. There are extra tricks specific to discrete random variables; see e.g. (Grathwohl et al. 2018; Liu et al. 2019; Mnih and Gregor 2014; Tucker et al. 2017).
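A minimal sketch of the basic estimator, without any of the extra tricks from those papers. It uses the identity $\nabla_\theta \mathbb{E}_{x\sim p_\theta}[f(x)] = \mathbb{E}_{x\sim p_\theta}[f(x)\,\nabla_\theta \log p_\theta(x)]$ for a categorical $p_\theta=\operatorname{softmax}(\theta)$, where $\nabla_\theta \log p_\theta(x) = \operatorname{onehot}(x) - p_\theta$. The function names, the lookup-table objective `f_values` and the constant `baseline` are my own illustrative choices, not from any of the cited works.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax of a 1-D array."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def exact_grad(logits, f_values):
    """Exact gradient of E[f(x)] w.r.t. the logits, for x ~ Categorical(softmax(logits))."""
    p = softmax(logits)
    return p * (f_values - p @ f_values)


def reinforce_grad(logits, f_values, n_samples, rng, baseline=0.0):
    """Score-function (REINFORCE) Monte Carlo estimate of the same gradient.

    `baseline` is a constant control variate; since E[score] = 0, any constant
    leaves the estimator unbiased and only changes its variance.
    """
    p = softmax(logits)
    x = rng.choice(len(p), size=n_samples, p=p)   # discrete samples
    score = np.eye(len(p))[x] - p                 # grad of log p(x) w.r.t. logits
    weights = f_values[x] - baseline
    return (weights[:, None] * score).mean(axis=0)


rng = np.random.default_rng(0)
logits = np.array([0.5, -1.0, 2.0])
f_values = np.array([1.0, 4.0, -2.0])  # hypothetical objective f(x), one value per category
print("exact    :", exact_grad(logits, f_values))
print("REINFORCE:", reinforce_grad(logits, f_values, 200_000, rng))
```

The estimator never differentiates `f`, which is why it works when `x` is discrete; the price is variance, which is what the baselines and control variates in the papers above try to reduce.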

## 2 Gumbel-(soft)max

A.k.a. the concrete distribution. See Gumbel-max.
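A rough sketch of the two halves of the trick, as I understand it. Adding i.i.d. Gumbel(0, 1) noise to the logits and taking the argmax gives an exact categorical sample (Gumbel-max). Replacing the argmax with a temperature-`tau` softmax gives a continuous relaxation on the simplex (Gumbel-softmax / concrete). The function names and the example logits are illustrative assumptions.

```python
import numpy as np


def gumbel_max_sample(logits, n_samples, rng):
    """Exact samples from Categorical(softmax(logits)) via the Gumbel-max trick."""
    g = rng.gumbel(size=(n_samples, len(logits)))  # i.i.d. Gumbel(0, 1) noise
    return np.argmax(logits + g, axis=1)


def gumbel_softmax_sample(logits, tau, n_samples, rng):
    """Relaxed samples on the simplex; they approach one-hot vectors as tau -> 0."""
    g = rng.gumbel(size=(n_samples, len(logits)))
    z = (logits + g) / tau
    z = z - z.max(axis=1, keepdims=True)           # stabilise the softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
logits = np.array([1.0, 0.0, -1.0])
hard = gumbel_max_sample(logits, 100_000, rng)
print("empirical frequencies:", np.bincount(hard, minlength=3) / len(hard))
print("one relaxed sample   :", gumbel_softmax_sample(logits, 0.5, 1, rng)[0])
```

In an autodiff framework the relaxed sample is a differentiable function of `logits` for fixed noise, which is what makes the reparameterised gradient available; plain NumPy only shows the forward pass.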

## 3 Avoiding the need for gradients

Famously, Expectation Maximization can handle some of the same optimisation problems as gradient-based methods, but without needing gradients. There are presumably more variants.
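For concreteness, here is a minimal sketch of EM on the textbook case: a two-component 1-D Gaussian mixture, where the discrete parameter is the latent component label of each point. The labels are never differentiated through; they are averaged out in the E-step. The function name, the initialisation and the synthetic data are all assumptions for illustration.

```python
import numpy as np


def em_two_gaussians(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture; returns (weights, means, stds)."""
    mu = np.array([x.min(), x.max()], dtype=float)  # crude initialisation
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        # The shared -0.5 * log(2 * pi) constant cancels in the normalisation.
        log_r = np.log(pi) - np.log(sigma) - 0.5 * ((x[:, None] - mu) / sigma) ** 2
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form weighted maximum-likelihood updates.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma


rng = np.random.default_rng(0)
labels = rng.random(5000) < 0.3                     # hidden discrete assignments
x = np.where(labels, rng.normal(-2.0, 0.5, 5000), rng.normal(3.0, 1.0, 5000))
print(em_two_gaussians(x))
```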

## 4 Gradients of other weird things

Differentiable sorting? See, e.g. Grover et al. (2018) and Prillo and Eisenschlos (2020).
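As a rough sketch of the flavour of these methods, here is my reading of the SoftSort relaxation from Prillo and Eisenschlos (2020): the hard permutation matrix that sorts `s` is replaced by a row-wise softmax over negative distances between the sorted and unsorted scores. Treat the details (absolute difference as the distance, descending order, the name `softsort`) as assumptions rather than a faithful reproduction of the paper.

```python
import numpy as np


def softsort(s, tau=1.0):
    """Relaxed permutation matrix for sorting `s` in descending order.

    Row i is a softmax over j of -|sort(s)[i] - s[j]| / tau, so each row is a
    probability vector that concentrates on the index of the i-th largest
    entry as tau -> 0.
    """
    s = np.asarray(s, dtype=float)
    s_sorted = np.sort(s)[::-1]
    logits = -np.abs(s_sorted[:, None] - s[None, :]) / tau
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the softmax
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


s = np.array([0.3, 2.0, -1.0])
print(softsort(s, tau=1.0) @ s)   # soft, blurred version of the sorted vector
print(softsort(s, tau=0.01) @ s)  # close to the hard sort at low temperature
```

NumPy only gives the forward pass here; the point of the relaxation is that, in an autodiff framework, every entry of the matrix has a nonzero gradient with respect to `s`.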

## 5 Other methods

What even are (Grathwohl et al. 2021; Zhang, Liu, and Liu 2022)? I think they work for quantised continuous vars, or possibly ordinal vars?

## 6 Examples

## 7 Incoming

## 8 References

*Biometrika*.

*Proceedings of ICLR*.

*The Journal of Machine Learning Research*.

*Proceedings of The 31st International Conference on Machine Learning*. ICML’14.

*Journal of Machine Learning Research*.

*arXiv:2007.10412 [Cs, Stat]*.

*NeurIPS Workshop on Approximate Bayesian Inference*.

*Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2*. NIPS’15.

*Advances in Neural Information Processing Systems*.

*Proceedings of the 31st International Conference on Neural Information Processing Systems*. NIPS’17.

*arXiv:2104.00428 [Cs, Stat]*.

*Machine Learning*.

*Proceedings of the 36th International Conference on Machine Learning*.

*Proceedings of the 39th International Conference on Machine Learning*.