The family of Gumbel tricks is useful for sampling from things that look like categorical distributions, or like distributions over simplices, and for learning models that use categorical variables via reparameterisation.

## Gumbel trick basics

- Francis Bach on Gumbel tricks has his characteristically out-of-the-simplex perspective.
- Chris J. Maddison on Gumbel Machinery
- Laurent Dinh, Gumbel-Max Trick Inference
- The Gumbel-Max Trick for Discrete Distributions
- Tim Vieira, Gumbel-max trick
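
As a minimal sketch of the trick these posts describe: to sample from a categorical distribution with probabilities \(\pi_i\), add independent Gumbel(0, 1) noise to each \(\log \pi_i\) and take the argmax. The function names `sample_gumbel` and `gumbel_max_sample` are illustrative choices, not taken from any of the linked posts.

```python
import numpy as np


def sample_gumbel(shape, rng):
    """Draw i.i.d. Gumbel(0, 1) noise via the inverse CDF: g = -log(-log(u))."""
    # Keep u strictly inside (0, 1) so that both logarithms stay finite.
    u = rng.uniform(low=np.finfo(float).tiny, high=1.0, size=shape)
    return -np.log(-np.log(u))


def gumbel_max_sample(log_probs, n_samples, rng):
    """Sample category indices by perturbing log-probabilities and taking the argmax."""
    log_probs = np.asarray(log_probs, dtype=float)
    g = sample_gumbel((n_samples, log_probs.shape[0]), rng)
    return np.argmax(log_probs + g, axis=1)


# Illustrative check: empirical frequencies should approach the target probabilities.
rng = np.random.default_rng(0)
probs = np.array([0.1, 0.6, 0.3])
draws = gumbel_max_sample(np.log(probs), n_samples=100_000, rng=rng)
freqs = np.bincount(draws, minlength=probs.size) / draws.size
print(freqs)
```

Because the argmax is unchanged by adding a constant to every entry, the same function also works on unnormalised log-probabilities.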

## Softmax relaxation

A.k.a. relaxed Bernoulli, relaxed categorical.

One of the co-inventors, Eric Jang, wrote a tutorial, *Categorical Variational Autoencoders using Gumbel-Softmax*:

> The main contribution of this work is a “reparameterization trick” for the categorical distribution. Well, not quite—it’s actually a re-parameterization trick for a distribution that we can smoothly deform into the categorical distribution. We use the Gumbel-Max trick, which provides an efficient way to draw samples \(z\) from the Categorical distribution with class probabilities \(\pi_{i}\):
>
> \[ z=\operatorname{one\_hot}\left(\underset{i}{\arg \max }\left[g_{i}+\log \pi_{i}\right]\right) \]
>
> argmax is not differentiable, so we simply use the softmax function as a continuous approximation of argmax:
>
> \[ y_{i}=\frac{\exp \left(\left(\log \left(\pi_{i}\right)+g_{i}\right) / \tau\right)}{\sum_{j=1}^{k} \exp \left(\left(\log \left(\pi_{j}\right)+g_{j}\right) / \tau\right)} \quad \text{for } i=1, \ldots, k \]
>
> Hence, we call this the “Gumbel-SoftMax distribution”. \(\tau\) is a temperature parameter that allows us to control how closely samples from the Gumbel-Softmax distribution approximate those from the categorical distribution. As \(\tau \rightarrow 0\), the softmax becomes an argmax and the Gumbel-Softmax distribution becomes the categorical distribution. During training, we let \(\tau>0\) to allow gradients past the sample, then gradually anneal the temperature \(\tau\) (but not completely to 0, as the gradients would blow up).
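
The relaxation in the quoted passage can be sketched in NumPy as follows, assuming the \(g_i\) are i.i.d. Gumbel(0, 1) draws. This only illustrates the forward computation of \(y\) and the effect of the temperature \(\tau\); getting gradients through the sample would need an autodiff framework. The function name `gumbel_softmax` and the two temperature values are illustrative assumptions, not part of the tutorial.

```python
import numpy as np


def gumbel_softmax(log_probs, g, tau):
    """Relaxed one-hot sample: softmax((log_probs + g) / tau).

    The noise g is passed in explicitly (an illustrative design choice) so the
    same draw can be inspected at several temperatures.
    """
    z = (np.asarray(log_probs, dtype=float) + g) / tau
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


rng = np.random.default_rng(0)
log_probs = np.log(np.array([0.1, 0.6, 0.3]))

# Gumbel(0, 1) noise via the inverse CDF, with u kept strictly inside (0, 1).
u = rng.uniform(low=np.finfo(float).tiny, high=1.0, size=log_probs.shape)
g = -np.log(-np.log(u))

soft = gumbel_softmax(log_probs, g, tau=5.0)   # high temperature: closer to uniform
sharp = gumbel_softmax(log_probs, g, tau=0.1)  # low temperature: closer to one-hot
print(soft, sharp)
```

For a fixed noise draw, lowering \(\tau\) moves mass onto the coordinate that the Gumbel-max trick would have selected, so `sharp` and `soft` share the same argmax.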

Emma Benjaminson's *The Gumbel-Softmax Distribution* takes it in small pedagogical steps.

## Straight-through Gumbel

TBC

