# Gumbel (soft) max tricks

## Concrete distribution, relaxed categorical etc

The family of Gumbel tricks is useful for sampling from things that look like categorical distributions and simplices, and for learning models that use categorical variables via reparameterisation.

## Softmax relaxation

A.k.a. relaxed Bernoulli, relaxed categorical.

One of the co-inventors, Eric Jang, wrote a tutorial, *Categorical Variational Autoencoders using Gumbel-Softmax*:

> The main contribution of this work is a “reparameterization trick” for the categorical distribution. Well, not quite: it’s actually a re-parameterization trick for a distribution that we can smoothly deform into the categorical distribution. We use the Gumbel-Max trick, which provides an efficient way to draw samples $z$ from the Categorical distribution with class probabilities $\pi_{i}$:
>
> $$z=\operatorname{one\_hot}\left(\underset{i}{\arg \max }\left[g_{i}+\log \pi_{i}\right]\right)$$
>
> argmax is not differentiable, so we simply use the softmax function as a continuous approximation of argmax:
>
> $$y_{i}=\frac{\exp \left(\left(\log \left(\pi_{i}\right)+g_{i}\right) / \tau\right)}{\sum_{j=1}^{k} \exp \left(\left(\log \left(\pi_{j}\right)+g_{j}\right) / \tau\right)} \quad \text{for } i=1, \ldots, k$$
>
> Hence, we call this the “Gumbel-SoftMax distribution”. $\tau$ is a temperature parameter that allows us to control how closely samples from the Gumbel-Softmax distribution approximate those from the categorical distribution. As $\tau \rightarrow 0$, the softmax becomes an argmax and the Gumbel-Softmax distribution becomes the categorical distribution. During training, we let $\tau>0$ to allow gradients past the sample, then gradually anneal the temperature $\tau$ (but not completely to 0, as the gradients would blow up).

Here the $g_{i}$ are i.i.d. draws from the standard Gumbel(0, 1) distribution, which can be generated as $g_{i}=-\log \left(-\log u_{i}\right)$ with $u_{i} \sim \operatorname{Uniform}(0,1)$.
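The two formulas in the quoted passage can be sketched in a few lines of plain Python. This is a minimal illustration rather than code from the tutorial: it assumes the $g_i$ are i.i.d. standard Gumbel draws, and the function names (`sample_gumbel`, `gumbel_max_sample`, `gumbel_softmax_sample`) and the example probabilities are made up for this sketch.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) variate as -log(-log(U)), U ~ Uniform(0, 1)."""
    u = rng.random()
    # rng.random() returns values in [0, 1); redraw in the unlikely case u == 0.
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(log_probs, rng):
    """Exact categorical sample: index of the largest Gumbel-perturbed log-probability."""
    perturbed = [lp + sample_gumbel(rng) for lp in log_probs]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


def gumbel_softmax_sample(log_probs, tau, rng):
    """Relaxed sample: softmax of the perturbed log-probabilities at temperature tau."""
    scaled = [(lp + sample_gumbel(rng)) / tau for lp in log_probs]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative class probabilities (an assumption for this example).
pi = [0.1, 0.6, 0.3]
log_pi = [math.log(p) for p in pi]

# Empirical frequencies of Gumbel-max samples should be close to pi.
rng = random.Random(0)
n = 20000
counts = [0] * len(pi)
for _ in range(n):
    counts[gumbel_max_sample(log_pi, rng)] += 1
freqs = [c / n for c in counts]

# Re-seeding gives both temperatures the same Gumbel noise, so the only
# difference between the two relaxed samples is tau.
soft_warm = gumbel_softmax_sample(log_pi, tau=5.0, rng=random.Random(1))
soft_cold = gumbel_softmax_sample(log_pi, tau=0.05, rng=random.Random(1))

print("empirical frequencies:", freqs)
print("tau = 5.0 :", soft_warm)
print("tau = 0.05:", soft_cold)
```

With the noise held fixed, lowering `tau` pushes the relaxed sample towards the one-hot vector that the Gumbel-max sample would have produced. In a real model you would use an automatic differentiation framework so that gradients flow through `y` to the log-probabilities; this sketch only shows the sampling step.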

Emma Benjaminson's *The Gumbel-Softmax Distribution* takes it in small pedagogic steps.

TBC

## References

Huijben, Iris A. M., Wouter Kool, Max B. Paulus, and Ruud J. G. van Sloun. 2022. arXiv:2110.01515 [Cs, Stat], March.
Jang, Eric, Shixiang Gu, and Ben Poole. 2017. arXiv:1611.01144 [Cs, Stat], August.
Maddison, Chris J., Andriy Mnih, and Yee Whye Teh. 2017. March.
Papandreou, George, and Alan L. Yuille. 2011. In 2011 International Conference on Computer Vision, 193–200. Barcelona, Spain: IEEE.
Wang, Xi, and Junming Yin. 2020. In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), 500–509. PMLR.
