Categorical random variates

Distributions over categories.

via random measures

See random measures.


See Gumbel-max tricks.
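
As a rough illustration of the basic trick (a minimal sketch using only the Python standard library; the function name gumbel_max_sample is made up for this example): add independent standard Gumbel noise to each unnormalised log-probability and take the argmax. The winning index is a draw from the corresponding categorical distribution.

```python
import math
import random
from collections import Counter

def gumbel_max_sample(logits, rng):
    """Draw one category index from unnormalised log-probabilities."""
    best_index, best_value = 0, -math.inf
    for index, logit in enumerate(logits):
        # Standard Gumbel noise, generated as -log(E) with E ~ Exponential(1).
        noise = -math.log(rng.expovariate(1.0))
        if logit + noise > best_value:
            best_index, best_value = index, logit + noise
    # The argmax of the perturbed logits is distributed as softmax(logits).
    return best_index

rng = random.Random(0)
probs = [0.1, 0.6, 0.3]  # illustrative target distribution
logits = [math.log(p) for p in probs]
n_draws = 100_000
counts = Counter(gumbel_max_sample(logits, rng) for _ in range(n_draws))
freqs = [counts[k] / n_draws for k in range(len(probs))]
print(freqs)  # empirical frequencies, which should be close to probs
```

The argmax is not differentiable, which is why the relaxations in Jang, Gu, and Poole (2017) and Maddison, Mnih, and Teh (2017) replace it with a tempered softmax.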

Pólya-Gamma augmentation

See Pólya-Gamma.

Softmax models


Multicategorical distributions

Can something belong to many categories? Then we are probably looking for Paintbox models (Broderick, Pitman, and Jordan 2013; Zhang and Paisley 2019) or some kind of multivariate Bernoulli model.

Dirichlet distribution

TBD. See Dirichlet distributions.

Dirichlet process

TBD. A distribution over an unknown number of categories. See also Gamma processes, which is how I learned to understand Dirichlet processes, insofar as I do.
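
The Gamma connection has a simple finite-dimensional analogue: normalising independent Gamma(alpha_k, 1) variates gives a Dirichlet(alpha) vector, much as normalising a Gamma process gives a Dirichlet process. A minimal sketch of the finite case, standard library only; dirichlet_from_gammas and the parameter values are made up for illustration.

```python
import random

def dirichlet_from_gammas(alphas, rng):
    """Sample Dirichlet(alphas) by normalising independent Gamma(alpha_k, 1) draws."""
    gammas = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)
alphas = [0.5, 1.0, 2.5]  # illustrative concentration parameters
samples = [dirichlet_from_gammas(alphas, rng) for _ in range(20_000)]
# The Dirichlet mean is alphas normalised to sum to one.
means = [sum(s[k] for s in samples) / len(samples) for k in range(len(alphas))]
print(means)  # should be close to [0.125, 0.25, 0.625]
```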

Parametric distributions over non-negative integers

See count models.


If there is a natural ordering to the categories, then we are in a weird place. TBC.


Kenneth Tay says

In the context of binary classification, calibration refers to the process of transforming the output scores from a binary classifier to class probabilities. If we think of the classifier as a “black box” that transforms input data into a score, we can think of calibration as a post-processing step that converts the score into a probability of the observation belonging to class 1.

The scores from some classifiers can already be interpreted as probabilities (e.g. logistic regression), while the scores from some classifiers require an additional calibration step before they can be interpreted as such (e.g. support vector machines).

He recommends the tutorial by Huang et al. (2020) and its associated GitHub repository.
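
One common instance of the post-processing step described in the quote is Platt scaling, which fits a one-dimensional logistic regression from score to label. The following is a minimal sketch on synthetic data, not the procedure from the tutorial; the helper names, the learning rate, and the "true" parameters 0.5 and -1.0 are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_platt(scores, labels, lr=0.5, n_iter=300):
    """Fit P(y=1 | s) = sigmoid(a * s + b) by gradient descent on the mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of the log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Synthetic "black box" scores whose true class-1 probability is sigmoid(0.5 * s - 1).
rng = random.Random(0)
scores = [rng.gauss(0.0, 2.0) for _ in range(2000)]
labels = [1 if rng.random() < sigmoid(0.5 * s - 1.0) else 0 for s in scores]

a, b = fit_platt(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]
print(a, b)  # should land near the generating values 0.5 and -1.0
```

In practice the calibration map would be fitted on held-out data rather than on the data used to train the classifier.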




Agresti, Alan. 2007. An Introduction to Categorical Data Analysis. Wiley.
Arya, Gaurav, Moritz Schauer, Frank Schäfer, and Christopher Vincent Rackauckas. 2022. “Automatic Differentiation of Programs with Discrete Randomness.” In.
Broderick, Tamara, Jim Pitman, and Michael I. Jordan. 2013. “Feature Allocations, Probability Functions, and Paintboxes.” Bayesian Analysis 8 (4): 801–36.
Connor, Robert J., and James E. Mosimann. 1969. “Concepts of Independence for Proportions with a Generalization of the Dirichlet Distribution.” Journal of the American Statistical Association 64 (325): 194–206.
Ferguson, Thomas S. 1974. “Prior Distributions on Spaces of Probability Measures.” The Annals of Statistics 2 (4): 615–29.
Frigyik, Bela, Amol Kapila, and Maya R Gupta. 2010. “Introduction to the Dirichlet Distribution and Related Processes.”
Grathwohl, Will, Kevin Swersky, Milad Hashemi, David Duvenaud, and Chris J. Maddison. 2021. “Oops I Took A Gradient: Scalable Sampling for Discrete Distributions.” arXiv.
Hjort, Nils Lid. 1990. “Nonparametric Bayes Estimators Based on Beta Processes in Models for Life History Data.” The Annals of Statistics 18 (3): 1259–94.
Huang, Yingxiang, Wentao Li, Fima Macheret, Rodney A Gabriel, and Lucila Ohno-Machado. 2020. “A Tutorial on Calibration Measurements and Calibration Models for Clinical Prediction Models.” Journal of the American Medical Informatics Association: JAMIA 27 (4): 621–33.
Huijben, Iris A. M., Wouter Kool, Max B. Paulus, and Ruud J. G. van Sloun. 2022. “A Review of the Gumbel-Max Trick and Its Extensions for Discrete Stochasticity in Machine Learning.” arXiv:2110.01515 [Cs, Stat], March.
Ishwaran, Hemant, and Mahmoud Zarepour. 2002. “Exact and Approximate Sum Representations for the Dirichlet Process.” Canadian Journal of Statistics 30 (2): 269–83.
Jang, Eric, Shixiang Gu, and Ben Poole. 2017. “Categorical Reparameterization with Gumbel-Softmax.” arXiv:1611.01144 [Cs, Stat], August.
Lau, John W., and Edward Cripps. 2022. “Thinned Completely Random Measures with Applications in Competing Risks Models.” Bernoulli 28 (1): 638–62.
Lin, Jiayu. 2016. “On The Dirichlet Distribution,” 75.
Maddison, Chris J., Andriy Mnih, and Yee Whye Teh. 2017. “The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables,” March.
Papandreou, George, and Alan L. Yuille. 2011. “Perturb-and-MAP Random Fields: Using Discrete Optimization to Learn and Sample from Energy Models.” In 2011 International Conference on Computer Vision, 193–200. Barcelona, Spain: IEEE.
Polson, Nicholas G., James G. Scott, and Jesse Windle. 2013. “Bayesian Inference for Logistic Models Using Pólya–Gamma Latent Variables.” Journal of the American Statistical Association 108 (504): 1339–49.
Rao, Vinayak, and Yee Whye Teh. 2009. “Spatial Normalized Gamma Processes.” In Proceedings of the 22nd International Conference on Neural Information Processing Systems, 1554–62. NIPS’09. Red Hook, NY, USA: Curran Associates Inc.
Roychowdhury, Anirban, and Brian Kulis. 2015. “Gamma Processes, Stick-Breaking, and Variational Inference.” In Artificial Intelligence and Statistics, 800–808. PMLR.
Shah, Amar, David A. Knowles, and Zoubin Ghahramani. 2015. “An Empirical Study of Stochastic Variational Algorithms for the Beta Bernoulli Process.” arXiv:1506.08180 [Cs, Stat], June.
Teh, Yee Whye, Dilan Görür, and Zoubin Ghahramani. 2007. “Stick-Breaking Construction for the Indian Buffet Process.” In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, 556–63. PMLR.
Thibaux, Romain, and Michael I. Jordan. 2007. “Hierarchical Beta Processes and the Indian Buffet Process.” In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, 564–71. PMLR.
Xuan, Junyu, Jie Lu, Guangquan Zhang, Richard Yi Da Xu, and Xiangfeng Luo. 2015. “Nonparametric Relational Topic Models Through Dependent Gamma Processes.” arXiv:1503.08542 [Cs, Stat], March.
Zhang, Aonan, and John Paisley. 2019. “Random Function Priors for Correlation Modeling.” In International Conference on Machine Learning, 7424–33.
