# Categorical random variates

February 20, 2017 — January 12, 2022

classification
metrics
probability
regression
statistics

Distributions over categories.

## 1 Stick breaking tricks

Recommended reading: Shakir Mohamed, Machine Learning Trick of the Day (6): Tricks with Sticks.
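For orientation, here is a minimal sketch of the most familiar stick-breaking construction, the one that generates Dirichlet process weights: repeatedly break off a Beta(1, α) fraction of whatever stick remains. The function name `stick_breaking_weights`, the truncation level `n_sticks`, and the choice to lump the leftover stick into the final category are illustrative assumptions, not something taken from the recommended post.

```python
import random


def stick_breaking_weights(alpha, n_sticks, rng=None):
    """Truncated stick-breaking weights with concentration `alpha`.

    Hypothetical helper for illustration: the truncation to `n_sticks`
    categories is an approximation to the infinite construction.
    """
    rng = rng or random.Random()
    weights = []
    remaining = 1.0  # length of stick not yet assigned to a category
    for _ in range(n_sticks - 1):
        frac = rng.betavariate(1.0, alpha)  # fraction of the remainder to break off
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    # Assign the leftover stick to the last category so the weights sum to 1.
    weights.append(remaining)
    return weights


weights = stick_breaking_weights(alpha=2.0, n_sticks=10, rng=random.Random(0))
```

Smaller `alpha` tends to put most of the mass on the first few categories; larger `alpha` spreads it over more of them.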

TBC.

## 2 Via random measures

See random measures.

See Pólya-Gamma.

TBC

## 6 Multicategorical distributions

Can something belong to many categories? Then we are probably looking for Paintbox models or some kind of multivariate Bernoulli model.

## 7 Dirichlet distribution

TBD. See Dirichlet distributions.

## 8 Dirichlet process

TBD. A distribution over an unknown number of categories. See also Gamma processes, which is how I learned to understand Dirichlet processes, insofar as I do.

## 9 Parametric distributions over non-negative integers

See count models.

## 10 Ordinal

If there is a natural ordering to the categories, then we are in a weird place. TBC.

## 11 Calibration

In the context of binary classification, calibration refers to the process of transforming the output scores from a binary classifier to class probabilities. If we think of the classifier as a “black box” that transforms input data into a score, we can think of calibration as a post-processing step that converts the score into a probability of the observation belonging to class 1.

The scores from some classifiers can already be interpreted as probabilities (e.g. logistic regression), while the scores from some classifiers require an additional calibration step before they can be interpreted as such (e.g. support vector machines).
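To make the post-processing step concrete, here is a minimal sketch of one common calibration map, Platt scaling, which fits a one-dimensional logistic regression from raw scores to class-1 probabilities. Platt scaling is my choice of example rather than something named in the text above, and the scores, labels, function names and optimizer settings are all made up for illustration. In practice the map should be fitted on held-out data, not on the data used to train the classifier.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_platt(scores, labels, n_iter=500, lr=0.1):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical uncalibrated scores (e.g. SVM margins) with true labels.
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(scores, labels)


def calibrate(score):
    """Map a raw score to an estimated probability of class 1."""
    return sigmoid(a * score + b)


p_low, p_high = calibrate(-2.0), calibrate(2.5)
```

The fitted map is monotone in the score, so it changes the probabilities attached to the scores without changing how the classifier ranks observations.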

The author of the passage above recommends the tutorial by Huang et al. (2020) and its associated GitHub repository.

See also the more general notion of probabilistic calibration.

TBD

## 13 References

Agresti. 2007. An Introduction to Categorical Data Analysis.
Arya, Schauer, Schäfer, et al. 2022. In.
Broderick, Pitman, and Jordan. 2013. Bayesian Analysis.
Connor, and Mosimann. 1969. “Concepts of Independence for Proportions with a Generalization of the Dirichlet Distribution.” Journal of the American Statistical Association.
Ferguson. 1974. The Annals of Statistics.
Frigyik, Kapila, and Gupta. 2010.
Grathwohl, Swersky, Hashemi, et al. 2021.
Gregor, Danihelka, Mnih, et al. 2014. In Proceedings of the 31st International Conference on Machine Learning.
Hjort. 1990. The Annals of Statistics.
Huang, Li, Macheret, et al. 2020. Journal of the American Medical Informatics Association: JAMIA.
Huijben, Kool, Paulus, et al. 2022. arXiv:2110.01515 [Cs, Stat].
Ishwaran, and Zarepour. 2002. Canadian Journal of Statistics.
Jang, Gu, and Poole. 2017.
Lau, and Cripps. 2022. Bernoulli.
Lin. 2016. “On The Dirichlet Distribution.”
Maddison, Mnih, and Teh. 2017. In.
Papandreou, and Yuille. 2011. In 2011 International Conference on Computer Vision.
Polson, Scott, and Windle. 2013. Journal of the American Statistical Association.
Rao, and Teh. 2009. “Spatial Normalized Gamma Processes.” In Proceedings of the 22nd International Conference on Neural Information Processing Systems. NIPS’09.
Roychowdhury, and Kulis. 2015. In Artificial Intelligence and Statistics.
Shah, Knowles, and Ghahramani. 2015. arXiv:1506.08180 [Cs, Stat].
Shekhovtsov. 2023. In Proceedings of the 40th International Conference on Machine Learning.
Teh, Görür, and Ghahramani. 2007. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics.
Thibaux, and Jordan. 2007. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics.
Wang, and Yin. 2020. In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI).
Xuan, Lu, Zhang, et al. 2015. arXiv:1503.08542 [Cs, Stat].
Zhang, and Paisley. 2019. In International Conference on Machine Learning.