# Minimum description length

August 6, 2020

Tags: Bayes, information, model selection, statistics, stringology

A formalisation of Occam’s razor of some kind: prefer the hypothesis that yields the shortest total description of the model plus the data encoded under that model. I see it invoked mostly in model selection.

In the Bayesian context I think this typically amounts to model selection by optimising the marginal likelihood, whose negative logarithm is itself a code length for the data.
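As a minimal sketch of the idea (with made-up data, not from any of the references below): the crude two-part MDL criterion for regression with Gaussian errors charges a data cost plus a parameter cost, which coincides with BIC up to constants. Here it recovers the degree of a polynomial from noisy samples; the problem setup and all numbers are hypothetical.

```python
import numpy as np

# Hypothetical example: choose a polynomial degree by a two-part code length.
rng = np.random.default_rng(0)
n = 200
x = np.linspace(-3.0, 3.0, n)
# True generating model is a degree-2 polynomial plus Gaussian noise.
y = 1.0 + 2.0 * x - 1.0 * x**2 + rng.normal(0.0, 0.3, size=n)

def description_length(degree):
    """Crude two-part MDL score (smaller is better)."""
    coefs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coefs, x)) ** 2)
    k = degree + 1  # number of free parameters
    # (n/2) log(RSS/n): cost of encoding residuals under a Gaussian model;
    # (k/2) log n: cost of encoding each parameter to sqrt(n) precision.
    return 0.5 * n * np.log(rss / n) + 0.5 * k * np.log(n)

dls = {d: description_length(d) for d in range(6)}
best_degree = min(dls, key=dls.get)
```

Lower-degree models pay a large data cost (big residuals); higher-degree models pay an unrewarded parameter cost, so the score bottoms out near the true degree.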

## 1 References

Arora, and Zhang. 2021. “Rip van Winkle’s Razor: A Simple Estimate of Overfit to Test Data.” *arXiv:2102.13189 [Cs, Stat]*.

Barron, Andrew R. 1991. “Complexity Regularization with Application to Artificial Neural Networks.” In *Nonparametric Functional Estimation and Related Topics*. NATO ASI Series 335.

Barron, A. R., and Cover. 1991. “Minimum Complexity Density Estimation.” *IEEE Transactions on Information Theory*.

Barron, A., Rissanen, and Yu. 1998. “The Minimum Description Length Principle in Coding and Modeling.” *IEEE Transactions on Information Theory*.

Collins, Dasgupta, and Schapire. 2001. “A Generalization of Principal Components Analysis to the Exponential Family.” In *Advances in Neural Information Processing Systems*.

Grünwald, Peter. 1996. “A Minimum Description Length Approach to Grammar Inference.” In *Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing*. Lecture Notes in Computer Science.

Grünwald, Peter D. 2004. “A Tutorial Introduction to the Minimum Description Length Principle.” *Advances in Minimum Description Length: Theory and Applications*.

———. 2007. *The Minimum Description Length Principle*.

Hansen, and Yu. 2001. “Model Selection and the Principle of Minimum Description Length.” *Journal of the American Statistical Association*.

Li, Jun, and Dacheng Tao. 2013. “Simple Exponential Family PCA.” *IEEE Transactions on Neural Networks and Learning Systems*.

Legg. 2006. “Is There an Elegant Universal Theory of Prediction?” In *Algorithmic Learning Theory*. Lecture Notes in Computer Science 4264.

Mavromatis. 2009. “Minimum Description Length Modelling of Musical Structure.” *Journal of Mathematics and Music*.

Mohamed, Ghahramani, and Heller. 2008. “Bayesian Exponential Family PCA.” In *Advances in Neural Information Processing Systems*.

Rissanen. 1984. “Universal Coding, Information, Prediction, and Estimation.” *IEEE Transactions on Information Theory*.

Roy, Gordon, and Thrun. 2005. “Finding Approximate POMDP Solutions Through Belief Compression.” *Journal of Artificial Intelligence Research*.

Solomonoff. 1964a. “A Formal Theory of Inductive Inference. Part I.” *Information and Control*.

———. 1964b. “A Formal Theory of Inductive Inference. Part II.” *Information and Control*.

Sterkenburg. 2016. “Solomonoff Prediction and Occam’s Razor.” *Philosophy of Science*.

Ullrich. 2020. “A Coding Perspective on Deep Latent Variable Models.”

Vitányi. 2006. “Meaningful Information.” *IEEE Transactions on Information Theory*.

Wojtowicz, and DeDeo. 2020. “From Probability to Consilience: How Explanatory Values Implement Bayesian Reasoning.” *Trends in Cognitive Sciences*.