Learning as compression
August 5, 2020 — January 5, 2025
Bayes
compsci
graphical models
how do science
information
machine learning
meta learning
model selection
networks
probability
pseudorandomness
statistics
statmech
stringology
Is it useful to think about learning as compression? This is a recurring theme in machine learning and statistics. The idea is that learning amounts to finding a compact representation of the data: a model that captures real regularities lets us describe the data in fewer bits than we would need without it. In practice we often penalise some measure of model complexity, such as the number of parameters or the amount of information needed to describe the model, alongside the fit to the data. This is closely related to Occam’s razor, the heuristic that among models that explain the data equally well we should prefer the simplest.
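To make the trade-off between model complexity and fit concrete, here is a minimal sketch of a two-part code length, in the spirit of minimum description length. Everything in it is an illustrative assumption and does not come from the references below: the toy data (a binary first-order Markov chain), the candidate models (Markov chains of order 0 to 3), the BIC-style charge of half of log2(n) bits per parameter, and the helper names `generate_sequence` and `two_part_code_length`.

```python
import math
import random
from collections import Counter


def generate_sequence(n, stay_prob=0.9, seed=0):
    """Toy data: a binary first-order Markov chain that repeats the
    previous bit with probability stay_prob."""
    rng = random.Random(seed)
    bits = [0]
    for _ in range(n - 1):
        prev = bits[-1]
        bits.append(prev if rng.random() < stay_prob else 1 - prev)
    return bits


def two_part_code_length(bits, order, start):
    """Approximate bits needed to describe bits[start:] with a Markov
    model of the given order, plus the bits needed for the model itself.
    Requires start >= order so every symbol has a full context."""
    pair_counts = Counter()      # (context, next symbol) -> count
    context_counts = Counter()   # context -> count
    for i in range(start, len(bits)):
        context = tuple(bits[i - order:i])
        pair_counts[(context, bits[i])] += 1
        context_counts[context] += 1
    # Data cost: negative log2-likelihood under the fitted probabilities.
    data_bits = -sum(
        count * math.log2(count / context_counts[context])
        for (context, _), count in pair_counts.items()
    )
    # Model cost (assumed BIC-style approximation): 0.5 * log2(n) bits per
    # free parameter, with one free probability per binary context.
    n = len(bits) - start
    model_bits = 0.5 * (2 ** order) * math.log2(n)
    return model_bits + data_bits


MAX_ORDER = 3
bits = generate_sequence(2000)
# Every model encodes the same symbols, bits[MAX_ORDER:], so the totals
# are directly comparable.
lengths = {
    order: two_part_code_length(bits, order, start=MAX_ORDER)
    for order in range(MAX_ORDER + 1)
}
for order, total in lengths.items():
    print(f"order {order}: {total:.1f} bits")
best_order = min(lengths, key=lengths.get)
print("shortest description at order", best_order)
```

The order-0 model ignores the dependence between neighbouring bits and pays roughly one bit per symbol. Higher orders fit the data slightly better but pay more to describe their extra parameters. Under these assumptions the shortest total description should usually sit at or near the true order, which is the sense in which choosing the best compressor and choosing the best model coincide.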
There are obvious philosophical and speculative connections to more general learning algorithms.
1 Case study: Psychometrics
TODO: write this up
Announcing the Ultimate Personality Test 2.0!
2 References
David, Moran, and Yehudayoff. 2016a. “Supervised Learning Through the Lens of Compression.” In Advances in Neural Information Processing Systems 29.
———. 2016b. “On Statistical Learning via the Lens of Compression.” arXiv:1610.03592 [Cs, Math].
Delétang, Ruoss, Duquenne, et al. 2024. “Language Modeling Is Compression.”
Hafez-Kolahi, Kasaei, and Soleymani-Baghshah. 2020. “Do Compressed Representations Generalize Better?”
Haghifam, Dziugaite, Moran, et al. 2021. “Towards a Unified Information-Theoretic Framework for Generalization.” In Advances in Neural Information Processing Systems.
Huang, Zhang, Shan, et al. 2024. “Compression Represents Intelligence Linearly.”
Lee and Jo. 2021. “Information Flows of Diverse Autoencoders.”
Littlestone and Warmuth. 1986. Relating Data Compression and Learnability.
Yin, Wu, Wang, et al. 2024. “Entropy Law: The Story Behind Data Compression and LLM Performance.”