Learning as compression

August 5, 2020 — January 5, 2025

Tags: Bayes, compsci, graphical models, how do science, information, machine learning, meta learning, model selection, networks, probability, pseudorandomness, statistics, statmech, stringology

Is it useful to think about learning as compression? The idea recurs throughout machine learning and statistics: to learn is to find a compact representation of the data, typically by minimising some measure of complexity such as the number of parameters, the description length of the model, or the number of bits needed to encode the data given the model (Littlestone and Warmuth 1986; David, Moran, and Yehudayoff 2016a; Delétang et al. 2024). It is the information-theoretic cousin of Occam’s razor: among models that account for the data equally well, prefer the one that is cheaper to describe.
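To make the intuition concrete, here is a minimal, hypothetical sketch in the spirit of compression-based classification: use an off-the-shelf compressor (Python’s `zlib`) as a crude stand-in for a learned model, and label a query by the training example that compresses best together with it (a normalized compression distance). The example strings and labels are invented for illustration; this is not anyone’s published method, just the idea in miniature.

```python
import zlib


def clen(s: str) -> int:
    """Length in bytes of the zlib-compressed string."""
    return len(zlib.compress(s.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: small when a and b share structure."""
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(query: str, labelled: list[tuple[str, str]]) -> str:
    """Predict the label of the nearest training example under NCD."""
    return min(labelled, key=lambda pair: ncd(query, pair[0]))[1]


train = [
    ("the gradient of the loss is backpropagated through the network", "ml"),
    ("stochastic gradient descent minimises the empirical risk", "ml"),
    ("the defendant was found liable for breach of contract", "law"),
    ("the court dismissed the appeal for lack of standing", "law"),
]

print(classify("we train the model by minimising a loss with gradient descent", train))
# expected: "ml", at least in spirit; zlib is only a rough proxy for the
# ideal (uncomputable) Kolmogorov compressor
```

The compressor never sees the labels; whatever it "knows" about the classes is just the regularity it can exploit when squeezing two strings together, which is precisely the sense in which compressing well and generalising well are entangled.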

There are also more speculative, philosophical connections to general-purpose learning algorithms, and to the question of whether compression performance can serve as a proxy for intelligence (Huang et al. 2024).

1 Case study: Psychometrics

TODO: write this up

Announcing the Ultimate Personality Test 2.0!


2 References

David, Moran, and Yehudayoff. 2016a. “Supervised Learning Through the Lens of Compression.” In Advances in Neural Information Processing Systems 29.
———. 2016b. “On Statistical Learning via the Lens of Compression.” arXiv:1610.03592 [cs, math].
Delétang, Ruoss, Duquenne, et al. 2024. “Language Modeling Is Compression.”
Hafez-Kolahi, Kasaei, and Soleymani-Baghshah. 2020. “Do Compressed Representations Generalize Better?”
Haghifam, Dziugaite, Moran, et al. 2021. “Towards a Unified Information-Theoretic Framework for Generalization.” In Advances in Neural Information Processing Systems.
Huang, Zhang, Shan, et al. 2024. “Compression Represents Intelligence Linearly.”
Lee, and Jo. 2021. “Information Flows of Diverse Autoencoders.”
Littlestone, and Warmuth. 1986. “Relating Data Compression and Learnability.”
Yin, Wu, Wang, et al. 2024. “Entropy Law: The Story Behind Data Compression and LLM Performance.”