Degrees of freedom in NNs
Information criteria at scale
2025-06-25 — 2025-08-19
Wherein the notion of neural-network degrees of freedom is examined through singular learning theory’s learning coefficient and through minimum description length, and a sharpness‑adjusted effective parameter count (SANE) is noted.
In classical statistics there are families of model-complexity estimates, loosely and collectively referred to as the “degrees of freedom” of a model. They don’t scale up to overparameterized NNs, computationally or practically, so other tools fill that role.
Exception: Shoham, Mor-Yosef, and Avron (2025) argue for a connection to the Takeuchi Information Criterion (TIC).
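For orientation (my gloss of the standard result, not a summary of that paper): the TIC replaces AIC’s raw parameter count with a trace term that already behaves like an effective degrees-of-freedom count,

$$
\mathrm{TIC} = -2\,\ell_n(\hat\theta) + 2\,\operatorname{tr}\!\left(\hat{J}\,\hat{I}^{-1}\right),
$$

where $\hat{I}$ is the negative expected Hessian of the log-likelihood and $\hat{J}$ the covariance of the score. Under correct specification $\hat{J}\approx\hat{I}$, the trace collapses to the parameter count $k$ and AIC is recovered.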
The tools that do scale, sketched below, end up being popular in developmental interpretability.
1 Learning coefficient
The major practical output of singular learning theory, AFAICT, is the learning coefficient (a.k.a. the real log canonical threshold), a particular estimate of a model’s effective dimensionality; its pointwise, empirical variant is the local learning coefficient (LLC).
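A minimal sketch of why it counts as an effective dimensionality, as I currently understand it (Watanabe’s free-energy asymptotics; the notation is my own choice). The Bayes free energy expands as

$$
F_n \;=\; -\log \int e^{-n L_n(w)}\,\varphi(w)\,\mathrm{d}w \;\approx\; n L_n(w_0) \;+\; \lambda \log n \;+\; o(\log n),
$$

where $L_n$ is the empirical negative log-likelihood, $\varphi$ the prior, and $\lambda$ the learning coefficient. In a regular model $\lambda = d/2$, recovering BIC; in singular models (NNs included) $\lambda \le d/2$, so $2\lambda$ plays the role of an effective parameter count. The local version at a trained point $w^\ast$ is typically estimated from samples of a tempered posterior concentrated near $w^\ast$ (e.g. via SGLD),

$$
\hat{\lambda}(w^\ast) \;=\; n\beta\,\Big(\mathbb{E}^{\beta}_{w\mid w^\ast}\big[L_n(w)\big] - L_n(w^\ast)\Big), \qquad \beta = 1/\log n .
$$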
2 Minimum description length
MDL seems to be an interesting way to think about NNs (Geoffrey E. Hinton and Zemel 1993; Geoffrey E. Hinton and van Camp 1993; Perez, Kiela, and Cho 2021).
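My one-equation gloss on the Hinton and van Camp angle (the bits-back argument), with notation that is my own choice: a sender who encodes the weights with a distribution $q$ relative to a prior $p$, then encodes the data residuals under those weights, pays an expected description length of

$$
L(\mathcal{D}) \;=\; \underbrace{\operatorname{KL}\!\big(q(w)\,\|\,p(w)\big)}_{\text{bits for the weights}} \;+\; \underbrace{\mathbb{E}_{q(w)}\!\big[-\log p(\mathcal{D}\mid w)\big]}_{\text{bits for the data given the weights}},
$$

which is the negative ELBO, so minimizing description length and doing variational inference coincide.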
There’s some connection to LLC, I’ve been told, but I don’t yet know enough to make that precise.
3 SANE
SANE is the Sharpness-Adjusted Number of Effective parameters of L. Wang and Roberts (2023).
Spruiked by Stephen Roberts in the talk “Instability is All You Need: The Surprising Dynamics of Learning in Deep Models”.
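I have not pinned down the exact SANE formula yet, so treat the following as the nearest classical relative rather than their definition: MacKay’s evidence-framework effective parameter count, which likewise weights each Hessian eigendirection by how sharply the data constrain it relative to a prior precision $\alpha$.

```python
import numpy as np

def effective_params(hessian_eigvals, alpha=1.0):
    """MacKay-style effective number of parameters (not the SANE formula).

    gamma = sum_i lam_i / (lam_i + alpha), where lam_i are eigenvalues of the
    Hessian of the data-fit term at the optimum and alpha is the prior
    precision. Sharply determined directions (lam_i >> alpha) count as ~1
    parameter each; flat directions (lam_i << alpha) count as ~0.
    """
    lam = np.clip(np.asarray(hessian_eigvals, dtype=float), 0.0, None)
    return float(np.sum(lam / (lam + alpha)))

# Toy example: six raw parameters, but only about two sharply determined directions.
print(effective_params([100.0, 50.0, 0.5, 0.1, 0.01, 0.0]))  # ~2.4
```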
4 Incoming
- Eleuther’s local volume measurement looks connected: “Research Update: Applications of Local Volume Measurement”, EleutherAI Blog.