Degrees of freedom in NNs
Information criteria at scale
2025-06-25 — 2025-08-19
In classical statistics there is a family of model complexity estimates, loosely and collectively referred to as the “degrees of freedom” of a model. They scale up to overparameterized NNs neither computationally nor practically, but other tools fill a similar role.
Exception: Shoham, Mor-Yosef, and Avron (2025) argue for a connection to the Takeuchi Information Criterion (TIC).
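For orientation, the classical TIC replaces AIC's raw parameter count with a trace penalty (standard definition, stated from memory). Writing $\ell_n$ for the log-likelihood, $\hat{J}$ for the Hessian of the negative log-likelihood and $\hat{I}$ for the empirical covariance of the score,

$$
\mathrm{TIC} = -2\,\ell_n(\hat\theta) + 2\,\operatorname{tr}\bigl(\hat{J}^{-1}\hat{I}\bigr),
$$

so that $\operatorname{tr}(\hat{J}^{-1}\hat{I})$ acts as an effective parameter count, collapsing to the usual $k$ of AIC under correct specification.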
Such effective-complexity measures end up being popular in developmental interpretability.
1 Learning coefficient
AFAICT the major practical output of singular learning theory is one particular estimate of model effective dimensionality, the learning coefficient, a.k.a. the real log canonical threshold (RLCT).
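Concretely, the learning coefficient $\lambda$ shows up as the coefficient of $\log n$ in Watanabe's asymptotic expansion of the Bayes free energy,

$$
F_n = n L_n(w_0) + \lambda \log n + O(\log \log n),
$$

where $L_n(w_0)$ is the loss at the optimum. For regular models $\lambda = d/2$, recovering the BIC penalty, which is why $\lambda$ reads as (half) an effective parameter count.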
2 Minimum description length
MDL seems to be an interesting way to think about NNs (Geoffrey E. Hinton and Zemel 1993; Geoffrey E. Hinton and van Camp 1993; Perez, Kiela, and Cho 2021).
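In the Hinton and van Camp framing, the objective is (via the bits-back argument) the variational free energy

$$
\mathcal{L}(q) = \operatorname{KL}\bigl(q(w)\,\|\,p(w)\bigr) + \mathbb{E}_{q(w)}\bigl[-\log p(\mathcal{D}\mid w)\bigr],
$$

where the KL term is the description length of the weights and the expectation is the cost of encoding the data given the weights, so MDL-for-NNs and variational Bayes coincide.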
3 SANE
Sharpness-Adjusted Number of Effective parameters (L. Wang and Roberts 2023).
Spruiked by Stephen Roberts in the talk “Instability is All You Need: The Surprising Dynamics of Learning in Deep Models”.
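This is not the SANE estimator itself, just a minimal sketch of the generic Hessian-spectrum approach to effective parameter counting that measures in this family build on; the damping constant `c` and the formula $N_{\mathrm{eff}}(c)=\sum_i \lambda_i/(\lambda_i + c)$ are illustrative assumptions, not taken from L. Wang and Roberts (2023).

```python
import numpy as np

def effective_params(hessian_eigs, c=1.0):
    """Effective parameter count from loss-Hessian eigenvalues.

    A direction counts as a "real" parameter to the extent its
    curvature lambda_i dominates the damping constant c:
        N_eff(c) = sum_i lambda_i / (lambda_i + c).
    Flat directions (lambda_i << c) contribute ~0, sharp ones ~1.
    """
    eigs = np.clip(np.asarray(hessian_eigs, dtype=float), 0.0, None)
    return float(np.sum(eigs / (eigs + c)))

# Toy spectrum: 3 sharp directions among 1000 nominal parameters.
eigs = np.concatenate([np.full(3, 100.0), np.full(997, 1e-4)])
print(effective_params(eigs))  # ~3.07: most directions barely count
```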