Singular Learning Theory
2024-10-29 — 2025-05-30
Placeholder.
As far as I can tell, a first-order approximation to (the bits I vaguely understand of) Singular Learning Theory is something like:
Classical Bayesian statistics has a good theory of well-posed models with a small number of interpretable parameters. Singular Learning Theory extends Bayesian statistics to ill-posed models with a large number of uninterpretable parameters, using results from algebraic geometry about singularities in the loss surface.
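If I have the headline result right, the load-bearing theorem is Watanabe's asymptotic expansion of the Bayesian free energy (negative log marginal likelihood). For $n$ samples, empirical loss $L_n$, and a minimiser $w_0$,

$$F_n = n L_n(w_0) + \lambda \log n + O_p(\log \log n),$$

where $\lambda$ is the real log canonical threshold (RLCT) of the loss at $w_0$. For regular models $\lambda = d/2$ and we recover the familiar BIC penalty; at singularities $\lambda \le d/2$, so the effective parameter count can be far smaller than the nominal $d$.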
Why might we care about this? For the moment I am taking it on faith.
Jesse Hoogland, Neural networks generalize because of this one weird trick:
Statistical learning theory is lying to you: “overparametrized” models actually aren’t overparametrized, and generalisation is not just a question of broad basins.
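Cashing that out, as I understand it: the expected Bayes generalisation error of a singular model scales as

$$\mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),$$

so generalisation is governed by the RLCT $\lambda$, not by the raw parameter count $d$, and not by basin breadth either, since the Hessian degenerates at exactly the singular points where $\lambda$ falls below $d/2$.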
1 Local Learning Coefficient
Recommended to me by Rohan Hitchcock:
- The upshot: Jesse Hoogland and Stan van Wingerden argue that we should care about model complexity, and that the local learning coefficient of Lau et al. (2024) is arguably the correct measure of it.
- The RLCT Measures the Effective Dimension of Neural Networks
I probably have some alpha in estimating this value.
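Concretely, here is a minimal PyTorch sketch of the SGLD estimator of Lau et al. (2024) as I understand it: sample from the tempered posterior localised at the trained parameters $w^*$, $\pi(w) \propto \exp\!\left(-n\beta L_n(w) - \tfrac{\gamma}{2}\|w - w^*\|^2\right)$, then report $\hat\lambda = n\beta\,(\mathbb{E}_\pi[L_n(w)] - L_n(w^*))$. The function name and hyperparameter defaults below are mine, not theirs; I believe Timaeus's devinterp library is the maintained implementation.

```python
# Minimal sketch of the Lau et al. (2024) SGLD estimator of the local
# learning coefficient (LLC). All names and defaults here are mine.
import copy
import math

import torch


def estimate_llc(model, loss_fn, data, targets, *,
                 n_steps=1_000, burn_in=200, batch_size=64,
                 lr=1e-4, gamma=100.0, beta=None):
    """Estimate the LLC of `model` at its current parameters w*.

    Runs SGLD on the localised tempered posterior
        pi(w) ∝ exp(-n * beta * L_n(w) - (gamma / 2) * ||w - w*||^2)
    and returns lambda_hat = n * beta * (E_pi[L_n(w)] - L_n(w*)).
    """
    n = len(data)
    if beta is None:
        beta = 1.0 / math.log(n)  # the 1/log(n) inverse temperature of Lau et al.

    # Freeze the anchor w* and record the loss there.
    anchor = [p.detach().clone() for p in model.parameters()]
    with torch.no_grad():
        loss_star = loss_fn(model(data), targets).item()

    sampler = copy.deepcopy(model)  # random-walk a copy; w* stays untouched
    post_burn_losses = []
    for step in range(n_steps):
        # Minibatch estimate of the gradient of the average loss L_n.
        idx = torch.randint(0, n, (batch_size,))
        loss = loss_fn(sampler(data[idx]), targets[idx])
        sampler.zero_grad()
        loss.backward()
        # SGLD step: drift towards low loss and back towards w*, plus noise.
        with torch.no_grad():
            for p, p0 in zip(sampler.parameters(), anchor):
                drift = n * beta * p.grad + gamma * (p - p0)
                p.add_(-0.5 * lr * drift + math.sqrt(lr) * torch.randn_like(p))
        if step >= burn_in:
            with torch.no_grad():
                post_burn_losses.append(loss_fn(sampler(data), targets).item())

    mean_loss = sum(post_burn_losses) / len(post_burn_losses)
    return n * beta * (mean_loss - loss_star)
```

In practice the estimate is quite sensitive to the step size, the localisation strength $\gamma$, and the number of chains and draws, which is presumably where the alpha is.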
2 Use in developmental interpretability
I do not understand this step of the argument, but see Developmental Interpretability by Jesse Hoogland and Stan van Wingerden, and maybe read Lehalleur et al. (2025).
3 Fractal loss landscapes
Notionally there is a connection to fractal loss landscapes; see Fractal dimension of loss landscapes and self similar behaviour in neural networks.
4 Incoming
The Developmental Interpretability site is mostly about SLT, and they run a thriving Discord server on the topic.
Alexander Gietelink Oldenziel, Singular Learning Theory
metauni’s Singular Learning Theory seminar
Timaeus is an AI safety research organisation working on applications of Singular Learning Theory (SLT) to alignment.