Singular Learning Theory
2024-10-29 — 2025-09-01
As far as I can tell, a first-order approximation to (the bits I vaguely understand of) Singular Learning Theory is something like:
Classical Bayesian statistics has a good theory of well-posed models with a small number of interpretable parameters. Singular Learning Theory is a theory of ill-posed models with a large number of uninterpretable parameters; it analyses Bayesian learning in such models using results from algebraic geometry about singularities in the loss surface.
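The headline asymptotic, as I understand it, is Watanabe's free-energy formula: the Bayes free energy of a (possibly singular) model expands as

```latex
F_n = n L_n(w_0) + \lambda \log n - (m - 1) \log\log n + O_p(1)
```

where $\lambda$ is the learning coefficient (the real log canonical threshold of the singularity) and $m$ its multiplicity. In a regular model $\lambda = d/2$, so $\lambda$ plays the role of an effective parameter count; at singularities it can be much smaller, which is the claimed source of generalisation in "overparametrized" models.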
Why might we care about this? For the moment I am taking it on faith. Since attending ILIAD2 I am relatively more optimistic about the potential for this program of research to go somewhere.
Jesse Hoogland, Neural networks generalize because of this one weird trick:
Statistical learning theory is lying to you: “overparametrized” models actually aren’t overparametrized, and generalisation is not just a question of broad basins.
1 Local Learning Coefficient
Resources recommended to me by Rohan Hitchcock:
- The upshot: Jesse Hoogland and Stan van Wingerden argue that we should care about model complexity, and that the local learning coefficient (Lau et al. 2024) is arguably the correct measure of it.
- The RLCT Measures the Effective Dimension of Neural Networks
I probably have some alpha in estimating this value.
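To make the estimator concrete: Lau et al. (2024) estimate the LLC by sampling from a posterior localized at a trained optimum $w^*$ (via SGLD at inverse temperature $\beta = 1/\log n$) and computing $\hat\lambda = n\beta\,(\mathbb{E}[L_n(w)] - L_n(w^*))$. Here is a minimal numpy sketch on a toy quadratic loss, where the true learning coefficient is $d/2$; all hyperparameters (step size, localisation strength, chain length) are illustrative choices of mine, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000                 # nominal dataset size
d = 2                      # parameter dimension
beta = 1.0 / np.log(n)     # inverse temperature beta* = 1/log n
gamma = 1.0                # localisation strength (illustrative)
eps = 1e-4                 # SGLD step size (illustrative)
steps, burn = 20_000, 2_000

def loss(w):
    # Regular quadratic loss; for a regular model the learning
    # coefficient equals d/2.
    return 0.5 * np.sum(w ** 2)

def grad_loss(w):
    return w

w_star = np.zeros(d)       # the local minimum we probe
w = w_star.copy()
samples = []
for t in range(steps):
    # Langevin update targeting exp(-n*beta*L(w) - (gamma/2)||w - w*||^2);
    # full gradients here, so this is exact (unadjusted) Langevin rather
    # than minibatch SGLD.
    drift = -(eps / 2) * (n * beta * grad_loss(w) + gamma * (w - w_star))
    w = w + drift + np.sqrt(eps) * rng.standard_normal(d)
    if t >= burn:
        samples.append(loss(w))

# LLC estimator: n * beta * (E[L] - L(w*))
llc_hat = n * beta * (np.mean(samples) - loss(w_star))
print(llc_hat)  # roughly d/2 = 1.0, up to sampling and discretization error
```

On a real network the same recipe applies with minibatch loss gradients, but tuning `eps` and `gamma` is the hard part; see the bias question below.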
2 Use in developmental interpretability
I do not understand this step of the argument, but see Developmental Interpretability by Jesse Hoogland and Stan van Wingerden, and maybe read Lehalleur et al. (2025).
3 Fractal loss landscapes
Notionally there is a connection to fractal loss landscapes? See Fractal dimension of loss landscapes and self similar behaviour in neural networks.
4 Questions I would like to know how to answer
- Can we use LLC as an optimal design objective, crafting desired optima directly in loss space by altering loss functions and/or architectures dynamically?
- Can we do filtering to estimate LLC online? Looks a lot like sparse Kalman filtering, just sayin’.
- The Bayesian formalism. How does the notional “Bayesian” update work in practical neural networks, which are not trained by posterior updates? Does it matter? What happens when our training process really pushes the analogy, e.g. with synthetic data generation in neural distillation?
- How biased is LLC estimation for nontrivial networks? In some simulations I could make the LLC estimates from SGHMC and SGLD diverge substantially. Is that worrisome?
- How would belief propagation work in a Bayes setting? What even is the natural space of priors generating local landscape geometries?
5 Incoming
The Developmental Interpretability site is mostly about SLT, and they run a thriving Discord server on the topic.
- Alexander Gietelink Oldenziel, Singular Learning Theory
- metauni’s Singular Learning Theory seminar
- Timaeus is an AI safety research organisation working on applications of Singular Learning Theory (SLT) to alignment.