Figure 1: Busts of Watanabe, Murfet and Hoogland regarding the oracle of Singular Learning Theory, concealed in a mist of Monte Carlo samples.

Placeholder.

As far as I can tell, a first-order approximation to (the bits I vaguely understand of) Singular Learning Theory is something like:

Classical Bayesian statistics has a good theory of well-posed (regular) models with a small number of interpretable parameters. Singular Learning Theory is a theory of ill-posed (singular) models with a large number of uninterpretable parameters; it recovers a workable asymptotic theory of Bayesian statistics for such models by using results from algebraic geometry about singularities in the loss surface.
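
If I am reading Watanabe (2009) correctly, the headline result behind that summary is an asymptotic expansion of the Bayesian free energy (the negative log marginal likelihood) in which the geometry of the singularities, rather than the raw parameter count, sets the complexity penalty. Schematically, with notation loosely following Watanabe (2009), for a sample of size $n$, training loss $L_n$ and optimal parameter $w_0$,

$$
F_n = n L_n(w_0) + \lambda \log n - (m - 1)\log\log n + O_p(1),
$$

where $\lambda$ is the learning coefficient (a real log canonical threshold of the model's singularities) and $m$ is its multiplicity. For regular models $\lambda = d/2$, which recovers the usual BIC penalty in $d$ parameters; for singular models $\lambda \leq d/2$, so the effective parameter count can be far smaller than $d$.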

Why might we care about this? For the moment I am taking it on faith.

Jesse Hoogland, Neural networks generalize because of this one weird trick:

Statistical learning theory is lying to you: “overparametrized” models actually aren’t overparametrized, and generalisation is not just a question of broad basins.

1 Local Learning Coefficient

Recommended to me by Rohan Hitchcock; see Lau et al. (2024) and Hitchcock and Hoogland (2025).

I probably have some alpha in estimating this value.
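
For concreteness, the estimator I have in mind is the localised, SGLD-sampled one of Lau et al. (2024): sample from a tempered posterior concentrated near the trained parameters $w^*$ and take

$$
\hat{\lambda}(w^*) = n \beta \left( \mathbb{E}_{w}\left[ L_n(w) \right] - L_n(w^*) \right), \qquad \beta \approx \frac{1}{\log n},
$$

where the expectation is over the localised tempered posterior $\propto \exp\left(-n\beta L_n(w) - \tfrac{\gamma}{2}\lVert w - w^* \rVert^2\right)$. The following is a minimal PyTorch sketch of that recipe, to fix ideas rather than to be trusted: the function name `estimate_llc`, the step size `epsilon`, the localisation strength `gamma`, and the single-minibatch stand-in for $L_n(w^*)$ are all my own illustrative choices, not anything taken from the paper or from an existing library.

```python
import copy
import itertools

import torch


def estimate_llc(model, loss_fn, loader, n_steps=1000, n_burnin=200,
                 epsilon=1e-5, gamma=100.0, device="cpu"):
    """Rough SGLD estimate of n * beta * (E[L_n(w)] - L_n(w*)).

    `loss_fn` is assumed to return the *mean* loss over a minibatch
    (e.g. torch.nn.CrossEntropyLoss with its default reduction).
    """
    n = len(loader.dataset)
    beta = 1.0 / torch.log(torch.tensor(float(n), device=device))

    # The sampler walks on its own copy so the trained model is untouched;
    # w* (the localisation centre) is frozen from that same copy.
    sampler = copy.deepcopy(model).to(device)
    w_star = [p.detach().clone() for p in sampler.parameters()]
    batches = itertools.cycle(loader)

    loss_at_w_star = None
    post_burnin_losses = []

    for step in range(n_steps):
        x, y = next(batches)
        x, y = x.to(device), y.to(device)

        loss = loss_fn(sampler(x), y)  # minibatch estimate of L_n(w)
        if loss_at_w_star is None:
            # At step 0 the sampler still sits at w*, so this is a noisy
            # single-minibatch stand-in for L_n(w*); a full-data pass is better.
            loss_at_w_star = loss.item()

        sampler.zero_grad()
        loss.backward()

        with torch.no_grad():
            for p, p0 in zip(sampler.parameters(), w_star):
                if p.grad is None:  # e.g. frozen or unused parameters
                    continue
                # Gradient of the negative log localised tempered posterior:
                # n * beta * grad L_n(w) + gamma * (w - w*).
                drift = n * beta * p.grad + gamma * (p - p0)
                # SGLD step: half-step down the drift plus Gaussian noise
                # with variance epsilon.
                p.add_(-0.5 * epsilon * drift
                       + epsilon ** 0.5 * torch.randn_like(p))

        if step >= n_burnin:
            post_burnin_losses.append(loss.item())

    mean_posterior_loss = sum(post_burnin_losses) / len(post_burnin_losses)
    return float(n * beta * (mean_posterior_loss - loss_at_w_star))
```

In practice everything seems to hinge on the choice of `epsilon`, `gamma` and `beta` and on how well the chain mixes, which is presumably where the estimation alpha lives.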

2 Use in developmental interpretability

I do not understand this step of the argument, but see Developmental Interpretability by Jesse Hoogland and Stan van Wingerden, and maybe read Lehalleur et al. (2025).

3 Fractal loss landscapes

Notionally there is a connection to fractal loss landscapes; see Fractal dimension of loss landscapes and self-similar behaviour in neural networks, and Furman (2025) on the LLC as a fractal dimension.

4 Incoming

5 References

Andreeva, Dupuis, Sarkar, et al. 2024. “Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms.”
Bouchaud, and Georges. 1990. “Anomalous Diffusion in Disordered Media: Statistical Mechanisms, Models and Physical Applications.” Physics Reports.
Carroll. 2021. “Phase Transitions in Neural Networks.”
Chen, Lau, Mendel, et al. 2023. “Dynamical Versus Bayesian Phase Transitions in a Toy Model of Superposition.”
Farrugia-Roberts, Murfet, and Geard. 2022. “Structural Degeneracy in Neural Networks.”
Furman. 2025. “LLC as Fractal Dimension.”
Hitchcock, and Hoogland. 2025. “From Global to Local: A Scalable Benchmark for Local Posterior Sampling.”
Lau, Furman, Wang, et al. 2024. “The Local Learning Coefficient: A Singularity-Aware Complexity Measure.”
Lehalleur, Hoogland, Farrugia-Roberts, et al. 2025. “You Are What You Eat — AI Alignment Requires Understanding How Data Shapes Structure and Generalisation.”
Lin. 2011. “Algebraic Methods for Evaluating Integrals in Bayesian Statistics.”
Ly, and Gong. 2025. “Optimization on Multifractal Loss Landscapes Explains a Diverse Range of Geometrical and Dynamical Properties of Deep Learning.” Nature Communications.
Volkhardt, and Grubmüller. 2022. “Estimating Ruggedness of Free-Energy Landscapes of Small Globular Proteins from Principal Component Analysis of Molecular Dynamics Trajectories.” Physical Review E.
Watanabe. 2009. Algebraic Geometry and Statistical Learning Theory. Cambridge Monographs on Applied and Computational Mathematics.
———. 2020. Mathematical Theory of Bayesian Statistics.
———. 2022. “Recent Advances in Algebraic Geometry and Bayesian Statistics.”
Wei, and Lau. 2023. “Variational Bayesian Neural Networks via Resolution of Singularities.” Journal of Computational and Graphical Statistics.
Wei, Murfet, Gong, et al. 2023. “Deep Learning Is Singular, and That’s Good.” IEEE Transactions on Neural Networks and Learning Systems.