
As far as I can tell, a first-order approximation to (the bits I vaguely understand of) Singular Learning Theory is something like:

Classical Bayesian statistics has a good theory for well-posed models with a small number of interpretable parameters. Singular Learning Theory extends this to ill-posed models with a large number of uninterpretable parameters, giving us a theory of Bayesian statistics for such models via results from algebraic geometry about the singularities in the loss surface.

There are obviously a lot of details missing from that. I think there are non-Bayesian versions too, but I haven’t been exposed to them yet.
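To make that slightly more concrete: the headline result, as I understand it, is Watanabe's asymptotic expansion of the Bayesian free energy (the negative log marginal likelihood), in which the real log canonical threshold \(\lambda\) takes over the role that the parameter count \(d/2\) plays in the classical BIC:

```latex
F_n = n L_n(w_0) + \lambda \log n + O_p(\log \log n)
```

For regular models \(\lambda = d/2\) and this recovers BIC; for singular models \(\lambda\) can be strictly smaller, which is one way of saying the model is effectively less complex than its raw parameter count suggests.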

Jesse Hoogland, Neural networks generalize because of this one weird trick:

Statistical learning theory is lying to you: “overparametrized” models actually aren’t overparametrized, and generalization is not just a question of broad basins.

1 Local Learning Coefficient

Recommended to me by Rohan Hitchcock:
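A minimal sketch of what estimating the local learning coefficient looks like, assuming the SGLD-based localized estimator in the style of Lau et al. (2024) and a toy singular loss of my own choosing; the function names and hyperparameters here are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy singular loss: L(a, b) = (a b)^2, minimized on the "cross"
# {a = 0} ∪ {b = 0}, so the origin is a singular critical point.
def loss(w):
    a, b = w
    return (a * b) ** 2

def grad(w):
    a, b = w
    return np.array([2 * a * b**2, 2 * a**2 * b])

def estimate_llc(w_star, n=10_000, steps=5_000, eps=1e-4, gamma=100.0):
    """Estimate the local learning coefficient at w_star via SGLD.

    Samples from the tempered posterior localized at w_star by a
    quadratic restraint, then uses
        lambda_hat = n * beta * (E[L_n(w)] - L_n(w_star)),
    with inverse temperature beta = 1 / log n.
    All hyperparameters here are illustrative defaults.
    """
    beta = 1.0 / np.log(n)
    w = w_star.copy()
    samples = []
    for _ in range(steps):
        noise = rng.normal(0.0, np.sqrt(eps), size=w.shape)
        drift = n * beta * grad(w) + gamma * (w - w_star)
        w = w - (eps / 2) * drift + noise
        samples.append(loss(w))
    return n * beta * (np.mean(samples) - loss(w_star))

print(estimate_llc(np.zeros(2)))
```

For this toy loss the true learning coefficient at the origin is 1/2 (versus d/2 = 1 for a regular two-parameter model), though the raw SGLD estimate is noisy and sensitive to the step size and restraint strength.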

2 Incoming

3 References

Carroll. 2021. “Phase Transitions in Neural Networks.”
Farrugia-Roberts, Murfet, and Geard. 2022. “Structural Degeneracy in Neural Networks.”
Lau, Furman, Wang, et al. 2024. “The Local Learning Coefficient: A Singularity-Aware Complexity Measure.”
Lin. 2011. “Algebraic Methods for Evaluating Integrals in Bayesian Statistics.”
Watanabe. 2009. Algebraic Geometry and Statistical Learning Theory. Cambridge Monographs on Applied and Computational Mathematics.
———. 2020. Mathematical Theory of Bayesian Statistics.
———. 2022. “Recent Advances in Algebraic Geometry and Bayesian Statistics.”
Wei, and Lau. 2023. “Variational Bayesian Neural Networks via Resolution of Singularities.” Journal of Computational and Graphical Statistics.
Wei, Murfet, Gong, et al. 2023. “Deep Learning Is Singular, and That’s Good.” IEEE Transactions on Neural Networks and Learning Systems.