Bayes neural nets via subsetting weights
January 11, 2022 — December 17, 2024
Bayes NNs in which only some weights are random and the rest are fixed. This raises various difficulties, not least: how do you update a fixed parameter? It sounds like a sparse Bayes problem, but the analogy is loose. In sparse Bayes we audition interpretable regressors for inclusion in the model. Here we audition uninterpretable, unidentifiable weights for inclusion as random variables, and every weight stays in the model regardless, either as a random variate or as a deterministic parameter.
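To make "some weights random, others fixed" concrete, here is a minimal sketch of the forward pass only. It assumes a flat weight vector, a boolean mask choosing which entries are random, and a mean-field Gaussian over those entries. The 2-8-1 architecture, the choice of 4 random weights and all the names (`is_random`, `w_det`, `mu`, `sd`) are hypothetical illustrations. How to pick the mask and fit these quantities is the open question of this page.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical architecture: a 2-8-1 tanh MLP with all weights stored in one flat vector.
shapes = [(2, 8), (8, 1)]
sizes = [int(np.prod(s)) for s in shapes]
n_params = sum(sizes)

# Hypothetical split: 4 weights, chosen at random, are stochastic; the rest are deterministic.
is_random = np.zeros(n_params, dtype=bool)
is_random[rng.choice(n_params, size=4, replace=False)] = True

w_det = 0.5 * rng.normal(size=n_params)  # point values, used where ~is_random
mu = np.zeros(n_params)                  # Gaussian mean, used where is_random
sd = np.full(n_params, 0.3)              # Gaussian std, used where is_random

def unflatten(w):
    """Split the flat weight vector back into per-layer matrices."""
    out, i = [], 0
    for shape, size in zip(shapes, sizes):
        out.append(w[i:i + size].reshape(shape))
        i += size
    return out

def sample_weights():
    """One draw of the full weight vector: random entries resampled, fixed entries copied."""
    eps = rng.normal(size=n_params)
    return np.where(is_random, mu + sd * eps, w_det)

def predict(x, n_samples=200):
    """Monte Carlo predictive samples, shape (n_samples, batch, 1)."""
    preds = []
    for _ in range(n_samples):
        W1, W2 = unflatten(sample_weights())
        preds.append(np.tanh(x @ W1) @ W2)
    return np.stack(preds)

x = np.column_stack([np.linspace(-2, 2, 5), np.zeros(5)])
samples = predict(x)
print(samples.mean(axis=0).ravel(), samples.std(axis=0).ravel())
```

All predictive spread here comes from the 4 masked entries. The other weights are identical in every draw.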
Moving target alert! No-one agrees what to call these. For now I use the emerging name pBNN, aka “partial Bayesian neural network” (Zhao et al. 2024), which seems acceptable.
1 Is this even principled?
At first glance, this sounds like an OK thing to do. But then you try to write down the equations and stuff looks weird. How would we interpret the “posterior” of a fixed parameter? Surely there is some kind of variational argument?
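Here is one candidate variational argument, in notation introduced here: $\theta$ for the deterministic weights, $z$ for the random ones, and $q_\phi$ for a variational family over $z$ only. Leave $\theta$ as an ordinary argument and bound the marginal likelihood:

$$
\log p(\mathcal{D} \mid \theta)
= \log \int p(\mathcal{D} \mid z, \theta)\, p(z)\, \mathrm{d}z
\;\ge\; \mathbb{E}_{q_\phi(z)}\!\left[\log p(\mathcal{D} \mid z, \theta)\right]
- \operatorname{KL}\!\left(q_\phi(z) \,\|\, p(z)\right)
=: \mathcal{L}(\theta, \phi).
$$

Maximising $\mathcal{L}$ jointly over $(\theta, \phi)$ is then variational EM. Under this reading, $\theta$ receives a type-II maximum-likelihood (empirical Bayes) point estimate rather than a posterior, and the bound is tight when $q_\phi(z) = p(z \mid \mathcal{D}, \theta)$. Alternatively, one could keep a continuous prior $p(\theta)$ and use a point mass $q(\theta) = \delta_{\theta_0}$ inside an ordinary ELBO. That does not yield a posterior for $\theta$ either: $\operatorname{KL}(\delta_{\theta_0} \,\|\, p(\theta))$ is infinite, and discarding the divergent entropy term leaves MAP estimation of $\theta$.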
Try Sharma et al. (2022) for a start.
2 How to update a deterministic parameter?
From the perspective of Bayes inference, the parameters we do not treat as random effectively have a point-mass prior, i.e. zero prior variance, so conditioning on data should never move them. And yet we do update them, by SGD. What does that mean? How can we make that statistically well-posed?
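Here is a minimal numerical sketch of one well-posed reading: gradient steps on the deterministic weight as type-II maximum likelihood. The toy model is an assumption: a linear "pBNN" $y = \theta x_1 + z x_2 + \varepsilon$ with deterministic $\theta$, random $z \sim N(0, 1)$ and known noise variance. The loop alternates two steps. First, set $q(z)$ to the exact conditional posterior. Second, take a gradient step on the ELBO in $\theta$. The result is compared with the closed-form maximiser of $\log p(y \mid \theta)$. Gradients are full-batch to keep the sketch deterministic. In a real pBNN both expectations would be minibatch Monte Carlo estimates, i.e. SGD.

```python
import numpy as np

# Toy pBNN (assumed for illustration): y = theta * x1 + z * x2 + noise,
# theta deterministic (no prior), z random with prior N(0, 1), noise variance known.
rng = np.random.default_rng(0)
n, sigma2 = 50, 0.25
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.5 * x1 + rng.normal() * x2 + rng.normal(scale=np.sqrt(sigma2), size=n)

def posterior_z(theta):
    """Exact Gaussian posterior p(z | y, theta) = N(m, s2), the optimal q(z) for this theta."""
    prec = 1.0 + x2 @ x2 / sigma2
    m = (x2 @ (y - theta * x1) / sigma2) / prec
    return m, 1.0 / prec

# Alternate: refit q(z), then take a gradient step on the ELBO w.r.t. theta with q held fixed.
theta, lr = 0.0, 0.5 * sigma2 / (x1 @ x1)
for _ in range(500):
    m = posterior_z(theta)[0]
    grad = x1 @ (y - theta * x1 - m * x2) / sigma2  # d ELBO / d theta; s2 does not enter
    theta += lr * grad

# Type-II maximum likelihood in closed form: y ~ N(theta * x1, sigma2 * I + x2 x2^T),
# so the maximiser of log p(y | theta) is a generalised least-squares estimate.
Sigma = sigma2 * np.eye(n) + np.outer(x2, x2)
w = np.linalg.solve(Sigma, x1)
theta_ml2 = (w @ y) / (w @ x1)
print(theta, theta_ml2)  # the two agree to numerical precision
```

With $q(z)$ at the exact conditional posterior, the ELBO gradient in $\theta$ equals $\nabla_\theta \log p(y \mid \theta)$. So the fixed weight is climbing a (bound on the) marginal likelihood rather than being "Bayes-updated".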
3 Last layer
The most famous pBNN: only the weights of the final layer are random. Not that interesting, since it misses many phenomena of interest, but so tractable that it is a good place to start. See Bayes last layer.
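Why the last layer is so tractable: with the features held fixed, a Gaussian prior on the last-layer weights plus Gaussian noise is just Bayesian linear regression. That gives a closed-form posterior and predictive. This is a sketch under assumptions:

- toy 1-D data;
- a random, untrained tanh feature map standing in for a trained network body;
- hand-picked prior precision `alpha` and noise precision `beta`.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

# Stand-in for a trained network body: a fixed (here random, untrained) tanh feature map.
W1, b1 = rng.normal(size=(1, 20)), rng.normal(size=20)

def features(X):
    return np.tanh(X @ W1 + b1)

alpha, beta = 1.0, 100.0  # assumed prior precision of last-layer weights, noise precision
Phi = features(X)
S = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi)  # posterior covariance
m = beta * S @ Phi.T @ y                                              # posterior mean

# Predictive mean and variance at test inputs: noise variance plus weight uncertainty.
X_test = np.linspace(-5, 5, 7)[:, None]
Phi_test = features(X_test)
pred_mean = Phi_test @ m
pred_var = 1.0 / beta + np.einsum("ij,jk,ik->i", Phi_test, S, Phi_test)
print(np.c_[X_test[:, 0], pred_mean, np.sqrt(pred_var)])
```

The posterior mean coincides with ridge regression on the features with penalty `alpha / beta`. All of the epistemic uncertainty comes from the last-layer weights, filtered through whatever the fixed features happen to be.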
4 Via sequential Monte Carlo?
Zhao et al. (2024) is an elegant paper which shows how to train a pBNN using sequential Monte Carlo.
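The sketch below is a generic illustration of the ingredients, assembled from textbook SMC components. It is not a reproduction of Zhao et al.'s algorithm. Each minibatch passes through four steps:

1. Minibatches arrive in sequence and reweight a particle cloud over the random weight.
2. The deterministic weight takes a gradient step on the particle-weighted minibatch log-likelihood.
3. Resampling kicks in when the effective sample size drops.
4. Random-walk Metropolis moves fight degeneracy.

The toy model $y = \theta x_1 + z x_2 + \varepsilon$, the particle count, the step size and the heuristic $\theta$-gradient are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pBNN (assumed): y = theta * x1 + z * x2 + noise; theta deterministic, z ~ N(0, 1) random.
n, batch, sigma2 = 200, 10, 0.25
theta_true, z_true = 1.5, -0.7
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = theta_true * x1 + z_true * x2 + rng.normal(scale=np.sqrt(sigma2), size=n)

def loglik(z, theta, idx):
    """Gaussian log-likelihood (up to a constant) of data points idx, one value per particle."""
    resid = y[idx] - theta * x1[idx] - np.outer(z, x2[idx])
    return -0.5 * np.sum(resid**2, axis=1) / sigma2

def log_target(z, theta, idx):
    """Unnormalised log posterior of z given the data points idx and the current theta."""
    return -0.5 * z**2 + loglik(z, theta, idx)

N, theta, lr = 1000, 0.0, 0.01
z = rng.normal(size=N)  # particles for the random weight, drawn from its prior
logw = np.zeros(N)

for start in range(0, n, batch):
    idx, seen = np.arange(start, start + batch), np.arange(start + batch)
    # 1. Reweight particles by the likelihood of the incoming minibatch.
    logw += loglik(z, theta, idx)
    W = np.exp(logw - logw.max())
    W /= W.sum()
    # 2. Heuristic gradient step on theta: particle-weighted minibatch log-likelihood gradient.
    resid = y[idx] - theta * x1[idx] - np.outer(z, x2[idx])
    theta += lr * np.sum(W[:, None] * resid * x1[idx]) / sigma2
    # 3. Multinomial resampling when the effective sample size drops below N / 2.
    if 1.0 / np.sum(W**2) < N / 2:
        z = z[rng.choice(N, size=N, p=W)]
        logw, W = np.zeros(N), np.full(N, 1.0 / N)
    # 4. Rejuvenate: random-walk Metropolis steps targeting p(z | data seen so far, theta).
    scale = max(np.sqrt(np.sum(W * (z - np.sum(W * z)) ** 2)), 1e-3)
    for _ in range(5):
        z_prop = z + scale * rng.normal(size=N)
        log_ratio = log_target(z_prop, theta, seen) - log_target(z, theta, seen)
        z = np.where(np.log(rng.random(N)) < log_ratio, z_prop, z)

W = np.exp(logw - logw.max())
W /= W.sum()
z_mean = np.sum(W * z)
print(theta, z_mean)
```

The Metropolis moves re-evaluate the likelihood of all data seen so far, so their cost grows over the pass. That is fine for a toy but is exactly the kind of thing a serious method has to engineer around.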
5 Via singular learning theory?
The connections are certainly suggestive. As far as my meagre understanding goes, the singular learning theory (SLT) interpretation would be slightly different. We would not necessarily be learning a model with some parameters fixed. Rather, we might find that some parameters are locally unidentifiable, which sounds like potentially the converse situation. Still, the settings are similar enough that it bears investigating. See singular learning theory.
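A standard toy example of local unidentifiability, added here for illustration: the model $f(x) = abx$ depends on its two weights only through the product $ab$. At any minimiser of the squared loss, the Hessian has a zero eigenvalue. Its eigenvector points along the curve $ab = \text{const}$, so the data cannot pin down $a$ and $b$ individually in that direction. The data, the point $(a, b) = (1, 2)$ and the finite-difference step are assumptions.

```python
import numpy as np

# f(x) = a * b * x: the two weights enter only through their product.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x  # noiseless data: every (a, b) with a * b = 2 is a global minimum

def loss(p):
    a, b = p
    return np.mean((a * b * x - y) ** 2)

def fd_hessian(f, p, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at p."""
    d = len(p)
    E = np.eye(d) * h
    H = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(p + E[i] + E[j]) - f(p + E[i] - E[j])
                       - f(p - E[i] + E[j]) + f(p - E[i] - E[j])) / (4 * h**2)
    return H

p_star = np.array([1.0, 2.0])  # one of infinitely many minimisers
evals, evecs = np.linalg.eigh(fd_hessian(loss, p_star))
print(evals)        # smallest eigenvalue is zero up to rounding
print(evecs[:, 0])  # flat direction, parallel to (a, -b) = (1, -2)
```

In SLT terms, degenerate directions like this are what make a model singular. Here the flat direction is a whole curve of equivalent parameters, rather than a parameter we chose to freeze.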
6 Probabilistic weight tying
Possibly also in effect a form of pBNN. Rafael Oliveira has referred me to Roth and Pernkopf (2020) for some ideas on that theme.
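One way to make "probabilistic weight tying" concrete, sketched here as an assumption and not as a description of Roth and Pernkopf's (2020) method: draw the weights i.i.d. from a random discrete distribution $G \sim \mathrm{DP}(\alpha, N(0, 1))$. Marginalising $G$ gives the Pólya urn (Chinese restaurant process) scheme. Under it, weights fall into clusters that share a single random value, so there are far fewer distinct random quantities than weights. The concentration $\alpha = 3$, the base measure and the 16×16 layer are illustrative choices.

```python
import numpy as np

def sample_tied_weights(n_weights, alpha, rng):
    """Draw n_weights values from G ~ DP(alpha, N(0, 1)) via the Polya urn (CRP) scheme.

    Weights assigned to the same 'table' share one value, so the prior itself ties weights.
    Returns the weight values and the number of distinct shared values.
    """
    table_of = np.empty(n_weights, dtype=int)
    counts, values = [], []
    for i in range(n_weights):
        # Join table k with prob counts[k] / (i + alpha), open a new one with prob alpha / (i + alpha).
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)
            values.append(rng.normal())  # fresh value from the base measure N(0, 1)
        else:
            counts[k] += 1               # reuse the existing table's value
        table_of[i] = k
    return np.array(values)[table_of], len(values)

rng = np.random.default_rng(0)
W, n_distinct = sample_tied_weights(256, alpha=3.0, rng=rng)
W = W.reshape(16, 16)
print(n_distinct)  # number of distinct random values shared by the 256 weights
```

The layer has 256 weights but only `n_distinct` random quantities. That is the sense in which tying might act like a pBNN with a small random subspace.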