# Infinite width limits of neural networks

December 9, 2020 — May 11, 2021

Large-width limits of neural nets. An interesting way of considering overparameterization.

## 1 Neural Network Gaussian Process

For now: See Neural network Gaussian process on Wikipedia.

The field that sprang from the insight that, in the infinite-width limit, random neural nets with Gaussian weights and appropriate scaling converge to certain special Gaussian processes, and that there are useful conclusions we can draw from this correspondence.

More generally we might consider correlated and/or non-Gaussian weights, and deep networks. Unless otherwise stated, though, I am thinking of i.i.d. Gaussian weights and a single hidden layer.

In this single-hidden-layer case we get tractable covariance structure. See NN kernels.
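For a ReLU activation, for example, the closed form is Cho and Saul's (2009) arc-cosine kernel of degree 1. A quick numpy sanity check against a wide random hidden layer (all sizes arbitrary):

```python
import numpy as np

def relu_nngp_kernel(x, y):
    # Closed-form arc-cosine kernel of degree 1 (Cho and Saul 2009):
    # E_w[ReLU(w.x) ReLU(w.y)] for w ~ N(0, I).
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return nx * ny * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

def relu_nngp_mc(x, y, n_hidden=1_000_000, seed=0):
    # Monte Carlo estimate of the same quantity with a wide random hidden layer.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_hidden, x.size))
    return np.mean(np.maximum(W @ x, 0) * np.maximum(W @ y, 0))

x = np.array([1.0, 0.5])
y = np.array([0.2, -1.0])
print(relu_nngp_kernel(x, y), relu_nngp_mc(x, y))  # agree to ~3 decimals
```

The width-averaged empirical covariance converges to the closed form as the hidden layer grows, which is the NNGP correspondence in miniature.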

## 2 Neural Network Tangent Kernel

NTK? See Neural Tangent Kernel.
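For intuition while reading: the empirical (finite-width) tangent kernel is just an inner product of parameter gradients. A minimal numpy sketch for a one-hidden-layer tanh net, with all names and scalings illustrative rather than canonical:

```python
import numpy as np

def empirical_ntk(W, v, x1, x2):
    # Empirical tangent kernel of f(x) = v @ tanh(W @ x):
    # Theta(x1, x2) = <grad_params f(x1), grad_params f(x2)>.
    def grad(x):
        h = np.tanh(W @ x)
        dW = np.outer(v * (1 - h**2), x)        # df/dW via the chain rule
        return np.concatenate([dW.ravel(), h])  # h = df/dv
    return grad(x1) @ grad(x2)

rng = np.random.default_rng(0)
width, d = 1000, 3
W = rng.standard_normal((width, d)) / np.sqrt(d)
v = rng.standard_normal(width) / np.sqrt(width)

xs = rng.standard_normal((4, d))
K = np.array([[empirical_ntk(W, v, a, b) for b in xs] for a in xs])
print(np.linalg.eigvalsh(K))  # a symmetric PSD Gram matrix
```

As the width grows, this Gram matrix concentrates around the deterministic NTK of Jacot, Gabriel, and Hongler (2018) and stays nearly constant during training, which is what makes the linearized (kernel) description of wide nets work.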

## 3 Implicit regularization

Here’s one interesting perspective on wide nets which looks rather like the NTK model, but is it? To read. Zhang et al. (2017) observe:

> - The effective capacity of neural networks is large enough for a brute-force memorization of the entire data set.
> - Even optimization on random labels remains easy. In fact, training time increases only by a small constant factor compared with training on the true labels.
> - Randomizing labels is solely a data transformation, leaving all other properties of the learning problem unchanged.
>
> […] Explicit regularization may improve generalization performance, but is neither necessary nor by itself sufficient for controlling generalization error. […] Appealing to linear models, we analyze how SGD acts as an implicit regularizer.
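The memorization claim is easy to reproduce at toy scale. A sketch, with sizes, parameterization, and learning rate chosen arbitrarily: a small one-hidden-layer net fits completely random labels with plain full-batch gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 20, 5, 512
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)        # purely random labels: no structure

# One-hidden-layer tanh net with NTK-style 1/sqrt(width) output scaling.
W = rng.standard_normal((width, d))
v = rng.standard_normal(width)
scale = 1 / np.sqrt(width)

lr = 0.4
for _ in range(5000):
    H = np.tanh(X @ W.T)                   # (n, width) hidden activations
    err = scale * H @ v - y                # residuals under squared loss
    grad_v = scale * H.T @ err / n
    grad_W = scale * ((err[:, None] * (1 - H**2)) * v).T @ X / n
    v -= lr * grad_v
    W -= lr * grad_W

preds = scale * np.tanh(X @ W.T) @ v
print(np.mean(np.sign(preds) == y))        # train accuracy on noise (should reach 1.0)
```

Since width far exceeds the number of points, the net has ample capacity to interpolate noise, and gradient descent finds such an interpolant with no apparent difficulty.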

## 4 Dropout

Dropout at test time is sometimes interpreted as approximate posterior sampling from a certain kind of Gaussian process induced by the neural net (Gal and Ghahramani 2016). See Dropout.
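The recipe in practice is Monte Carlo dropout: keep the dropout masks active at prediction time and read the spread of the sampled outputs as an approximate predictive uncertainty. A minimal numpy sketch for a one-hidden-layer ReLU net, with all names and sizes illustrative:

```python
import numpy as np

def mc_dropout_predict(x, W, v, p=0.5, n_samples=20_000, seed=1):
    # MC dropout: sample Bernoulli keep-masks at test time and treat the
    # mean/std of the sampled outputs as an approximate posterior predictive.
    rng = np.random.default_rng(seed)
    h = np.maximum(W @ x, 0.0)                     # ReLU hidden activations
    keep = rng.random((n_samples, h.size)) > p     # Bernoulli(1 - p) keep-masks
    samples = (keep * h) @ v / (1.0 - p)           # inverted-dropout scaling
    return samples.mean(), samples.std()

rng = np.random.default_rng(0)
W, v = rng.standard_normal((50, 4)), rng.standard_normal(50)
x = rng.standard_normal(4)
mean, std = mc_dropout_predict(x, W, v)
print(mean, std)  # sample mean matches the deterministic forward pass in expectation
```

The inverted-dropout scaling makes the sample mean unbiased for the deterministic forward pass; the standard deviation is the quantity Gal and Ghahramani interpret as approximate GP posterior uncertainty.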

## 5 As stochastic DEs

We can find an SDE for a given NN-style kernel if we can find Green’s functions satisfying $$\sigma^2_\varepsilon \langle G_\cdot(\mathbf{x}_p), G_\cdot(\mathbf{x}_q)\rangle = \mathbb{E} \big[ \psi\big(Z_p\big) \psi\big(Z_q \big) \big].$$ Russell Tsuchida observes that $$G_\mathbf{s}(\mathbf{x}_p) = \psi(\mathbf{s}^\top \mathbf{x}_p) \sqrt{\phi(\mathbf{s})},$$ where $$\phi$$ is the pdf of a standard multivariate normal vector, is a solution.
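This observation is easy to check numerically. Taking $$\sigma^2_\varepsilon = 1$$, $$\psi = \tanh$$, and $$Z_p = \mathbf{s}^\top \mathbf{x}_p$$ with $$\mathbf{s}$$ standard normal (dimensions and test points arbitrary), the $$L^2$$ inner product of the proposed Green’s functions matches the Monte Carlo expectation:

```python
import numpy as np

# G_s(x) = psi(s.x) sqrt(phi(s)), so <G(x_p), G(x_q)> is the integral of
# psi(s.x_p) psi(s.x_q) phi(s) ds, which should equal E[psi(Z_p) psi(Z_q)].
psi = np.tanh
xp, xq = np.array([1.0, 0.3]), np.array([-0.4, 0.8])

# Left-hand side: grid quadrature over s in R^2 with phi the N(0, I) pdf.
g = np.linspace(-6, 6, 401)
S = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
phi = np.exp(-0.5 * (S**2).sum(1)) / (2 * np.pi)
dA = (g[1] - g[0]) ** 2
inner = np.sum(psi(S @ xp) * psi(S @ xq) * phi) * dA

# Right-hand side: Monte Carlo over s ~ N(0, I).
rng = np.random.default_rng(0)
Z = rng.standard_normal((1_000_000, 2))
mc = np.mean(psi(Z @ xp) * psi(Z @ xq))
print(inner, mc)  # agree to a few decimals
```

The $$\sqrt{\phi}$$ factor is what turns the Gaussian expectation defining the kernel into an ordinary $$L^2$$ inner product of deterministic functions.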

## 6 References

Adlam, Lee, Xiao, et al. 2020. arXiv:2010.07355 [Cs, Stat].
Arora, Du, Hu, et al. 2019. “On Exact Computation with an Infinitely Wide Neural Net.” In Advances in Neural Information Processing Systems.
Bai, and Lee. 2020. arXiv:1910.01619 [Cs, Math, Stat].
Belkin, Ma, and Mandal. 2018. In International Conference on Machine Learning.
Chen, Minshuo, Bai, Lee, et al. 2021. arXiv:2006.13436 [Cs, Stat].
Chen, Lin, and Xu. 2020. arXiv:2009.10683 [Cs, Math, Stat].
Cho, and Saul. 2009. In Proceedings of the 22nd International Conference on Neural Information Processing Systems. NIPS’09.
Domingos. 2020. arXiv:2012.00152 [Cs, Stat].
Dutordoir, Durrande, and Hensman. 2020. In Proceedings of the 37th International Conference on Machine Learning. ICML’20.
Dutordoir, Hensman, van der Wilk, et al. 2021. arXiv:2105.04504 [Cs, Stat].
Fan, and Wang. 2020. In Advances in Neural Information Processing Systems.
Fort, Dziugaite, Paul, et al. 2020. In Advances in Neural Information Processing Systems.
Gal, and Ghahramani. 2015. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
———. 2016. arXiv:1512.05287 [Stat].
Geifman, Yadav, Kasten, et al. 2020. arXiv:2007.01580 [Cs, Stat].
Ghahramani. 2013. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.
Girosi, Jones, and Poggio. 1995. Neural Computation.
Giryes, Sapiro, and Bronstein. 2016. IEEE Transactions on Signal Processing.
He, Lakshminarayanan, and Teh. 2020. In Advances in Neural Information Processing Systems.
Jacot, Gabriel, and Hongler. 2018. In Advances in Neural Information Processing Systems. NIPS’18.
Karakida, and Osawa. 2020. Advances in Neural Information Processing Systems.
Lázaro-Gredilla, and Figueiras-Vidal. 2009. In Advances in Neural Information Processing Systems.
Lee, Bahri, Novak, et al. 2018. In ICLR.
Lee, Xiao, Schoenholz, et al. 2019. In Advances in Neural Information Processing Systems.
Matthews, Rowland, Hron, et al. 2018. In arXiv:1804.11271 [Cs, Stat].
Meronen, Irwanto, and Solin. 2020. In Advances in Neural Information Processing Systems.
Neal. 1996a.
———. 1996b. In Bayesian Learning for Neural Networks. Lecture Notes in Statistics.
Novak, Xiao, Hron, et al. 2019. arXiv:1912.02803 [Cs, Stat].
Novak, Xiao, Lee, et al. 2020. In The International Conference on Learning Representations.
Pearce, Tsuchida, Zaki, et al. 2019. “Expressive Priors in Bayesian Neural Networks: Kernel Combinations and Periodic Functions.” In Uncertainty in Artificial Intelligence.
Rossi, Heinonen, Bonilla, et al. 2021. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics.
Sachdeva, Dhaliwal, Wu, et al. 2022.
Shi, Titsias, and Mnih. 2020. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics.
Tiao, Dutordoir, and Picheny. 2023.
Williams. 1996. In Proceedings of the 9th International Conference on Neural Information Processing Systems. NIPS’96.
Yang. 2019. arXiv:1910.12478 [Cond-Mat, Physics:math-Ph].
Yang, and Hu. 2020. arXiv:2011.14522 [Cond-Mat].
Zhang, Bengio, Hardt, et al. 2017. In Proceedings of ICLR.