Neural tangent kernel

See also: infinite width networks.

Good starting points: Lilian Weng, Some Math behind Neural Tangent Kernel. Ferenc Huszár provides some Intuition on the Neural Tangent Kernel, i.e. the paper (Lee et al. 2019).

It turns out the neural tangent kernel becomes particularly useful when studying learning dynamics in infinitely wide feed-forward neural networks. Why? Because in this limit, two things happen:

  1. First: if we initialize \(θ_0\) randomly from appropriately chosen distributions, the initial NTK of the network \(k_{θ_0}\) approaches a deterministic kernel as the width increases. This means, that at initialization, \(k_{θ_0}\) doesn’t really depend on \(k_{θ_0}\) but is a fixed kernel independent of the specific initialization.-
  2. Second: in the infinite limit the kernel \(k_{θ_t}\) stays constant over time as we optimise \(\theta_t\). This removes the parameter dependence during training.

These two facts put together imply that gradient descent in the infinitely wide and infinitesimally small learning rate limit can be understood as a pretty simple algorithm called kernel gradient descent with a fixed kernel function that depends only on the architecture (number of layers, activations, etc).

These results, taken together with an older known result (Neal 1996), allows us to characterise the probability distribution of minima that gradient descent converges to in this infinite limit as a Gaussian process.


Bai, Yu, and Jason D. Lee. 2020. Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks.” arXiv:1910.01619 [Cs, Math, Stat], February.
Chen, Lin, and Sheng Xu. 2020. Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS.” arXiv:2009.10683 [Cs, Math, Stat], October.
Chen, Minshuo, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, and Richard Socher. 2021. Towards Understanding Hierarchical Learning: Benefits of Neural Representations.” arXiv:2006.13436 [Cs, Stat], March.
Fan, Zhou, and Zhichao Wang. 2020. Spectra of the Conjugate Kernel and Neural Tangent Kernel for Linear-Width Neural Networks.” In Advances in Neural Information Processing Systems, 33:12.
Fort, Stanislav, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, and Surya Ganguli. 2020. Deep Learning Versus Kernel Learning: An Empirical Study of Loss Landscape Geometry and the Time Evolution of the Neural Tangent Kernel.” In Advances in Neural Information Processing Systems. Vol. 33.
Geifman, Amnon, Abhay Yadav, Yoni Kasten, Meirav Galun, David Jacobs, and Ronen Basri. 2020. On the Similarity Between the Laplace and Neural Tangent Kernels.” In arXiv:2007.01580 [Cs, Stat].
He, Bobby, Balaji Lakshminarayanan, and Yee Whye Teh. 2020. Bayesian Deep Ensembles via the Neural Tangent Kernel.” In Advances in Neural Information Processing Systems. Vol. 33.
Jacot, Arthur, Franck Gabriel, and Clement Hongler. 2018. Neural Tangent Kernel: Convergence and Generalization in Neural Networks.” In Advances in Neural Information Processing Systems, 31:8571–80. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.
Lee, Jaehoon, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, and Jeffrey Pennington. 2019. Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent.” In Advances in Neural Information Processing Systems, 8570–81.
Neal, Radford M. 1996. Priors for Infinite Networks.” In Bayesian Learning for Neural Networks, edited by Radford M. Neal, 29–53. Lecture Notes in Statistics. New York, NY: Springer.
Novak, Roman, Lechao Xiao, Jiri Hron, Jaehoon Lee, Alexander A. Alemi, Jascha Sohl-Dickstein, and Samuel S. Schoenholz. 2019. Neural Tangents: Fast and Easy Infinite Neural Networks in Python.” arXiv:1912.02803 [Cs, Stat], December.
Sachdeva, Noveen, Mehak Preet Dhaliwal, Carole-Jean Wu, and Julian McAuley. 2022. Infinite Recommendation Networks: A Data-Centric Approach.” arXiv.
Simon, James B., Sajant Anand, and Michael R. DeWeese. 2022. Reverse Engineering the Neural Tangent Kernel.” arXiv.

No comments yet. Why not leave one?

GitHub-flavored Markdown & a sane subset of HTML is supported.