Neural tangent kernel

2020-12-09 — 2022-10-14

Good starting points: Lilian Weng, Some Math behind Neural Tangent Kernel. Ferenc Huszár provides some Intuition on the Neural Tangent Kernel, i.e. on the paper (Lee et al. 2019):

It turns out the neural tangent kernel becomes particularly useful when studying learning dynamics in infinitely wide feed-forward neural networks. Why? Because in this limit, two things happen:

First: if we initialize $θ_{0}$ randomly from appropriately chosen distributions, the initial NTK of the network $k_{θ_{0}}$ approaches a deterministic kernel as the width increases. This means that, at initialization, $k_{θ_{0}}$ doesn’t really depend on $θ_{0}$ but is a fixed kernel independent of the specific initialization.

Second: in the infinite-width limit the kernel $k_{θ_{t}}$ stays constant over time as we optimise $θ_{t}$. This removes the parameter dependence during training.
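The first fact can be checked numerically in a toy case. The sketch below is not taken from the cited papers or from neural-tangents. It assumes a two-layer ReLU network $f(x)=\frac{1}{\sqrt{m}}\sum_{j=1}^{m} a_j \max(w_j^{\top}x, 0)$ with standard normal $a_j$ and $w_j$, takes the NTK to be $k_{θ}(x,x')=∇_{θ}f(x)^{\top}∇_{θ}f(x')$, and compares the empirical kernel at random initialisations with the closed form for unit-norm inputs that follows from the arc-cosine kernel formulas. All function names are hypothetical.

```python
import math
import random


def empirical_ntk(x, y, width, seed):
    """Empirical NTK k_theta0(x, y) of f(x) = a . relu(W x) / sqrt(width).

    Assumes standard-normal initialisation of a and W (NTK parameterisation).
    """
    rng = random.Random(seed)
    dot_xy = sum(xi * yi for xi, yi in zip(x, y))
    total = 0.0
    for _ in range(width):
        w = [rng.gauss(0.0, 1.0) for _ in x]
        a = rng.gauss(0.0, 1.0)
        pre_x = sum(wi * xi for wi, xi in zip(w, x))
        pre_y = sum(wi * yi for wi, yi in zip(w, y))
        # Contribution of the gradient with respect to the output weight a_j.
        total += max(pre_x, 0.0) * max(pre_y, 0.0)
        # Contribution of the gradient with respect to the hidden weights w_j.
        if pre_x > 0.0 and pre_y > 0.0:
            total += a * a * dot_xy
    return total / width


def limiting_ntk(x, y):
    """Infinite-width limit for unit-norm inputs (arc-cosine kernel formulas)."""
    cos_phi = sum(xi * yi for xi, yi in zip(x, y))
    cos_phi = max(-1.0, min(1.0, cos_phi))
    phi = math.acos(cos_phi)
    relu_part = (math.sin(phi) + (math.pi - phi) * cos_phi) / (2 * math.pi)
    gate_part = cos_phi * (math.pi - phi) / (2 * math.pi)
    return relu_part + gate_part


def spread(x, y, width, n_seeds=20):
    """Standard deviation of the empirical NTK across random initialisations."""
    vals = [empirical_ntk(x, y, width, seed) for seed in range(n_seeds)]
    mean = sum(vals) / n_seeds
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / n_seeds)


# Two unit-norm inputs, chosen arbitrarily for illustration.
x = [1.0, 0.0, 0.0]
y = [0.6, 0.8, 0.0]
for width in (10, 100, 1000, 10000):
    k_hat = empirical_ntk(x, y, width, seed=0)
    print(width, round(k_hat, 3), round(spread(x, y, width), 3))
print("limit", round(limiting_ntk(x, y), 3))
```

If the claim holds, the spread across seeds should shrink as the width grows, and the empirical values should settle near the printed limit. This only illustrates the behaviour at initialisation; it says nothing about the second fact, the constancy of the kernel during training.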

These two facts put together imply that gradient descent, in the limit of infinite width and infinitesimally small learning rate, can be understood as a pretty simple algorithm called kernel gradient descent, with a fixed kernel function that depends only on the architecture (number of layers, activations, etc.).
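To make kernel gradient descent concrete, here is a minimal sketch under assumptions of my own: squared loss, a small finite step size instead of the infinitesimal limit, and a made-up $2 \times 2$ kernel matrix standing in for the limiting NTK evaluated on two training points. It tracks only the function values on the training inputs, which evolve as $f \leftarrow f - η K (f - y)$.

```python
def kernel_gradient_descent(K, y, f0, lr, steps):
    """Iterate f <- f - lr * K (f - y) on the training points.

    K is a fixed, symmetric positive-definite kernel matrix (list of lists);
    f holds the function values on the training inputs.
    """
    f = list(f0)
    n = len(y)
    for _ in range(steps):
        residual = [f[i] - y[i] for i in range(n)]
        # Gradient of the squared loss in function space, smoothed by the kernel.
        update = [sum(K[i][j] * residual[j] for j in range(n)) for i in range(n)]
        f = [f[i] - lr * update[i] for i in range(n)]
    return f


# Hypothetical 2-point problem: these kernel values are invented for illustration.
K = [[1.0, 0.5], [0.5, 1.0]]
y = [1.0, -1.0]    # training targets
f0 = [0.3, 0.2]    # network outputs at initialisation
f_final = kernel_gradient_descent(K, y, f0, lr=0.5, steps=200)
print(f_final)
```

Because the kernel is fixed, the residual $f - y$ obeys a linear recursion and decays geometrically along each eigenvector of $K$, provided the step size is small enough relative to the largest eigenvalue. No network parameters appear anywhere in the loop.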

These results, taken together with an older known result (Neal 1996), allow us to characterise the probability distribution of the minima that gradient descent converges to in this infinite-width limit as a Gaussian process.
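A sketch of why a Gaussian process appears, assuming squared loss and gradient flow, in notation that is not in the quoted text: $Θ$ is the limiting NTK, $X$ and $y$ are the training inputs and targets, $η$ is the learning rate, and $Θ(X,X)$ is assumed invertible. With a constant kernel the training dynamics are a linear ODE, which can be solved in closed form:

$$
\partial_t f_t(x) = -η\, Θ(x, X)\,\bigl(f_t(X) - y\bigr)
\quad\Longrightarrow\quad
f_\infty(x) = f_0(x) + Θ(x, X)\,Θ(X, X)^{-1}\bigl(y - f_0(X)\bigr).
$$

The trained function $f_\infty$ is thus an affine function of the function at initialisation $f_0$. In the infinite-width limit $f_0$ is itself a Gaussian process (Neal 1996), and affine maps of Gaussian processes are Gaussian, so $f_\infty$ is a Gaussian process too. When $f_0$ has mean zero, the mean of $f_\infty(x)$ is $Θ(x, X)\,Θ(X, X)^{-1} y$.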

google/neural-tangents: Fast and Easy Infinite Neural Networks in Python (paper), (poster), (blog post)
When are Neural Networks more powerful than Neural Tangent Kernels? introduces two recent interesting takes, Bai and Lee (2020) and M. Chen et al. (2021), which consider quadratic approximations instead of mere linearization. Sequel to Ultra-Wide Deep Nets and Neural Tangent Kernel.
Reverse engineering the NTK: towards first-principles architecture design

1 References

Bai, and Lee. 2020. “Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks.” arXiv:1910.01619 [Cs, Math, Stat].

Cagnetta, Oliveira, Sabanayagam, et al. 2023. “Kernels, Data & Physics.”

Chen, Minshuo, Bai, Lee, et al. 2021. “Towards Understanding Hierarchical Learning: Benefits of Neural Representations.” arXiv:2006.13436 [Cs, Stat].

Chen, Lin, and Xu. 2020. “Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS.” arXiv:2009.10683 [Cs, Math, Stat].

Fan, and Wang. 2020. “Spectra of the Conjugate Kernel and Neural Tangent Kernel for Linear-Width Neural Networks.” In Advances in Neural Information Processing Systems.

Fort, Dziugaite, Paul, et al. 2020. “Deep Learning Versus Kernel Learning: An Empirical Study of Loss Landscape Geometry and the Time Evolution of the Neural Tangent Kernel.” In Advances in Neural Information Processing Systems.

Geifman, Yadav, Kasten, et al. 2020. “On the Similarity Between the Laplace and Neural Tangent Kernels.” arXiv:2007.01580 [Cs, Stat].

Gosch, Sabanayagam, Ghoshdastidar, et al. 2024. “Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks.”

He, Lakshminarayanan, and Teh. 2020. “Bayesian Deep Ensembles via the Neural Tangent Kernel.” In Advances in Neural Information Processing Systems.

Jacot, Gabriel, and Hongler. 2018. “Neural Tangent Kernel: Convergence and Generalization in Neural Networks.” In Advances in Neural Information Processing Systems. NIPS’18.

Lee, Xiao, Schoenholz, et al. 2019. “Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent.” In Advances in Neural Information Processing Systems.

Liu, Zhu, and Belkin. 2020. “On the Linearity of Large Non-Linear Models: When and Why the Tangent Kernel Is Constant.” In Advances in Neural Information Processing Systems.

Neal. 1996. “Priors for Infinite Networks.” In Bayesian Learning for Neural Networks. Lecture Notes in Statistics.

Novak, Xiao, Hron, et al. 2019. “Neural Tangents: Fast and Easy Infinite Neural Networks in Python.” arXiv:1912.02803 [Cs, Stat].

Sabanayagam, Gosch, Günnemann, et al. 2024. “Exact Certification of (Graph) Neural Networks Against Label Poisoning.”

Sachdeva, Dhaliwal, Wu, et al. 2022. “Infinite Recommendation Networks: A Data-Centric Approach.”

Simon, Anand, and DeWeese. 2022. “Reverse Engineering the Neural Tangent Kernel.”

Xu, Zhang, Li, et al. 2021. “How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks.”