Implementing neural nets
October 14, 2016 — September 5, 2024
1 HOWTOs
The internet is full of guides to training neural nets. Here are some selected highlights.
Michael Nielsen has a free online textbook with code examples in Python. Christopher Olah’s visual explanations make many things clear.
Andrej Karpathy’s popular, unromantic, messy guide to training neural nets in practice has a lot of tips that people tend to rediscover the hard way if they do not get them from him. (I did.)
It is allegedly easy to get started with training neural nets. Numerous libraries and frameworks take pride in displaying 30-line miracle snippets that solve your data problems, giving the (false) impression that this stuff is plug and play. … Unfortunately, neural nets are nothing like that. They are not “off-the-shelf” technology the second you deviate slightly from training an ImageNet classifier.
Alice’s Adventures in a differentiable wonderland (Scardapane 2024)
Neural networks surround us, in the form of large language models, speech transcription systems, molecular discovery algorithms, robotics, and much more. Stripped of anything else, neural networks are compositions of differentiable primitives, and studying them means learning how to program and how to interact with these models, a particular example of what is called differentiable programming.
This primer is an introduction to this fascinating field imagined for someone, like Alice, who has just ventured into this strange differentiable wonderland. I overview the basics of optimizing a function via automatic differentiation, and a selection of the most common designs for handling sequences, graphs, texts, and audios. The focus is on a intuitive, self-contained introduction to the most important design techniques, including convolutional, attentional, and recurrent blocks, hoping to bridge the gap between theory and code (PyTorch and JAX) and leaving the reader capable of understanding some of the most advanced models out there, such as large language models (LLMs) and multimodal architectures.
Dive into Deep Learning (Zhang et al. 2023)
- Interactive deep learning book with code, math, and discussions
- Implemented with PyTorch, NumPy/MXNet, JAX, and TensorFlow
- Adopted at 500 universities from 70 countries
Source code at d2l-ai/d2l-en. They are no longer distributing the book as a PDF, but you can build it yourself.
2 Profiling and performance optimisation
Start with general Python profilers; many of them have NN affordances now.
- google-research/tuning_playbook: A playbook for systematically maximizing the performance of deep learning models.
- Making Deep Learning go Brrrr From First Principles
- Monitor & Improve GPU Usage for Model Training on Weights & Biases
- Tracking system resource (GPU, CPU, etc.) utilization during training with the Weights & Biases Dashboard
- Algorithms for Modern Hardware - Algorithmica
- PyTorch profilers
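As a minimal sketch of the "start with a general Python profiler" advice, here is the standard-library `cProfile` wrapped around a toy workload. The functions `dense_forward`, `train_step`, and `profile_call` are hypothetical names invented for this illustration; in real use the profiled callable would be your own training step, and GPU-heavy code would additionally need a framework-aware profiler.

```python
import cProfile
import io
import pstats


def dense_forward(x, weights):
    """Toy stand-in for a forward pass: a pure-Python matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_step(n_features=200, n_outputs=200, n_repeats=20):
    """Hypothetical 'training step' that just repeats the forward pass."""
    x = [0.01 * i for i in range(n_features)]
    weights = [[0.001 * (i + j) for j in range(n_features)] for i in range(n_outputs)]
    out = None
    for _ in range(n_repeats):
        out = dense_forward(x, weights)
    return out


def profile_call(fn, *args, top=5, **kwargs):
    """Run fn under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(train_step)
print(report)  # the slowest call paths appear at the top of the table
```

Sorting by cumulative time shows which high-level call dominates the step, which is usually the first thing worth knowing before reaching for more specialised tools.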
3 NN Software
I have used
- PyTorch
- Julia
- JAX
- Occasionally, reluctantly, TensorFlow
I could use any of the other autodiff systems, such as…
- Theano (Python) (now defunct) was a trailblazer
- Torch (Lua), in practice deprecated in favour of PyTorch
- Caffe (MATLAB/Python) was popular for a while; I have not seen it recently
- PaddlePaddle is one of Baidu’s NN properties (Python/C++)
- MindSpore is Huawei’s framework based on source-transformation autodiff; it targets interesting edge hardware.
- JavaScript: see javascript machine learning
3.1 Compiled
See edge ml for a discussion of compiled NNs.
4 Tracking experiments
5 Configuring experiments
See configuring experiments; in practice I use Hydra for everything.
6 Pre-computed/trained models
These are all hopelessly outdated now, in the era of Hugging Face.
Caffe format:
The Caffe Zoo has lots of nice pre-trained models, listed on their wiki.
Here’s a great CV one: Andrej Karpathy’s image captioner, NeuralTalk2.
For the NVC dataset: pre-trained feature model here.
For Lasagne: https://github.com/Lasagne/Recipes/tree/master/modelzoo
For Keras:
7 Managing axes
A lot of the time, managing deep learning code comes down to remembering which axis is which. In practice, I have found that the Einstein summation convention covers all my needs.
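To illustrate what Einstein convention buys you, here is a small sketch using `numpy.einsum` (PyTorch and JAX provide an `einsum` with the same string notation). The attention-style shapes and the axis letters are assumptions chosen for the example, not something taken from a particular model.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, heads, seq, dim = 2, 3, 5, 4

# Hypothetical attention-style inputs. Axis letters are labels chosen here:
# b = batch, h = head, q = query position, k = key position, d = feature.
queries = rng.normal(size=(batch, heads, seq, dim))
keys = rng.normal(size=(batch, heads, seq, dim))

# Contract over the feature axis d; every surviving axis is named in the output.
scores = np.einsum("bhqd,bhkd->bhqk", queries, keys)

# The same computation with positional axis bookkeeping, for comparison.
scores_manual = queries @ np.swapaxes(keys, -1, -2)

print(scores.shape)  # (2, 3, 5, 5)
```

The einsum string doubles as documentation: the reader can see which axis is summed away without counting positions or remembering what `-1` and `-2` refer to.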
However, there are alternatives. Alexander Rush argues for NamedTensor. Implementations:
- Native PyTorch
- namedtensor (PyTorch)
- labeledtensor (TensorFlow)
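The named-axis idea can be sketched in a few lines. The `NamedArray` class below is a hypothetical toy wrapper around NumPy written for this illustration; it is not the API of any of the libraries listed above, which are considerably richer.

```python
import numpy as np


class NamedArray:
    """Hypothetical toy wrapper: a NumPy array plus one name per axis."""

    def __init__(self, values, names):
        values = np.asarray(values)
        if values.ndim != len(names):
            raise ValueError("need exactly one name per axis")
        if len(set(names)) != len(names):
            raise ValueError("axis names must be unique")
        self.values = values
        self.names = tuple(names)

    def sum(self, name):
        """Reduce over the axis called `name`, wherever it happens to sit."""
        axis = self.names.index(name)
        remaining = self.names[:axis] + self.names[axis + 1:]
        return NamedArray(self.values.sum(axis=axis), remaining)

    def transpose(self, *names):
        """Reorder axes by name rather than by position."""
        order = tuple(self.names.index(n) for n in names)
        return NamedArray(self.values.transpose(order), names)


images = NamedArray(np.ones((8, 3, 32, 32)), ("batch", "channel", "height", "width"))
per_pixel = images.sum("channel")
channels_last = images.transpose("batch", "height", "width", "channel")
print(per_pixel.names, channels_last.values.shape)
```

Code written against names keeps working if the underlying layout changes (say, from channels-first to channels-last), which is the core of the argument for named tensors.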
8 Scaling up
9 Incoming
- lab-ml/nn: 🧠 Implementations/tutorials of deep learning papers with side-by-side notes; including transformers (original, xl, switch, feedback), optimizers (adam, radam, adabelief), gans (dcgan, cyclegan), reinforcement learning (ppo, dqn), capsnet, sketch-rnn, etc.
- labml.ai Neural Networks
- ApplyingML - Papers, Guides, and Interviews with ML practitioners