The internet is full of guides to training neural nets. Here are some selected highlights.
Andrej Karpathy’s popular, unromantic, messy guide to training neural nets in practice has a lot of tips that people tend to rediscover the hard way if they do not get them from him. (I rediscovered them the hard way.)
It is allegedly easy to get started with training neural nets. Numerous libraries and frameworks take pride in displaying 30-line miracle snippets that solve your data problems, giving the (false) impression that this stuff is plug and play. … Unfortunately, neural nets are nothing like that. They are not “off-the-shelf” technology the second you deviate slightly from training an ImageNet classifier.
Profiling and performance optimisation
- google-research/tuning_playbook: A playbook for systematically maximizing the performance of deep learning models.
- Making Deep Learning go Brrrr From First Principles
- Monitor & Improve GPU Usage for Model Training on Weights & Biases
- Tracking system resource (GPU, CPU, etc.) utilization during training with the Weights & Biases Dashboard
- Algorithms for Modern Hardware - Algorithmica
- pytorch profilers
I have used
I could use any of the other autodiff systems, such as…
- Intel’s ngraph, which compiles neural nets, especially for CPUs
- Collaboratively build, visualize, and design neural nets in browser
- Theano (Python) (now defunct) was a trailblazer
- Torch (Lua), in practice deprecated in favour of pytorch
- Caffe (MATLAB/Python) was popular for a while; I have not seen it recently
- Paddlepaddle is one of Baidu’s NN properties (Python/C++)
- mindspore is Huawei’s framework, based on source-transformation autodiff; it targets interesting edge hardware.
- Minimalist tiny-dnn is a C++11 implementation of certain tools for deep learning. It targets deep learning on limited-compute embedded systems and IoT devices.
- julia: Various autodiff and full-service ML tools.
A lot of the time spent managing deep learning goes to remembering which axis is which. In practice, I have found that the Einstein summation convention solves all my needs.
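To make the axis-bookkeeping point concrete, here is a minimal sketch of Einstein notation using `numpy.einsum`. NumPy is an assumption made for illustration, since most array libraries offer an equivalent `einsum`. The shapes and variable names are hypothetical. The same attention-style scores are computed twice: once with labelled axes, and once with explicit transposes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
batch, seq, d_model, heads, d_head = 2, 5, 8, 4, 2

x = rng.normal(size=(batch, seq, d_model))       # axes: b s d
w_q = rng.normal(size=(d_model, heads, d_head))  # axes: d h k
w_k = rng.normal(size=(d_model, heads, d_head))  # axes: d h k

# Project to per-head queries and keys by contracting over the model axis d.
q = np.einsum("bsd,dhk->bhsk", x, w_q)
k = np.einsum("bsd,dhk->bhsk", x, w_k)

# Pairwise scores: contract over the per-head feature axis k, keeping
# separate query-position (s) and key-position (t) axes.
scores = np.einsum("bhsk,bhtk->bhst", q, k)

# The same computation with positional axes and explicit transposes.
q_manual = np.tensordot(x, w_q, axes=([2], [0])).transpose(0, 2, 1, 3)
k_manual = np.tensordot(x, w_k, axes=([2], [0])).transpose(0, 2, 1, 3)
scores_manual = q_manual @ k_manual.transpose(0, 1, 3, 2)

print(scores.shape)
print(np.allclose(scores, scores_manual))
```

In the einsum version each axis carries a letter, so the contraction is readable from the subscript string alone. The manual version gives the same numbers but requires tracking which integer position each axis occupies after every operation.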
However, there are alternatives. Alexander Rush argues for NamedTensor. Implementations: