Successor to Lua’s torch. Evil twin to Google’s Tensorflow. Intermittently ascendant over Tensorflow among researchers, if not for industrial uses.

⚠️ I did not use pytorch in 2019-2020. Some info may be out of date.

They aim for fancy applications such as reversible learning and what-have-you, which are easier in pytorch’s dynamic graph construction style. That style resembles (in outcome, if not implementation details) the dynamic styles of jax, most julia autodiffs, and tensorflow in “eager” mode.

PyTorch has a unique [sic] way of building neural networks: using and replaying a tape recorder.

Most frameworks such as TensorFlow, Theano, Caffe and CNTK have a static view of the world. One has to build a neural network, and reuse the same structure again and again. Changing the way the network behaves means that one has to start from scratch. [… Pytorch] allows you to change the way your network behaves arbitrarily with zero lag or overhead.

Of course the overhead is not truly zero; rather they have shifted the overhead baseline down a little. Discounting their hyperbole, it still provides relatively convenient autodiff.
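To make the “dynamic graph” point concrete, here is a minimal sketch (values are my own contrivance) of define-by-run autodiff: the graph is whatever ordinary python control flow actually executed, and gradients are traced through that path.

```python
import torch

# define-by-run: the graph is built as ordinary python executes,
# including data-dependent control flow
x = torch.tensor(2.0, requires_grad=True)
y = x * x           # y = 4
while y < 10:       # the branch depends on the data; fine in a dynamic graph
    y = y * 2       # runs twice here: y = 8, then y = 16
y.backward()
# on this execution path, effectively y = 4 * x**2, so dy/dx = 8 * x = 16
```

A static-graph framework would need special control-flow ops to express that loop; here it is just python.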

Getting started

An incredible advantage of pytorch is (or was?) its documentation, which is clear, consistent and somewhat comprehensive. I may be out of date here, but Tensorflow documentation was none of those things last time I was using it.

DSP in pytorch

Not too bad. Keunwoo Choi has some beautiful examples, e.g. Inverse STFT, Harmonic Percussive separation.

Today we have torchaudio, or alternatively nnAudio (source), which is similar but has fewer dependencies.
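Much basic DSP needs no extra library at all: torch ships STFT and inverse STFT as built-ins. A small sketch (signal and parameters are made up) of a round trip:

```python
import torch

# invert an STFT with torch's built-ins; no torchaudio required
x = torch.randn(1, 16000)                 # a one-second mono "signal" (made up)
window = torch.hann_window(1024)
X = torch.stft(x, n_fft=1024, hop_length=256,
               window=window, return_complex=True)
x_hat = torch.istft(X, n_fft=1024, hop_length=256,
                    window=window, length=x.shape[-1])
# with a hann window at 75% overlap, the round trip reconstructs x
# up to floating-point error
```

Because both directions are differentiable, you can backpropagate through spectrogram-domain losses.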

Custom functions

There is (was?) some bad advice in the manual:

nn exports two kinds of interfaces — modules and their functional versions. You can extend it in both ways, but we recommend using modules for all kinds of layers, that hold any parameters or buffers, and recommend using a functional form for parameter-less operations like activation functions, pooling, etc.

So, important information is missing. For one thing:

If your desired loss is already just a composition of existing functions, you don’t need to define a Function subclass at all.

For another, the given options are not a dichotomy but two things you need to do in concert. A better summary would be:

  • If you need a function which is differentiable in a non-trivial way, implement a Function.

  • If you need to bundle a Function with some state or differentiable parameters, additionally wrap it in an nn.Module.

  • Some people claim you can also create custom layers using plain python functions. However, these don’t work as layers in an nn.Sequential model, so I’m not sure what to make of this advice.
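Those two steps together look like the following sketch. The example is deliberately contrived (tanh already has a derivative in pytorch); the class names are my own inventions.

```python
import torch
from torch import nn

class MyTanh(torch.autograd.Function):
    """Step 1: a Function with a hand-written backward pass.
    (Contrived: tanh is of course already differentiable in pytorch.)"""
    @staticmethod
    def forward(ctx, x):
        y = torch.tanh(x)
        ctx.save_for_backward(y)   # stash whatever backward will need
        return y

    @staticmethod
    def backward(ctx, grad_output):
        y, = ctx.saved_tensors
        return grad_output * (1 - y * y)   # d tanh(x)/dx = 1 - tanh(x)^2

class MyTanhLayer(nn.Module):
    """Step 2: wrap the Function in a Module, bundling a differentiable
    parameter, so it composes with the rest of the nn machinery."""
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.scale * MyTanh.apply(x)

# unlike a bare python function, the Module version works in nn.Sequential
model = nn.Sequential(nn.Linear(3, 3), MyTanhLayer())
```

`torch.autograd.gradcheck` is the standard way to verify a hand-written backward pass against finite differences.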

It’s just as well that it’s easy to roll your own recurrent nets, because the default implementations are bad.

The default RNN layer is heavily optimised using cuDNN, which is sweet, but, for some complicated technical reason I do not give an arse about, it offers a choice of only two activation functions (tanh and ReLU), and neither of them is “linear”.

Ding Ke, for example, made an extensible RNN implementation. Such hand-rolled RNNs used to be horribly slow, but recent pytorch includes JIT-compiled RNNs. I have not used them.
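Rolling your own is not much code. A minimal sketch (names and sizes are my own) of an RNN with the linear activation that nn.RNN refuses to offer:

```python
import torch
from torch import nn

class LinearRNN(nn.Module):
    """A hand-rolled RNN with a linear (identity) activation,
    which nn.RNN does not offer. Sketch only; not cuDNN-fast."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.ih = nn.Linear(input_size, hidden_size)
        self.hh = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x, h=None):
        # x: (seq_len, batch, input_size), as for nn.RNN with batch_first=False
        if h is None:
            h = x.new_zeros(x.shape[1], self.hidden_size)
        outputs = []
        for x_t in x:                       # explicit python loop over time
            h = self.ih(x_t) + self.hh(h)   # no nonlinearity: "linear" activation
            outputs.append(h)
        return torch.stack(outputs), h
```

Swap the update line for any activation (or a whole custom cell) and the rest of the scaffolding stays the same; `torch.jit.script` can recover some of the lost speed.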

Logging and visualizing training

Visualising graphs

Fiddly. The official way is via ONNX.

conda install -c ezyang onnx pydot  # or: pip install onnx pydot
brew install --cask netron          # or: pip install netron
brew install graphviz

Also available, pytorchviz.

pip install git+
import torch
from torchviz import make_dot  # the pytorchviz package imports as torchviz
y = model(x)
make_dot(y, params=dict(model.named_parameters()))

Utility libraries, derived software

Pytorch ships with a lot of included functionality, so you don’t necessarily need to wrap it in anything else. Nonetheless, you can. Here are some frameworks that I have encountered.

fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying patterns of many deep learning and data processing techniques in terms of decoupled abstractions. These abstractions can be expressed concisely and clearly by leveraging the dynamism of the underlying Python language and the flexibility of the PyTorch library. fastai includes:

  • A new type dispatch system for Python along with a semantic type hierarchy for tensors
  • A GPU-optimized computer vision library which can be extended in pure Python
  • An optimizer which refactors out the common functionality of modern optimizers into two basic pieces, allowing optimization algorithms to be implemented in 4–5 lines of code
  • A novel 2-way callback system that can access any part of the data, model, or optimizer and change it at any point during training
  • A new data block API


Lightning is (was?) the latest hot hotness in my circles. I quote:

Lightning is a very lightweight wrapper on PyTorch that decouples the science code from the engineering code. It’s more of a style-guide than a framework. By refactoring your code, we can automate most of the non-research code.

To use Lightning, simply refactor your research code into the LightningModule format (the science) and Lightning will automate the rest (the engineering). Lightning guarantees tested, correct, modern best practices for the automated parts.

  • If you are a researcher, Lightning is infinitely flexible, you can modify everything down to the way .backward is called or distributed is set up.
  • If you are a scientist or production team, lightning is very simple to use with best practice defaults.

Why do I want to use lightning?

Every research project starts the same, a model, a training loop, validation loop, etc. As your research advances, you’re likely to need distributed training, 16-bit precision, checkpointing, gradient accumulation, etc.

Lightning sets up all the boilerplate state-of-the-art training for you so you can focus on the research.




Like other deep learning frameworks, pytorch has some basic NLP support; see torchtext.

flair is a commercially-backed NLP framework.


visdom pumps graphs to a visualisation server. It is not pytorch-specific, but seems well-integrated.


There is apparently an interface, barkm/torch-fenics, to the PDE solver FEniCS.


pytorch + Bayes = pyro, a competitor to Edward.

Pyro launch announcement:

We believe the critical ideas to solve AI will come from a joint effort among a worldwide community of people pursuing diverse approaches. By open sourcing Pyro, we hope to encourage the scientific world to collaborate on making AI tools more flexible, open, and easy-to-use. We expect the current (alpha!) version of Pyro will be of most interest to probabilistic modelers who want to leverage large data sets and deep networks, PyTorch users who want easy-to-use Bayesian computation, and data scientists ready to explore the ragged edge of new technology.


pyprob: (Le, Baydin, and Wood 2017)

pyprob is a PyTorch-based library for probabilistic programming and inference compilation. The main focus of this library is on coupling existing simulation codebases with probabilistic inference with minimal intervention.

The main advantage of pyprob, compared against other probabilistic programming languages like Pyro, is a fully automatic amortized inference procedure based on importance sampling. pyprob only requires a generative model to be specified. Particularly, pyprob allows for efficient inference using inference compilation which trains a recurrent neural network as a proposal network.

In Pyro such an inference network requires the user to explicitly define the control flow of the network, which is due to Pyro running the inference network and generative model sequentially. However, in pyprob the generative model and inference network runs concurrently. Thus, the control flow of the model is directly used to train the inference network. This alleviates the need for manually defining its control flow.

The flagship application seems to be etalumis (Baydin et al. 2019), a probabilistic programming framework with an emphasis, AFAICT, on Bayesian inverse problems.


inferno is a grab-bag library for torch.

Current features include:

  • a basic Trainer class to encapsulate the training boilerplate (iteration/epoch loops, validation and checkpoint creation),

  • a graph API for building models with complex architectures, powered by networkx.

  • easy data-parallelism over multiple GPUs,

  • a submodule for torch.nn.Module-level parameter initialization,

  • a submodule for data preprocessing / transforms,

  • [other stuff that is not especially useful to do with a library]

I’m not sold on this one; it is a whole new library to reduce an already small amount of boilerplate, without adding any non-trivial new capabilities.


Kornia is a computer vision library for pytorch. It includes such niceties as differentiable image warping via the grid_sample thing.
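The grid_sample trick lives in core torch as `torch.nn.functional.grid_sample`; Kornia builds its warping on top of it. A sanity-check sketch (image and grid are made up), using an identity warp:

```python
import torch
import torch.nn.functional as F

# an identity warp built from normalised (-1..1) coordinates
img = torch.arange(16.0).reshape(1, 1, 4, 4)               # (N, C, H, W)
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 4),
                        torch.linspace(-1, 1, 4), indexing="ij")
grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)          # (N, H, W, 2), x first
warped = F.grid_sample(img, grid, align_corners=True)
# the identity grid gives warped == img; gradients also flow through `grid`
# itself, which is what makes *differentiable* warping possible
```

Perturbing `grid` (e.g. by a learned flow field) and backpropagating through it is the basic move behind spatial transformers and differentiable image registration.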


TNT is a reimplementation of torchnet, a library from the Lua Torch era that the current generation of ML users never witnessed. I think it aims to be a semi-official utility library for pytorch, but it’s not especially active.

TNT (imported as torchnet) is a framework for PyTorch which provides a set of abstractions for PyTorch aiming at encouraging code re-use as well as encouraging modular programming. It provides powerful dataloading, logging, and visualization utilities. […]

For example, TNT provides simple methods to record model performance in the torchnet.meter module and to log them to Visdom (or in the future, TensorboardX) with the torchnet.logging.

TNT docs.

Sparse matrixes
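torch has basic COO sparse-tensor support built in. A minimal sketch (indices and values are made up):

```python
import torch

# a 3x3 sparse COO tensor with two nonzeros
indices = torch.tensor([[0, 2],    # row indices
                        [1, 0]])   # column indices
values = torch.tensor([3.0, 4.0])
s = torch.sparse_coo_tensor(indices, values, size=(3, 3))

dense = s.to_dense()                      # materialise for inspection
out = torch.sparse.mm(s, torch.eye(3))    # sparse x dense matmul
```

Coverage is much patchier than scipy.sparse (few ops, limited autodiff support), so check that the operations you need actually exist before committing to this representation.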



Memory leaks

Apparently you use normal python garbage collector analysis.

import torch
import gc

for obj in gc.get_objects():
    try:
        if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
            print(type(obj), obj.size())
    except Exception:
        pass

See also usual python debugging.


Baydin, Atılım Güneş, Lei Shao, Wahid Bhimji, Lukas Heinrich, Lawrence Meadows, Jialin Liu, Andreas Munk, et al. 2019. “Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale.” In.
Cheuk, Kin Wai, Kat Agres, and Dorien Herremans. 2019. “nnAudio: A Pytorch Audio Processing Tool Using 1D Convolution Neural Networks.”
Le, Tuan Anh, Atılım Güneş Baydin, and Frank Wood. 2017. “Inference Compilation and Universal Probabilistic Programming.” In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 54:1338–48. Proceedings of Machine Learning Research. Fort Lauderdale, FL, USA: PMLR.
