Julia has an embarrassment of riches in automatic differentiation methods (homoiconicity and introspection make this comparatively easy), and it’s not always clear what the comparative selling points of each are.

**Attention conservation notice**: You probably don’t need to read this these days.
The details of the autodiff tools are, for many common uses, abstracted away into the libraries that depend upon them.
See, for example, the many examples in the SciML ecosystem, or Chris Rackauckas’ excellent blog posts such as Generalizing Automatic Differentiation to Automatic Sparsity, Uncertainty, Stability, and Parallelism, or this summary post.
ML systems that use autodiff, such as `Zygote.jl`, provide their own implementations or offer pluggable options.
If I wanted to introduce backend-agnostic autodiff into my own Julia functions I would use JuliaDiff/ChainRulesCore.jl these days.

An AD-backend-agnostic system for defining custom forward- and reverse-mode rules. This is the lightweight core that allows you to define rules for your functions in your packages, without depending on any particular AD system.
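As a minimal sketch of what that looks like (assuming `ChainRulesCore` is installed; `square` is a made-up example function, not anything from the package):

```julia
using ChainRulesCore

# A toy function we want AD systems to know how to differentiate.
square(x) = x^2

# Reverse-mode rule: return the primal value and a pullback mapping the
# output cotangent ȳ back to cotangents of the inputs. The first slot of
# the pullback's return is the (non-)tangent of the function itself.
function ChainRulesCore.rrule(::typeof(square), x)
    y = square(x)
    square_pullback(ȳ) = (NoTangent(), 2x * ȳ)
    return y, square_pullback
end

# The rule can be exercised by hand, without any AD backend:
y, pb = ChainRulesCore.rrule(square, 3.0)
# y == 9.0; pb(1.0) == (NoTangent(), 6.0)
```

Any backend that consumes ChainRules (Zygote, for instance) will then pick this rule up instead of tracing through `square` itself.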

Back in the day I needed to differentiate some complicated things manually for earlier versions of Julia, and that involved auditioning many candidate packages from unusual perspectives. I would not need to do that now, and the information here is accordingly dated. Anyway, there are still some useful links, so here are some of those options, circa 2019.

The JuliaDiff project produces ForwardDiff.jl and ReverseDiff.jl, which do what I would expect, namely autodiff in forward and reverse mode respectively.

ForwardDiff implements methods to take derivatives, gradients, Jacobians, Hessians, and higher-order derivatives of native Julia functions

Also,

ReverseDiff is a fast and compile-able tape-based reverse mode automatic differentiation (AD) that implements methods to take gradients, Jacobians, Hessians, and higher-order derivatives of native Julia functions (or any callable object, really). While performance can vary depending on the functions you evaluate, the algorithms implemented by ReverseDiff generally outperform non-AD algorithms in both speed and accuracy.
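A sketch of the tape workflow (assuming `ReverseDiff` is installed; the function `f` here is an arbitrary example of mine, not from the package docs): record the operations of `f` on a representative input, compile the tape once, then reuse it for fast repeated gradient evaluations.

```julia
using ReverseDiff

# An example scalar-valued function of a vector argument.
f(x) = sum(abs2, x) + prod(x)

# Record f's operations on a representative input into a tape,
# then compile the tape for repeated gradient evaluations.
const tape = ReverseDiff.GradientTape(f, rand(3))
const compiled_tape = ReverseDiff.compile(tape)

x = [1.0, 2.0, 3.0]
result = similar(x)
ReverseDiff.gradient!(result, compiled_tape, x)
# result now holds ∇f(x), i.e. 2x .+ prod(x) ./ x
```

The compiled tape is only valid for inputs with the same shape and control flow as the one used to record it, which is the usual caveat with tape-based AD.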

To get maximum speed out of this we need to avoid type ambiguity by, e.g., setting up the problem in a function whose arguments’ types may be inferred. Other needful optimisations might be covered by ForwardDiff2, which rolls in a bunch of commonly needed accelerations and shortcuts.

TODO: Document `NDifferentiable`, noting `TwiceDifferentiable`, which seems a convenient way to stipulate desired (higher?) derivatives for optimisers/life.
For now, see the ML example in `Optim.jl`.

In forward mode (desirable when, e.g. I have few parameters with respect to
which I must differentiate), when do I use
DualNumbers.jl?
Probably never; it seems to be deprecated in favour of
a similar system
in ForwardDiff.jl.
ForwardDiff, by contrast, is well supported.
It seems to be fast for functions with low-dimensional arguments.
It is not clearly documented how one would provide custom derivatives, but apparently you can still
use method extensions for Dual types,
of which there is an example in the issue tracker.
The recommended way is extending `DiffRules.jl`, which is a little circuitous if you are building custom functions to interpolate.
It does not seem to support Wirtinger derivatives yet.
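For reference, registering a custom scalar derivative via DiffRules looks roughly like this (assuming `DiffRules` is installed; `MyMod.myinvsq` is a hypothetical function of mine, purely for illustration):

```julia
using DiffRules

module MyMod
    myinvsq(x) = 1 / x^2
end

# Register d/dx (1/x²) = -2/x³ as a symbolic rule for MyMod.myinvsq.
DiffRules.@define_diffrule MyMod.myinvsq(x) = :(-2 / $x^3)

# The rule is stored symbolically; AD packages that consume DiffRules
# retrieve it as an expression in the supplied variable:
DiffRules.diffrule(:MyMod, :myinvsq, :z)
# returns a derivative expression in z
```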

Related to this forward differential formalism is Luis Benet and David P. Sanders’ TaylorSeries.jl, which is satisfyingly explicit, and seems to generalise in several unusual directions, in particular to high order derivatives.
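A small sketch of the TaylorSeries.jl approach (assuming the package is installed): the independent variable is a truncated series, and higher-order derivatives at the expansion point fall out of the coefficients.

```julia
using TaylorSeries

# t is the independent variable, truncated at order 5.
t = Taylor1(Float64, 5)

s = exp(t)      # Taylor expansion of exp around 0, to order 5
getcoeff(s, 2)  # coefficient of t², i.e. 1/2! == 0.5

# In general f⁽ⁿ⁾(0) = n! * (coefficient n), so this is forward-mode
# autodiff carried to arbitrary order in one pass.
```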

It has a number of functional-approximation analysis tricks. 🏗

HyperDualNumbers promises cheap 2nd-order derivatives by generalizing dual numbers to hyperduals.
(ForwardDiff claims to support Hessians by Dual-of-Duals, which are supposed to be the same as hyperduals.)
I am curious which is the faster way of generating Hessians out of `ForwardDiff`’s Dual-of-Duals and `HyperDualNumbers`.
`HyperDualNumbers` has some nice tricks.
Look at the `HyperDualNumbers` homepage example, where we evaluate derivatives of `f` at `x` by evaluating it at `hyper(x, 1.0, 1.0, 0.0)`.

```
julia> using HyperDualNumbers

julia> f(x) = ℯ^x / sqrt(sin(x)^3 + cos(x)^3)

julia> t0 = Hyper(1.5, 1.0, 1.0, 0.0)

julia> y = f(t0)
4.497780053946162 + 4.053427893898621ϵ1 +
4.053427893898621ϵ2 + 9.463073681596601ϵ1ϵ2
```

The first term is the function value, the coefficients of both ϵ1 and ϵ2 (which correspond to the second and third arguments of hyper) are equal to the first derivative, and the coefficient of ϵ1ϵ2 is the second derivative.

*Really* nice. However, AFAICT this method does not actually get you a Hessian,
except in a trivial sense, because it only seems
to return the right answer for scalar functions of scalar arguments.
That is amazing if you can reduce your function to scalar parameters,
in the sense of having a diagonal Hessian, but it skips lots of interesting cases.
One useful case it does not skip, if that is so,
is *diagonal* preconditioning of tricky optimisations.
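For contrast, a genuine non-diagonal Hessian of a multivariate function comes straight out of ForwardDiff’s nested duals (a sketch, assuming `ForwardDiff` is installed; the function is my own example):

```julia
using ForwardDiff

# A function with off-diagonal curvature, so the Hessian is not diagonal.
f(x) = x[1]^2 * x[2]

H = ForwardDiff.hessian(f, [1.0, 2.0])
# H == [4.0 2.0; 2.0 0.0]
```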

Pro tip: the actual manual is the walk-through, which is not linked from the purported manual.

Another curiosity: Benoît Pasquier’s (F-1 Method) Dual Matrix Tools and Hyper Dual Matrix Tools, which extend this to certain implicit derivatives arising in something or other.

How about `Zygote.jl` then?
That’s an alternative AD library from the creators of the aforementioned `Flux`.
It usually operates in reverse mode
and does some zany compilation tricks to get extra fast.
It also has a forward mode.
It has many fancy features, including compiling to Google Cloud TPUs.
Hessian support is “somewhat”.
In the interim Zygote is still attractive and has many luxurious options,
such as defining optimised custom derivatives easily, as well as weird quirks,
such as occasionally bizarre error messages and failures to notice source-code
updates.
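A quick sketch of both sides of that, assuming `Zygote` is installed (`logistic` is a made-up example function for illustrating `Zygote.@adjoint`, not part of the library):

```julia
using Zygote

# Plain reverse-mode gradient of an ordinary Julia function:
g, = Zygote.gradient(x -> 3x^2 + 2x, 2.0)
# g == 14.0

# One of those luxurious options: an optimised custom derivative,
# supplied as a primal value plus a pullback closure.
logistic(x) = 1 / (1 + exp(-x))
Zygote.@adjoint function logistic(x)
    y = logistic(x)
    return y, ȳ -> (ȳ * y * (1 - y),)
end
```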
Chris Rackauckas summarises the relationship between some of these methods.

One could roll one’s own autodiff system using the basic diff definitions in `DiffRules`.
There is also the very fancy planned Capstan, which aims to
use a tape system to inject forward- and reverse-mode differentiation into even
hostile code, and do much more besides.
However, it also doesn’t work yet, and depends upon Julia features that also
don’t work yet, so don’t hold your breath. (Or: help them out!)

See also `XGrad`, which does symbolic differentiation.
It prefers to have access to the source code as text rather than as an AST,
so I think that makes it similar to Zygote, but with worse PR?
