Julia has an embarrassment of different methods of automatic differentiation (homoiconicity and introspection make this comparatively easy), and the comparative selling points of each are not always clear.

The JuliaDiff project produces ForwardDiff.jl and ReverseDiff.jl, which do what I would expect, namely autodiff in forward and reverse mode respectively. ForwardDiff claims to be advanced. ReverseDiff works but is abandoned.

ForwardDiff implements methods to take derivatives, gradients, Jacobians, Hessians, and higher-order derivatives of native Julia functions.
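To make that list concrete, here is a minimal sketch of the four main entry points. The test functions `g`, `f` and `h` are made up for illustration, and the sketch assumes ForwardDiff.jl is installed.

```julia
using ForwardDiff

# Hypothetical test functions, chosen only for illustration.
g(x) = sin(x) * x^2                   # scalar -> scalar
f(x) = sum(abs2, x) + prod(x)         # vector -> scalar
h(x) = [x[1] * x[2], x[1] + x[2]^2]   # vector -> vector

x0 = [1.0, 2.0]

dg = ForwardDiff.derivative(g, 0.5)   # g'(0.5)
∇f = ForwardDiff.gradient(f, x0)      # gradient vector of f at x0
Jh = ForwardDiff.jacobian(h, x0)      # 2×2 Jacobian of h at x0
Hf = ForwardDiff.hessian(f, x0)       # 2×2 Hessian of f at x0
```

Each call takes a plain Julia function and a point, so there is nothing to declare up front.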

In my casual tests it seems to be slow for my purposes, due to constantly needing to create a new closure with a single argument and differentiate it *all* the time.
Or maybe I’m doing it wrong, and the compiler will deal with this if I set it up right?
Or maybe most people are not solving my kind of problems, e.g. finding many different optima in similar sub-problems.
I suspect this difficulty would vanish if you were solving one big expensive optimisation with many steps, as with neural networks.
**update**: I *was* doing it wrong. This gets faster if you avoid type ambiguity by, e.g., setting up your problem in a function, so that the closure captures typed local variables rather than untyped globals.
I’m not sure if there is any remaining overhead in this closure-based system, but it’s not so bad.
Other needful optimisations might be covered by ForwardDiff2, which rolls in a bunch of commonly needed accelerations/shortcuts.
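A minimal sketch of the "set it up in a function" pattern, for the many-similar-sub-problems case. The objective, the parameter pairs and the helper name `solve_one` are all hypothetical, and the Newton iteration is only a stand-in for whatever optimiser you actually use.

```julia
using ForwardDiff

# Hypothetical family of similar sub-problems: minimise (x - a)^2 + exp(b * x)
# over x, for many (a, b) pairs, using a few Newton steps each.
objective(x, a, b) = (x - a)^2 + exp(b * x)

function solve_one(a, b; x0 = 0.0, steps = 20)
    # The closures capture the *local* variables a and b, whose types the
    # compiler knows, rather than untyped global variables.
    f(x) = objective(x, a, b)
    df(x) = ForwardDiff.derivative(f, x)
    d2f(x) = ForwardDiff.derivative(df, x)   # nested forward mode
    x = x0
    for _ in 1:steps
        x -= df(x) / d2f(x)                  # Newton update
    end
    return x
end

params = [(1.0, 0.1), (2.0, 0.2), (3.0, 0.3)]
solutions = [solve_one(a, b) for (a, b) in params]
```

A new closure is still created for every sub-problem, but because it is created inside a function the compiler can specialise it.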

TODO: Document `NDifferentiable`, notably `TwiceDifferentiable`, which are a convenient way to stipulate desired derivatives for optimisers/life.
For now, see the ML example in `Optim.jl`.

In forward mode (desirable when, e.g., I have few parameters with respect to which I must differentiate), when do I use DualNumbers.jl?
Probably never; it seems to be deprecated in favour of a similar system in ForwardDiff.jl.
ForwardDiff, by contrast, is well supported.
It seems to be fast for functions with low-dimensional arguments.
It is not clearly documented how one would provide custom derivatives, but apparently you can still use method extensions for Dual types, of which there is an example in the issue tracker.
The recommended way is extending `DiffRules.jl`, which is a little circuitous if you are building custom functions to interpolate.
It does not seem to support Wirtinger derivatives yet.
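For the method-extension route, a sketch of what I understand the pattern to be. `softplus` and `dsoftplus` are hypothetical stand-ins for a custom function and its hand-written derivative. The sketch leans on `ForwardDiff.Dual`, `ForwardDiff.value` and `ForwardDiff.partials`, which are not prominently documented, so treat it as an assumption about current ForwardDiff versions rather than a stable API.

```julia
using ForwardDiff
using ForwardDiff: Dual, value, partials

# Hypothetical custom function: imagine an interpolant, or a call into code
# that ForwardDiff cannot trace. Here it is just softplus.
softplus(x::Real) = log1p(exp(x))

# Its derivative, supplied by hand (the logistic function).
dsoftplus(x::Real) = 1 / (1 + exp(-x))

# Method extension for Dual inputs: propagate value and partials manually
# via the chain rule, keeping the tag T so nested differentiation still works.
function softplus(d::Dual{T}) where {T}
    x = value(d)
    return Dual{T}(softplus(x), dsoftplus(x) * partials(d))
end

slope = ForwardDiff.derivative(softplus, 0.3)
```

The `Dual` method is more specific than the `Real` one, so dispatch picks it whenever ForwardDiff pushes dual numbers through the function.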

Related to this forward differential formalism is Luis Benet and David P. Sanders’ TaylorSeries.jl, which is satisfyingly explicit, and seems to generalise in several unusual directions.

TaylorSeries.jl is an implementation of high-order automatic differentiation, as presented in the book by W. Tucker [-@TuckerValidated2011]. The general idea is the following.

The Taylor series expansion of an analytical function \(f(t)\) with one independent variable \(t\) around \(t_0\) can be written as

\[ f(t) = f_0 + f_1 (t-t_0) + f_2 (t-t_0)^2 + \cdots + f_k (t-t_0)^k + \cdots, \]

where \(f_0=f(t_0)\), and the Taylor coefficients \(f_k = f_k(t_0)\) are the \(k\)th normalized derivatives at \(t_0\):

\[ f_k = \frac{1}{k!} \frac{{\rm d}^k f} {{\rm d} t^k}(t_0). \]

Thus, computing the high-order derivatives of \(f(t)\) is equivalent to computing its Taylor expansion.… Arithmetic operations involving Taylor series can be expressed as operations on the coefficients.
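To illustrate "operations on the coefficients", here is a dependency-free sketch in plain Julia. It is not the TaylorSeries.jl API: the helper names `taylor_add`, `taylor_mul` and `derivatives` are made up, and both series are assumed to be truncated at the same order around the same \(t_0\).

```julia
# Truncated Taylor series represented by its coefficient vector
# [f_0, f_1, ..., f_K] around a common expansion point t_0.

# Sum: coefficients add term by term.
taylor_add(f, g) = f .+ g

# Product: Cauchy convolution, (f*g)_k = sum_{j=0}^{k} f_j * g_{k-j}.
function taylor_mul(f, g)
    K = length(f) - 1
    h = zeros(promote_type(eltype(f), eltype(g)), K + 1)
    for k in 0:K, j in 0:k
        h[k + 1] += f[j + 1] * g[k - j + 1]
    end
    return h
end

# Recover ordinary derivatives from normalized coefficients: f^(k)(t_0) = k! * f_k.
derivatives(f) = [factorial(k) * f[k + 1] for k in 0:length(f) - 1]

# Example: exp(t) around t_0 = 0 has f_k = 1/k!; its square is exp(2t).
K = 6
expseries = [1 / factorial(k) for k in 0:K]
exp2series = taylor_mul(expseries, expseries)
```

Applying `derivatives` to `exp2series` should recover the derivatives of \(\exp(2t)\) at zero, namely \(2^k\), without differentiating anything symbolically.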

It has a number of functional-approximation analysis tricks. 🏗

HyperDualNumbers promises cheap 2nd order derivatives by generalizing Dual Numbers to HyperDuals.
(ForwardDiff claims to support Hessians by Dual Duals, which are supposed to be the same as HyperDuals.)
I am curious which is the faster way of generating Hessians: `ForwardDiff`’s Dual-of-Dual or `HyperDualNumbers`.
`HyperDualNumbers` has some nice tricks.
Look at the `HyperDualNumbers` homepage example, where we are evaluating derivatives of `f` at `x` by evaluating it at `hyper(x, 1.0, 1.0, 0.0)`.

```
using HyperDualNumbers

f(x) = ℯ^x / (sqrt(sin(x)^3 + cos(x)^3))
t0 = Hyper(1.5, 1.0, 1.0, 0.0)
y = f(t0)
# 4.497780053946162 + 4.053427893898621ϵ1 + 4.053427893898621ϵ2 + 9.463073681596601ϵ1ϵ2
```

The first term is the function value, the coefficients of both ϵ1 and ϵ2 (which correspond to the second and third arguments of hyper) are equal to the first derivative, and the coefficient of ϵ1ϵ2 is the second derivative.
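To see why the coefficients come out that way, here is a hand-rolled sketch of hyperdual arithmetic in plain Julia. `MiniHyper` and `lift` are hypothetical names, not the HyperDualNumbers.jl types, and only `+`, `*`, `sin` and `exp` are implemented, so the test function differs from the homepage one.

```julia
# A hand-rolled hyperdual number a + b ϵ1 + c ϵ2 + d ϵ1ϵ2, with ϵ1^2 = ϵ2^2 = 0.
# MiniHyper is a hypothetical name; it is not the HyperDualNumbers.jl type.
struct MiniHyper
    a::Float64  # value
    b::Float64  # ϵ1 coefficient
    c::Float64  # ϵ2 coefficient
    d::Float64  # ϵ1ϵ2 coefficient
end

Base.:+(x::MiniHyper, y::MiniHyper) =
    MiniHyper(x.a + y.a, x.b + y.b, x.c + y.c, x.d + y.d)

# Multiply out and drop every term containing ϵ1^2 or ϵ2^2.
Base.:*(x::MiniHyper, y::MiniHyper) = MiniHyper(
    x.a * y.a,
    x.a * y.b + x.b * y.a,
    x.a * y.c + x.c * y.a,
    x.a * y.d + x.d * y.a + x.b * y.c + x.c * y.b,
)

# Lift a scalar function g with known g' and g'' via its Taylor expansion.
lift(g, dg, d2g, x::MiniHyper) = MiniHyper(
    g(x.a),
    dg(x.a) * x.b,
    dg(x.a) * x.c,
    dg(x.a) * x.d + d2g(x.a) * x.b * x.c,
)
Base.sin(x::MiniHyper) = lift(sin, cos, t -> -sin(t), x)
Base.exp(x::MiniHyper) = lift(exp, exp, exp, x)

# Seed ϵ1 and ϵ2 with 1 and the cross term with 0, as in the homepage example.
f(x) = x * x * x + sin(x) * exp(x)
y = f(MiniHyper(2.0, 1.0, 1.0, 0.0))
```

Here `y.a` is the value, `y.b` and `y.c` both carry the first derivative, and `y.d` carries the second derivative. The cross term survives because ϵ1ϵ2 is not zero even though each square is.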

*Really* nice. However, AFAICT this method does not actually get you a Hessian, except in a trivial sense, because it only seems to return the right answer for scalar functions of scalar arguments.
This is amazing if you can reduce your function to scalar parameters, in the sense of having a diagonal Hessian.
But that skips lots of interesting cases.
One useful case it does not skip, if that is so, is *diagonal* preconditioning of tricky optimisations.
(In principle, seeding ϵ1 on one argument and ϵ2 on another yields the corresponding mixed partial, so a full Hessian would cost roughly one function evaluation per entry.)

Pro tip: the actual manual is the walk-through, which is not linked from the purported manual.

Another curiosity: Benoît Pasquier’s [-@PasquierF1] (F-1 Method) Dual Matrix Tools and Hyper Dual Matrix Tools, which extend this to certain implicit derivatives arising in something or other.

How about `Zygote.jl` then?
That’s an alternative AD library from the creators of the aforementioned `Flux`.
It usually operates in reverse mode and does some zany compilation tricks to get extra fast.
It also has forward mode.
It has many fancy features, including compiling to Google Cloud TPUs.
Hessian support is “somewhat”.
Flux itself does not yet default to Zygote, using its own specialised reverse-mode autodiff `Tracker`, but promises to switch transparently to Zygote in the future.
In the interim Zygote is still attractive and has many luxurious options, such as defining optimised custom derivatives easily, as well as weird quirks such as occasionally bizarre error messages and failures to notice source code updates.
Chris Rackauckas summarises the relationship between some of these methods.
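A minimal sketch of reverse mode and a custom derivative in Zygote. It assumes Zygote.jl is installed and that the `Zygote.@adjoint` macro is still available in your version (newer code often defines rules through ChainRulesCore instead). `loss` and `mycube` are hypothetical examples.

```julia
using Zygote

# Hypothetical loss, chosen only for illustration.
loss(w, x) = sum(abs2, w .* x)

w = [1.0, 2.0, 3.0]
x = [0.5, 0.5, 2.0]

# gradient returns a tuple with one entry per argument.
∇w, ∇x = Zygote.gradient(loss, w, x)

# A custom reverse rule: return the value and a pullback that maps the
# incoming sensitivity Δ to a tuple of input sensitivities.
mycube(t) = t^3
Zygote.@adjoint mycube(t) = mycube(t), Δ -> (3 * t^2 * Δ,)

dcube = Zygote.gradient(mycube, 2.0)[1]
```

The pullback returns a tuple because a function may have several arguments, each needing its own sensitivity.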

One could roll one’s own autodiff system using the basic diff definitions in `DiffRules`.
There is also the very fancy planned Capstan, which aims to use a tape system to inject forward and reverse mode differentiation into even hostile code, and do much more besides.
However, it doesn’t work yet, and depends upon Julia features that also don’t work yet, so don’t hold your breath. (Or: help them out!)

See also `XGrad`, which does symbolic differentiation.
It prefers to have access to the source code as text rather than as an AST.
So I think that makes it similar to Zygote, but with worse PR?
