We can generalise high-school calculus, which concerns scalar functions of a scalar argument, in various ways to handle matrix-valued functions and matrix-valued arguments. One could generalise further still, to full tensor calculus, but matrix/vector operations happen to sit at a useful point of complexity for many algorithms, a kind of minimum viable product. (I usually want this for higher-order gradient descent.)
I will mention two convenient and popular formalisms for doing this here. In practice a mix of the two is often useful.
🏗 I need to return to this and tidy it up with some examples.
Matrix calculus

A special case of tensor calculus that happens to be handy for some common cases, namely where the ranks of the function's argument and value are not too big. A fun pain point: agreeing upon a layout for the derivatives, numerator versus denominator.
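To make the layout pain point concrete, here is a minimal sketch (my example, not from any of the references below) of the two conventions for the Jacobian of a linear map, checked against central finite differences:

```python
import numpy as np

# Hypothetical example: f(x) = A @ x, a linear map R^3 -> R^2.
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])

def f(x):
    return A @ x

# Numerator layout: J[i, j] = d f_i / d x_j, so J = A, shape (2, 3).
J_numerator = A

# Denominator layout stacks the same partials transposed, shape (3, 2).
J_denominator = A.T

# Independent check of the numerator-layout Jacobian by finite differences.
x0 = np.array([1., -1., 2.])
eps = 1e-6
J_fd = np.column_stack([
    (f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
assert np.allclose(J_fd, J_numerator)
```

The two layouts carry the same information; the pain arrives when you mix sources that silently assume different ones and try to chain rules together.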
If our problem is nice, this often gets us a low-fuss, compact, tidy solution, even in some surprising cases where it seems that more general tensors would be more natural (for which, see below).
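As a quick taste of the "low-fuss, compact" solutions on offer, here is a sketch of a classic matrix-calculus identity, the gradient of a quadratic form, verified numerically (my own illustrative example):

```python
import numpy as np

# Matrix-calculus identity: grad_x (x^T A x) = (A + A^T) x.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

grad_closed_form = (A + A.T) @ x

# Central finite differences as an independent check.
def f(x):
    return x @ A @ x

eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(4)
])
assert np.allclose(grad_closed_form, grad_fd, atol=1e-5)
```

Identities like this (and hundreds more) are exactly what the references below catalogue.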
- The Matrix Calculus You Need For Deep Learning (Parr and Howard 2018)
- Many, many quick recipes: The Matrix Cookbook (Petersen and Pedersen 2012)
- More expository but not as broad: Old and New Matrix Algebra Useful for Statistics (Minka 2013)
- Mike Brookes’ Matrix Reference Manual
- Autodiff-focussed: Collected Matrix Derivative Results for Forward and Reverse Mode Algorithmic Differentiation (Giles 2008)
- The rough-and-ready notation is occasionally confusing, but it has a functional analysis interpretation.
Indexed tensor calculus
Filed under multilinear algebra.