We can generalise high-school calculus, which concerns scalar functions of a scalar argument, in various ways to handle matrix-valued functions or matrix-valued arguments while keeping the notation tidy. One could generalise further still, to full tensor calculus, but matrix/vector operations happen to sit at a useful level of complexity for many algorithms. (I usually want this for higher-order gradient descent.)
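As a concrete instance of why this matters for higher-order gradient descent, here is a minimal Newton-step sketch in NumPy. The quadratic objective \(f(x) = \tfrac12 x^\top A x - b^\top x\) and its values are my own illustration, not from these notes; the point is that the matrix-calculus identities \(\nabla f = Ax - b\) and \(\nabla^2 f = A\) reduce the update to one linear solve.

```python
import numpy as np

# Hypothetical illustrative objective: f(x) = 0.5 x^T A x - b^T x,
# with A symmetric positive definite, so the minimiser solves A x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    # Matrix-calculus identity (A symmetric): grad f = A x - b
    return A @ x - b

def hess(x):
    # The Hessian of a quadratic is constant: A
    return A

# One Newton step from any start lands exactly on the minimiser
x0 = np.zeros(2)
x1 = x0 - np.linalg.solve(hess(x0), grad(x0))
print(np.allclose(A @ x1, b))  # gradient vanishes at x1 -> True
```

For non-quadratic objectives the same step is iterated, but the derivative bookkeeping is identical.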
I mention two convenient and popular formalisms for lazy matrix calculus; in practice a mix of the two is often useful.
🏗 I need to return to this and tidy it up with some examples.
A special case of tensor calculus, in which the ranks of the function's argument and value are not too big. A fun pain point: agreeing on the layout of derivatives, numerator versus denominator convention.
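The layout disagreement is easy to see numerically. A sketch (the linear map \(f(x) = Ax\) and the finite-difference check are my own example): in numerator layout the Jacobian of \(Ax\) is \(A\) itself, with row \(i\) indexing output \(i\); denominator layout is its transpose.

```python
import numpy as np

# Illustrative example: f(x) = A x, whose derivative is A in numerator
# layout (J[i, j] = df_i/dx_j) and A^T in denominator layout.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

def f(x):
    return A @ x

x = np.array([1.0, -1.0, 2.0])
eps = 1e-6

# Central-difference Jacobian in numerator layout
J_num = np.empty((2, 3))
for j in range(3):
    dx = np.zeros(3)
    dx[j] = eps
    J_num[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)

J_den = J_num.T  # denominator layout is simply the transpose

print(np.allclose(J_num, A))    # True
print(np.allclose(J_den, A.T))  # True
```

Neither convention is wrong; the pain is that sources mix them silently, so chain-rule products pick up stray transposes.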
If our problem is nice, this often yields a low-fuss, compact, tidy solution, even in some surprising cases where more general tensors would seem more natural; for those cases, see below.
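A classic instance of such a compact solution (my own worked example): for \(f(x) = x^\top A x\) with \(A\) not necessarily symmetric, matrix calculus gives \(\nabla f = (A + A^\top)x\) in one line, where element-by-element differentiation is more laborious. A finite-difference check:

```python
import numpy as np

# Gradient of the quadratic form f(x) = x^T A x, A not symmetric.
# Matrix-calculus identity: grad f = (A + A^T) x.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
x = rng.normal(size=4)

f = lambda v: v @ A @ v
analytic = (A + A.T) @ x

# Central differences along each coordinate direction
eps = 1e-6
numeric = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(4)
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```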
The rough-and-ready notation is occasionally confusing, but it has a functional analysis interpretation.
Many, many quick recipes: The Matrix Cookbook (Petersen and Pedersen 2012)
Brookes’ Matrix Reference Manual
More expository but not as broad: Old and new matrix algebra useful for statistics (Minka 2000)
Alan Edelman’s lectures are very pedagogical on matrices and calculus (and go further, into random matrix theory)
Indexed tensor calculus
Filed under multilinear algebra.
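Index notation earns its keep when derivatives genuinely outgrow matrix layouts. A sketch (my own illustration): the derivative of \((Ax)_i\) with respect to \(A_{jk}\) is the rank-3 tensor \(\delta_{ij} x_k\), awkward to write as a matrix but effortless in index notation, and it transliterates directly into an `einsum` call.

```python
import numpy as np

# D[i, j, k] = d (A x)_i / d A_jk = delta_ij * x_k, a rank-3 tensor.
x = np.array([1.0, -2.0, 3.0])
I = np.eye(3)
D = np.einsum('ij,k->ijk', I, x)

# Finite-difference check at a random A (the map is linear in A,
# so central differences are exact up to rounding).
rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
eps = 1e-6
D_fd = np.empty((3, 3, 3))
for j in range(3):
    for k in range(3):
        dA = np.zeros((3, 3))
        dA[j, k] = eps
        D_fd[:, j, k] = ((A + dA) @ x - (A - dA) @ x) / (2 * eps)

print(np.allclose(D, D_fd, atol=1e-6))  # True
```

The index expression is also layout-agnostic: there is no numerator-vs-denominator ambiguity once every axis carries an explicit index.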