We can generalise high-school calculus, which concerns scalar functions of a scalar argument, in various ways to handle matrix-valued functions or matrix-valued arguments. One could generalise further still, to full tensor calculus. But it happens that matrix/vector operations sit at a useful point of complexity for lots of algorithms, kind of an MVP. (I usually want this for higher-order gradient descent.)

I will mention two convenient and popular formalisms for doing that here. In practice a mix of the two is often useful.

## Matrix differentials

🏗 I need to return to this and tidy it up with some examples.

A special case of tensor calculus that happens to be handy for some common cases, where the rank of the argument and the value of the function are not too big. Fun pain point: agreeing upon the layout of derivatives, numerator versus denominator.

If our problem is nice, this often gets us a low-fuss, compact, tidy solution, even in some surprising cases where it seems that more general tensors would be more natural (for which, see below).
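As a minimal illustration (a toy NumPy sketch of my own, not taken from any of the references below): the matrix-differential identity `d(x^T A x) = x^T (A + A^T) dx` gives the numerator-layout gradient `(A + A^T) x`, which we can sanity-check against finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

f = lambda x: x @ A @ x  # scalar-valued quadratic form

# Matrix-differential result: d(x^T A x) = x^T (A + A^T) dx,
# so the (numerator-layout) gradient is (A + A^T) x.
grad_analytic = (A + A.T) @ x

# Central finite-difference check, one coordinate at a time.
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(4)
])

print(np.allclose(grad_analytic, grad_fd, atol=1e-5))  # True
```

The same two-sided check is a cheap way to catch layout mistakes (e.g. accidentally writing `A x` instead of `(A + A^T) x` for a non-symmetric `A`).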

- The Matrix Calculus You Need For Deep Learning (Parr and Howard 2018)
- Many, many quick recipes: The Matrix Cookbook (Petersen and Pedersen 2012)
- More expository but not as broad, Old and new matrix algebra useful for statistics (Minka 2013)
- Mike Brookes’ Matrix Reference Manual
- autodiff-focussed: Collected Matrix Derivative Results for Forward and Reverse Mode Algorithmic Differentiation (Giles 2008)
The rough-and-ready notation in these references is occasionally confusing, but it has a functional analysis interpretation.
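To give the flavour of the Giles-style autodiff results: for a matrix product `Y = A X`, the reverse-mode (adjoint) rules are `Abar = Ybar @ X.T` and `Xbar = A.T @ Ybar`. A toy NumPy spot-check of my own, using a sum-of-entries loss so that `Ybar` is all ones:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 2))

loss = lambda A, X: np.sum(A @ X)  # scalar loss through Y = A X

# Reverse-mode rules for Y = A X (here Ybar = dL/dY = ones):
Ybar = np.ones((3, 2))
Abar = Ybar @ X.T   # dL/dA, same shape as A
Xbar = A.T @ Ybar   # dL/dX, same shape as X

# Finite-difference spot check on dL/dA[0, 0].
eps = 1e-6
E = np.zeros_like(A)
E[0, 0] = eps
fd = (loss(A + E, X) - loss(A - E, X)) / (2 * eps)
print(np.isclose(Abar[0, 0], fd))  # True
```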

## Indexed tensor calculus

Keywords: Ricci calculus, Einstein summation notation, index notation, subscript notation.

If we crack open a tensor textbook we get a lot of guff about general relativity and tensor fields and such, which is all very nice but not germane to typical machine learning applications. We want to start with the immediately-needed thing, which is a tidy set of notation conventions for dealing with multilinear operations without too many squiggles.
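Conveniently, NumPy's `einsum` takes index-notation strings directly, so a multilinear contraction can be written exactly as the Einstein-summation formula reads (a small sketch of my own):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)
y = rng.standard_normal(3)

# Einstein summation: repeated indices are summed over.
# s = x_i A_{ij} y_j  (a bilinear form)
s = np.einsum('i,ij,j->', x, A, y)
print(np.isclose(s, x @ A @ y))  # True

# A batched contraction that is awkward in pure matrix
# notation but one line in index notation:
# C_{bik} = T_{bij} U_{bjk}  (sum over j, batch index b)
T = rng.standard_normal((5, 3, 4))
U = rng.standard_normal((5, 4, 2))
C = np.einsum('bij,bjk->bik', T, U)
print(C.shape)  # (5, 3, 2)
```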

Soeren Laue, Matthias Mitterreiter, Joachim Giesen and Jens K. Mueller have been popularising such an approach recently. In their paper [LaueComputing2018], they argue that the derivation of matrix differential results can be greatly simplified with Ricci calculus, and P.S. it often induces faster code.

They have a website, MatrixCalculus.org, which showcases this trick by doing symbolic matrix calculus online (though not the accelerated code generation bit).
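For example (my own numerical check, not output from their site): a symbolic engine of this kind will tell you that the Hessian of `x^T A x` is the constant matrix `A + A^T`, which we can verify with second-order central differences:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

f = lambda x: x @ A @ x  # quadratic form with Hessian A + A^T

H_analytic = A + A.T

# Second-order central-difference Hessian at an arbitrary point.
x0 = rng.standard_normal(4)
eps = 1e-4
I = np.eye(4)
H_fd = np.array([
    [(f(x0 + eps * I[i] + eps * I[j]) - f(x0 + eps * I[i] - eps * I[j])
      - f(x0 - eps * I[i] + eps * I[j]) + f(x0 - eps * I[i] - eps * I[j]))
     / (4 * eps ** 2)
     for j in range(4)]
    for i in range(4)
])
print(np.allclose(H_analytic, H_fd, atol=1e-4))  # True
```

This is the kind of second-order object I want for higher-order gradient descent, and exactly where symbolic or automatic tools beat pencil-and-paper bookkeeping.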

Here are some tasty readings on the relevant bits of tensor machinery.

- Jeremy Kun
- Kees Dullemond & Kasper Peeters, Introduction to Tensor Calculus
- Ilan Ben-Yaacov and Francesc Roig, Index Notation for Vector Calculus
- J. Pearson, Index Notation
- John Crimaldi, A Primer on Index Notation

## References

Giles, Mike B. 2008. “Collected Matrix Derivative Results for Forward and Reverse Mode Algorithmic Differentiation.” In *Advances in Automatic Differentiation*, edited by Christian H. Bischof, H. Martin Bücker, Paul Hovland, Uwe Naumann, and Jean Utke, 64:35–44. Berlin, Heidelberg: Springer Berlin Heidelberg. http://eprints.maths.ox.ac.uk/1079/.

Golub, Gene H., and Charles F. Van Loan. *Matrix Computations*. JHU Press.

Graham, Alexander. *Kronecker Products and Matrix Calculus: With Applications*. Horwood.

Laue, Sören, Matthias Mitterreiter, and Joachim Giesen. 2018. “Computing Higher Order Derivatives of Matrix and Tensor Expressions.” In *Advances in Neural Information Processing Systems 31*, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 2750–59. Curran Associates, Inc. http://papers.nips.cc/paper/7540-computing-higher-order-derivatives-of-matrix-and-tensor-expressions.pdf.

Magnus, Jan R., and Heinz Neudecker. *Matrix Differential Calculus with Applications in Statistics and Econometrics*. Rev. ed. New York: John Wiley. http://www.janmagnus.nl/misc/mdc2007-3rdedition.

Minka, Thomas P. 2013. *Old and New Matrix Algebra Useful for Statistics*.

Seber, George A. F. *A Matrix Handbook for Statisticians*. Wiley.

Steeb, Willi-Hans. *Problems and Solutions in Introductory and Advanced Matrix Calculus*. World Scientific.

Turkington, Darrell A. *Matrix Calculus and Zero-One Matrices*. Cambridge University Press.