Matrix calculus

We can generalise high school calculus, which deals with scalar functions of a scalar argument, in various ways to handle matrix-valued functions or matrix-valued arguments while keeping the notation tidy. One could generalise further, to full tensor calculus, but matrix and vector operations happen to sit at a useful level of complexity for many algorithms. (I usually want this for higher-order gradient descent.)

I mention two convenient and popular formalisms for lazy matrix calculus. In practice a mix of the two is often useful.

Matrix differentials

🏗 I need to return to this and tidy it up with some examples.

A special case of tensor calculus, where the ranks of the function's argument and value are not too big. Fun pain point: agreeing upon the layout of derivatives, numerator versus denominator.
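To make the layout pain point concrete, here is a minimal sketch for the linear map f(x) = A x. In numerator layout the derivative has one row per output, so it is A itself; in denominator layout it has one row per input, so it is the transpose of A. The helper `numerical_jacobian` and the test matrix are my own illustrative choices, not anything standard.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian in numerator layout: J[i, j] = d f_i / d x_j.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(f(x)).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        # Column j holds the sensitivity of every output to input j.
        J[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))  # f maps R^3 to R^2
x = rng.standard_normal(3)

J_numerator = numerical_jacobian(lambda v: A @ v, x)  # shape (2, 3), equals A
J_denominator = J_numerator.T                         # shape (3, 2), equals A.T
```

Both arrays hold the same numbers; the only disagreement is which index comes first, which is exactly what makes mixing formulas from two references error-prone.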

If our problem is nice, this often gets us a low-fuss, compact, tidy solution, even in some surprising cases where more general tensors would seem more natural (for which, see below).
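As a sketch of how the differential trick works in a nice case, take f(X) = tr(X^T A X). Perturbing X by dX and keeping first-order terms gives df = tr(dX^T A X) + tr(X^T A dX) = tr(((A + A^T) X)^T dX), so we can read off the gradient as (A + A^T) X without ever writing an index. The code below checks that claim numerically; the function names and the random test matrices are assumptions made for illustration.

```python
import numpy as np

def f(X, A):
    # Scalar-valued function of a matrix argument: tr(X^T A X).
    return np.trace(X.T @ A @ X)

def grad_f(X, A):
    # Gradient read off from the differential df = tr(((A + A^T) X)^T dX).
    return (A + A.T) @ X

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 2))
dX = rng.standard_normal((4, 2))  # an arbitrary perturbation direction

eps = 1e-6
# Directional derivative by central differences.
df_numeric = (f(X + eps * dX, A) - f(X - eps * dX, A)) / (2 * eps)
# The same quantity predicted by the differential.
df_formula = np.trace(grad_f(X, A).T @ dX)
```

The two values should agree up to floating-point error, since f is quadratic in X and the central difference has no truncation error here.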

Indexed tensor calculus

Filed under multilinear algebra.
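A small sketch of the indexed style, for a case where the derivative does not fit in a matrix. For F(X) = A X B we have F_ij = A_ik X_kl B_lj, so the derivative of F_ij with respect to X_kl is the four-index object A_ik B_lj. I use `np.einsum` to write that index expression directly; the shapes and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((5, 2))
dX = rng.standard_normal((4, 5))  # perturbation of the 4x5 argument X

# D[i, j, k, l] = d F_ij / d X_kl = A_ik * B_lj
D = np.einsum("ik,lj->ijkl", A, B)  # shape (3, 2, 4, 5)

# Contracting the derivative with dX over (k, l) gives the change in F ...
dF_indexed = np.einsum("ijkl,kl->ij", D, dX)
# ... which matches the matrix-differential answer dF = A dX B.
dF_matrix = A @ dX @ B
```

Here the matrix differential dF = A dX B is the compact form, and the four-index array is what it expands to once every index is made explicit.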


