Matrix inverses

August 5, 2014 — September 29, 2023

feature construction
functional analysis
high d
linear algebra
networks
probability
signal processing
sparser than thou
statistics

Assumed audience:

People with undergrad linear algebra

Figure 1: What lives inside the null space of the singular matrix inverse

Matrix inverses. Sometimes we need them. Worse, sometimes we need generalisations of them, inverses of singular (a.k.a. non-invertible) matrices. This page is for notes on that theme.

But first, please fill out this form.

✅ By reading further I acknowledge that a matrix inverse is actually what I want, even though the more common case would be that I would like the solution of a linear system without forming the inverse.

OK, done? let us continue.

1 All possible generalizations of matrix inverses

Many (Rao and Mitra 1973; Ben-Israel and Greville 2003). Nick Higham’s introduction generalized inverses is an easier start.

A beautiful, compact summary is in Searle (2014):

…there are (with one exception) many matrices \(\mathrm{G}\) satisfying \[ A G A=\mathrm{A} \text {, } \] which is condition (i). Each matrix \(\mathrm{G}\) satisfying AGA = A is called a generalized inverse of A, and if it also satisfies \(\mathrm{GAG}=\mathrm{G}\) it is a reflexive generalized inverse. The exception is when \(\mathrm{A}\) is nonsingular: there is then only one \(\mathrm{G}\); namely, \(\mathrm{G}=\mathrm{A}^{-1}\). 8.2 Arbitrariness That there are many matrices \(\mathrm{G}\) can be illustrated by showing ways in which from one \(\mathrm{G}\) others can be obtained. Thus, if A is partitioned as \[ \mathrm{A}=\left[\begin{array}{ll} \mathrm{A}_{11} & \mathrm{A}_{12} \\ \mathrm{A}_{21} & \mathrm{A}_{22} \end{array}\right] \] where \(\mathrm{A}_{11}\) is nonsingular with the same rank as \(\mathrm{A}\), then \[ \mathrm{G}=\left[\begin{array}{lllll} \mathrm{A}_{11}^{-1}-\mathrm{U A}_{21} & \mathrm{A}_{11}^{-1}-\mathrm{A}_{11}^{-1} & \mathrm{A}_{12} \mathrm{V}-\mathrm{A}_{11}^{-1} \mathrm{A}_{12} \mathrm{W A}_{21} \mathrm{A}_{11}^{-1} & \mathrm{U} \\ \mathrm{V} & & \mathrm{W} \end{array}\right] \] is a generalized inverse of \(\mathrm{A}\) for any values of \(\mathrm{U}, \mathrm{V}\) and \(\mathrm{W}\). This can be used to show that a generalized inverse of a symmetric matrix is not necessarily symmetric; and that of a singular matrix is not necessarily singular.

A simpler illustration of arbitrariness is that if \(\mathrm{G}\) is a generalized inverse of \(\mathrm{A}\) then so is \[ \mathrm{G}^*=\mathrm{GAG}+(\mathrm{I}-\mathrm{GA}) \mathrm{S}+\mathrm{T}(\mathrm{I}-\mathrm{AG}), \] for any values of \(\mathrm{S}\) and \(\mathrm{T}\).

2 Moore-Penrose pseudo-inverse

A classic. The “default” generalized inverse.

See Nick Higham’s What Is the Pseudoinverse of a Matrix?

Let us do the conventional thing and mention which properties of the pseudo inverse are shared by the inverse:

The pseudo inverse (or Moore-Penrose inverse) of a matrix \(\mathrm{A}\) is the matrix \(\mathrm{A}^{+}\) that fulfils

  1. \(\mathrm{A A}^{+} \mathrm{A}=\mathrm{A}\)
  2. \(\mathrm{A}^{+} \mathrm{A} \mathrm{A}^{+}=\mathrm{A}^{+}\)
  3. \(\mathrm{A A}^{+}\) symmetric
  4. \(\mathrm{A}^{+} \mathrm{A}\) symmetric

Want more? As always, we copy-paste the omnibus results of Petersen and Pedersen (2012):

Assume \(\mathrm{A}^{+}\)to be the pseudo-inverse of \(\mathrm{A}\), then \[ \begin{aligned} \left(\mathrm{A}^{+}\right)^{+} & =\mathrm{A} \\ \left(\mathrm{A}^\top\right)^{+} & =\left(\mathrm{A}^{+}\right)^\top \\ \left(\mathrm{A}^\dagger\right)^{+} & =\left(\mathrm{A}^{+}\right)^\dagger \\ \left(\mathrm{A}^{+} \mathrm{A}\right) \mathrm{A}^\dagger & =\mathrm{A}^\dagger \\ \left(\mathrm{A}^{+} \mathrm{A}\right) \mathrm{A}^\top & \neq \mathrm{A}^\top \\ (c \mathrm{A})^{+} & =(1 / c) \mathrm{A}^{+} \\ \mathrm{A}^{+} & =\left(\mathrm{A}^\top \mathrm{A}\right)^{+} \mathrm{A}^\top \\ \mathrm{A}^{+} & =\mathrm{A}^\top\left(\mathrm{A} \mathrm{A}^\top\right)^{+} \\ \left(\mathrm{A}^\top \mathrm{A}\right)^{+} & =\mathrm{A}^{+}\left(\mathrm{A}^\top\right)^{+} \\ \left(\mathrm{A} \mathrm{A}^\top\right)^{+} & =\left(\mathrm{A}^\top\right)^{+} \mathrm{A}^{+} \\ \mathrm{A}^{+} & =\left(\mathrm{A}^\dagger \mathrm{A}\right)^{+} \mathrm{A}^\dagger \\ \mathrm{A}^{+} & =\mathrm{A}^\dagger\left(\mathrm{A} \mathrm{A}^\dagger\right)^{+} \\ \left(\mathrm{A}^\dagger \mathrm{A}\right)^{+} & =\mathrm{A}^{+}\left(\mathrm{A}^\dagger\right)^{+} \\ \left(\mathrm{A} \mathrm{A}^\dagger\right)^{+} & =\left(\mathrm{A}^\dagger\right)^{+} \mathrm{A}^{+} \\ (\mathrm{A B})^{+} & =\left(\mathrm{A}^{+} \mathrm{A B}\right)^{+}\left(\mathrm{A B B} \mathrm{B}^{+}\right)^{+} \\ f\left(\mathrm{A}^\dagger \mathrm{A}\right)-f(0) \mathrm{I} & =\mathrm{A}^{+}\left[f\left(\mathrm{A} \mathrm{A}^\dagger\right)-f(0) \mathrm{I}\right] \mathrm{A} \\ f\left(\mathrm{A} \mathrm{A}^\dagger\right)-f(0) \mathrm{I} & =\mathrm{A}^\dagger\left[\left(\mathrm{A}^\dagger \mathrm{A}\right)-f(0) \mathrm{I}\right] \mathrm{A}^{+} \end{aligned} \]

I find definition in terms of these properties totally confusing. Perhaps a better way of thinking about pseudo-inverses is via their action upon vectors. TBC

Consider the Moore-Penrose of \(\mathrm{K}\) which we write \(\mathrm{K}^+\). The famous way of constructing it is by taking a SVD of \(\mathrm{K}=\mathrm{U}_{\mathrm{K}}\mathrm{S}_{\mathrm{K}}\mathrm{V}_{\mathrm{K}}^{\dagger}\) where \(\mathrm{U}_{\mathrm{K}}\) and \(\mathrm{V}_{\mathrm{K}}\) are unitary and \(\mathrm{S}_{\mathrm{K}}\) is diagonal. Then we define \(\mathrm{S}_{\mathrm{K}}^+\) to be the pseudo-inverse of the diagonal matrix of singular values, which we construct by setting all non-zero entries to their own reciprocal, but otherwise leave at 0. We have sneakily decided that pseudo-inverse of a diagonal matrix is easy; we just take the reciprocal of the non-zero entries. This turns out to do the right thing, if you check it, and it does not even sound crazy, but also it is not, to me at least, totally obvious. Next, the pseudo-inverse of the whole thing is \(\mathrm{K}^+=\mathrm{V}_{\mathrm{K}}\mathrm{S}_{\mathrm{K}}^+\mathrm{U}_{\mathrm{K}}^{\dagger}\), we claim. If we check the object we create by this procedure, we discover that it satisfies the above properties. (Homework problem).

Meta note: In general, proving things about pseudo-inverses by the constructive solution given by the SVD is much more compact than via the algebraic properties, as well as more intuitive, at least for me.

There is a cute special-case result for low rank matrices.

3 (Generalized) Bott-Duffin pseudo inverse

Many variants (Wu et al. 2023; Gao et al. 2023; Liu, Wang, and Wei 2009).

4 Drazin inverse

The Drazin inverse is introduced in a mediocre Wikipedia article:

Let \(\mathrm{A}\) be square matrix. The index of \(\mathrm{A}\) is the least nonnegative integer \(k\) such that \(\operatorname{rank}\left(\mathrm{A}^{k+1}\right)=\operatorname{rank}\left(\mathrm{A}^k\right)\). The Drazin inverse of \(\mathrm{A}\) is the unique matrix \(\mathrm{A}^{\mathrm{D}}\) that satisfies \[ \mathrm{A}^{k+1} \mathrm{A}^{\mathrm{D}}=\mathrm{A}^k, \quad \mathrm{A}^{\mathrm{D}} \mathrm{A} \mathrm{A}^{\mathrm{D}}=\mathrm{A}^{\mathrm{D}}, \quad \mathrm{A} \mathrm{A}^{\mathrm{D}}=\mathrm{A}^{\mathrm{D}} \mathrm{A} . \] It’s not a generalized inverse in the classical sense, since \(\mathrm{A} \mathrm{A}^{\mathrm{D}} \mathrm{A} \neq \mathrm{A}\) in general.

  • If \(\mathrm{A}\) is invertible with inverse \(\mathrm{A}^{-1}\), then \(\mathrm{A}^{\mathrm{D}}=\mathrm{A}^{-1}\).
  • If \(\mathrm{A}\) is a block diagonal matrix

\[ \mathrm{A}=\left[\begin{array}{cc} \mathrm{B} & 0 \\ 0 & \mathrm{N} \end{array}\right] \] where \(\mathrm{B}\) is invertible with inverse \(\mathrm{B}^{-1}\) and \(\mathrm{N}\) is a nilpotent matrix, then \[ \mathrm{A}^D=\left[\begin{array}{cc} \mathrm{B}^{-1} & 0 \\ 0 & 0 \end{array}\right] \]

  • Drazin inversion is invariant under conjugation. If \(\mathrm{A}^{\mathrm{D}}\) is the Drazin inverse of \(\mathrm{A}\), then \(P \mathrm{A}^{\mathrm{D}} P^{-1}\) is the Drazin inverse of \(P \mathrm{A} P^{-1}\).
  • The Drazin inverse of a matrix of index 0 or 1 is called the group inverse or \(\{1,2,5\}\)-inverse and denoted \(\mathrm{A}^{\#}\). The group inverse can be defined, equivalently, by the properties \(\mathrm{A} \mathrm{A}^{\#} \mathrm{A}=\mathrm{A}, \mathrm{A}^{\#} \mathrm{A} \mathrm{A}^{\#}=\mathrm{A}^{\#}\), and \(\mathrm{A} \mathrm{A}^{\#}=\mathrm{A}^{\#} \mathrm{A}\).
  • A projection matrix \(P\), defined as a matrix such that \(P^2=P\), has index 1 (or 0 ) and has Drazin inverse \(P^{\mathrm{D}}=P\).
  • If \(\mathrm{\mathrm{A}}\) is a nilpotent matrix (for example a shift matrix), then \(\mathrm{A}^{\mathrm{D}}=0\).

5 Incrementally updating

(Hager 1989; Khan 2008)

6 Low-rank

See low-rank matrices.

7 Nearly-Low-rank

See nearly-low-rank matrices.

8 References

Ameli, and Shadden. 2023. A Singular Woodbury and Pseudo-Determinant Matrix Identities and Application to Gaussian Process Regression.” Applied Mathematics and Computation.
Ben-Israel, and Greville. 2003. Generalized Inverses: Theory and Applications.
Bott, and Duffin. 1953. On the Algebra of Networks.” Transactions of the American Mathematical Society.
Brand. 2006. Fast Low-Rank Modifications of the Thin Singular Value Decomposition.” Linear Algebra and Its Applications, Special Issue on Large Scale Linear and Nonlinear Eigenvalue Problems,.
Chow, and Saad. 1997. Approximate Inverse Techniques for Block-Partitioned Matrices.” SIAM Journal on Scientific Computing.
Chung, and Chung. 2014. An Efficient Approach for Computing Optimal Low-Rank Regularized Inverse Matrices.” Inverse Problems.
Cordero, Soto-Quiros, and Torregrosa. 2021. A General Class of Arbitrary Order Iterative Methods for Computing Generalized Inverses.” Applied Mathematics and Computation.
Gao, Zuo, Zou, et al. 2023. New Characterizations and Representations of the Generalized Bott–Duffin Inverse and Its Applications.” Linear and Multilinear Algebra.
Gaul, Kögl, and Wagner. 2003. The Bott-Duffin Inverse.” In Boundary Element Methods for Engineers and Scientists: An Introductory Course with Advanced Topics.
Golub, and Meurant. 2010. Matrices, Moments and Quadrature with Applications.
Hager. 1989. Updating the Inverse of a Matrix.” SIAM Review.
Harville. 1976. Extension of the Gauss-Markov Theorem to Include the Estimation of Random Effects.” The Annals of Statistics.
———. 1977. Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems.” Journal of the American Statistical Association.
Henderson, and Searle. 1981. On Deriving the Inverse of a Sum of Matrices.” SIAM Review.
Kala, and Klaczyński. 1994. Generalized Inverses of a Sum of Matrices.” Sankhyā: The Indian Journal of Statistics, Series A (1961-2002).
Khan. 2008. Updating Inverse of a Matrix When a Column Is Added/Removed.”
Liu, Wang, and Wei. 2009. A Generalization of the Bott–Duffin Inverse and Its Applications.” Numerical Linear Algebra with Applications.
Miller. 1981. On the Inverse of the Sum of Matrices.” Mathematics Magazine.
Petersen, and Pedersen. 2012. The Matrix Cookbook.”
Rao, and Mitra. 1973. “Theory and Application of Constrained Inverse of Matrices.” SIAM Journal on Applied Mathematics.
Saad. 2003. Iterative Methods for Sparse Linear Systems: Second Edition.
Searle. 2014. Matrix Algebra.” In Wiley StatsRef: Statistics Reference Online.
Searle, and Khuri. 2017. Matrix Algebra Useful for Statistics.
Sheng. 2012. An Iterative Algorithm to Compute the Bott-Duffin Inverse and Generalized Bott-Duffin Inverse.” Filomat.
Spantini, Cui, Willcox, et al. 2017. Goal-Oriented Optimal Approximations of Bayesian Linear Inverse Problems.” SIAM Journal on Scientific Computing.
Spantini, Solonen, Cui, et al. 2015. Optimal Low-Rank Approximations of Bayesian Linear Inverse Problems.” SIAM Journal on Scientific Computing.
Wu, Zuo, Cvetković-Ilić, et al. 2023. New Characterizations and Representations of the Bott–Duffin Inverse.” Journal of Mathematics.
Yonglin. 1990. The Generalized Bott-Duffin Inverse and Its Applications.” Linear Algebra and Its Applications.