Matrix inverses



Assumed audience:

People with undergrad linear algebra

What lives inside the null space of the singular matrix inverse

Matrix inverses. Sometimes we need them. Worse, sometimes we need generalisations of them, inverses of singular (a.k.a. non-invertible) matrices. This page is for notes on that theme.

All possible generalizations of matrix inverses

Many (Rao and Mitra 1973; Ben-Israel and Greville 2003). Nick Higham’s introduction to generalized inverses is an easier start.

A beautiful, compact summary is in Searle (2014):

…there are (with one exception) many matrices \(\mathrm{G}\) satisfying \[ \mathrm{AGA}=\mathrm{A}, \] which is condition (i). Each matrix \(\mathrm{G}\) satisfying \(\mathrm{AGA}=\mathrm{A}\) is called a generalized inverse of \(\mathrm{A}\), and if it also satisfies \(\mathrm{GAG}=\mathrm{G}\) it is a reflexive generalized inverse. The exception is when \(\mathrm{A}\) is nonsingular: there is then only one \(\mathrm{G}\); namely, \(\mathrm{G}=\mathrm{A}^{-1}\).

8.2 Arbitrariness

That there are many matrices \(\mathrm{G}\) can be illustrated by showing ways in which from one \(\mathrm{G}\) others can be obtained. Thus, if \(\mathrm{A}\) is partitioned as \[ \mathrm{A}=\left[\begin{array}{ll} \mathrm{A}_{11} & \mathrm{A}_{12} \\ \mathrm{A}_{21} & \mathrm{A}_{22} \end{array}\right] \] where \(\mathrm{A}_{11}\) is nonsingular with the same rank as \(\mathrm{A}\), then \[ \mathrm{G}=\left[\begin{array}{cc} \mathrm{A}_{11}^{-1}-\mathrm{U} \mathrm{A}_{21} \mathrm{A}_{11}^{-1}-\mathrm{A}_{11}^{-1} \mathrm{A}_{12} \mathrm{V}-\mathrm{A}_{11}^{-1} \mathrm{A}_{12} \mathrm{W} \mathrm{A}_{21} \mathrm{A}_{11}^{-1} & \mathrm{U} \\ \mathrm{V} & \mathrm{W} \end{array}\right] \] is a generalized inverse of \(\mathrm{A}\) for any values of \(\mathrm{U}, \mathrm{V}\) and \(\mathrm{W}\). This can be used to show that a generalized inverse of a symmetric matrix is not necessarily symmetric; and that of a singular matrix is not necessarily singular.

A simpler illustration of arbitrariness is that if \(\mathrm{G}\) is a generalized inverse of \(\mathrm{A}\) then so is \[ \mathrm{G}^*=\mathrm{GAG}+(\mathrm{I}-\mathrm{GA}) \mathrm{S}+\mathrm{T}(\mathrm{I}-\mathrm{AG}), \] for any values of \(\mathrm{S}\) and \(\mathrm{T}\).
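
This arbitrariness is easy to poke at numerically. Here is a minimal sketch (assuming numpy; variable names are mine) that seeds the recipe above with the Moore-Penrose pseudo-inverse and checks that the perturbed \(\mathrm{G}^*\) still satisfies \(\mathrm{A}\mathrm{G}^*\mathrm{A}=\mathrm{A}\):

```python
import numpy as np

rng = np.random.default_rng(1)
# A singular (rank-2) 3x3 matrix.
A = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 3))

G = np.linalg.pinv(A)            # one particular generalized inverse
S = rng.standard_normal((3, 3))  # arbitrary
T = rng.standard_normal((3, 3))  # arbitrary
I = np.eye(3)
G_star = G @ A @ G + (I - G @ A) @ S + T @ (I - A @ G)

# G_star is generally far from G, yet A @ G_star @ A still recovers A.
np.testing.assert_allclose(A @ G_star @ A, A, atol=1e-10)
```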

Moore-Penrose pseudo-inverse

A classic. The “default” generalized inverse.

See Nick Higham’s What Is the Pseudoinverse of a Matrix?

Let us do the conventional thing and mention which properties of the pseudo-inverse are shared by the true inverse:

The pseudo-inverse (or Moore-Penrose inverse) of a matrix \(\mathrm{A}\) is the matrix \(\mathrm{A}^{+}\) that fulfils

  1. \(\mathrm{A A}^{+} \mathrm{A}=\mathrm{A}\)
  2. \(\mathrm{A}^{+} \mathrm{A} \mathrm{A}^{+}=\mathrm{A}^{+}\)
  3. \(\mathrm{A A}^{+}\) symmetric
  4. \(\mathrm{A}^{+} \mathrm{A}\) symmetric
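
These conditions are easy to check numerically for any concrete matrix; a quick sketch using numpy’s `np.linalg.pinv` on a random rectangular matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
Ap = np.linalg.pinv(A)

np.testing.assert_allclose(A @ Ap @ A, A, atol=1e-10)       # 1. A A+ A = A
np.testing.assert_allclose(Ap @ A @ Ap, Ap, atol=1e-10)     # 2. A+ A A+ = A+
np.testing.assert_allclose(A @ Ap, (A @ Ap).T, atol=1e-10)  # 3. A A+ symmetric
np.testing.assert_allclose(Ap @ A, (Ap @ A).T, atol=1e-10)  # 4. A+ A symmetric
```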

Want more? As always, we copy-paste the omnibus results of Petersen and Pedersen (2012):

Assume \(\mathrm{A}^{+}\) to be the pseudo-inverse of \(\mathrm{A}\), then \[ \begin{aligned} \left(\mathrm{A}^{+}\right)^{+} & =\mathrm{A} \\ \left(\mathrm{A}^\top\right)^{+} & =\left(\mathrm{A}^{+}\right)^\top \\ \left(\mathrm{A}^\dagger\right)^{+} & =\left(\mathrm{A}^{+}\right)^\dagger \\ \left(\mathrm{A}^{+} \mathrm{A}\right) \mathrm{A}^\dagger & =\mathrm{A}^\dagger \\ \left(\mathrm{A}^{+} \mathrm{A}\right) \mathrm{A}^\top & \neq \mathrm{A}^\top \\ (c \mathrm{A})^{+} & =(1 / c) \mathrm{A}^{+} \\ \mathrm{A}^{+} & =\left(\mathrm{A}^\top \mathrm{A}\right)^{+} \mathrm{A}^\top \\ \mathrm{A}^{+} & =\mathrm{A}^\top\left(\mathrm{A} \mathrm{A}^\top\right)^{+} \\ \left(\mathrm{A}^\top \mathrm{A}\right)^{+} & =\mathrm{A}^{+}\left(\mathrm{A}^\top\right)^{+} \\ \left(\mathrm{A} \mathrm{A}^\top\right)^{+} & =\left(\mathrm{A}^\top\right)^{+} \mathrm{A}^{+} \\ \mathrm{A}^{+} & =\left(\mathrm{A}^\dagger \mathrm{A}\right)^{+} \mathrm{A}^\dagger \\ \mathrm{A}^{+} & =\mathrm{A}^\dagger\left(\mathrm{A} \mathrm{A}^\dagger\right)^{+} \\ \left(\mathrm{A}^\dagger \mathrm{A}\right)^{+} & =\mathrm{A}^{+}\left(\mathrm{A}^\dagger\right)^{+} \\ \left(\mathrm{A} \mathrm{A}^\dagger\right)^{+} & =\left(\mathrm{A}^\dagger\right)^{+} \mathrm{A}^{+} \\ (\mathrm{AB})^{+} & =\left(\mathrm{A}^{+} \mathrm{AB}\right)^{+}\left(\mathrm{ABB}^{+}\right)^{+} \\ f\left(\mathrm{A}^\dagger \mathrm{A}\right)-f(0) \mathrm{I} & =\mathrm{A}^{+}\left[f\left(\mathrm{A} \mathrm{A}^\dagger\right)-f(0) \mathrm{I}\right] \mathrm{A} \\ f\left(\mathrm{A} \mathrm{A}^\dagger\right)-f(0) \mathrm{I} & =\mathrm{A}\left[f\left(\mathrm{A}^\dagger \mathrm{A}\right)-f(0) \mathrm{I}\right] \mathrm{A}^{+} \end{aligned} \]
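
The product rule \((\mathrm{AB})^{+}=\left(\mathrm{A}^{+}\mathrm{AB}\right)^{+}\left(\mathrm{ABB}^{+}\right)^{+}\) is the pseudo-inverse’s stand-in for \((\mathrm{AB})^{-1}=\mathrm{B}^{-1}\mathrm{A}^{-1}\); the naive analogue \((\mathrm{AB})^{+}=\mathrm{B}^{+}\mathrm{A}^{+}\) fails in general. A quick numerical sanity check, assuming numpy, with deliberately rank-deficient factors so the identity is exercised non-trivially:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))  # rank 2
B = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 5))  # rank 2
pinv = np.linalg.pinv

lhs = pinv(A @ B)
rhs = pinv(pinv(A) @ A @ B) @ pinv(A @ B @ pinv(B))
np.testing.assert_allclose(lhs, rhs, atol=1e-8)  # holds; pinv(B) @ pinv(A) generally would not
```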

I find the definition in terms of these properties totally confusing. Perhaps a better way of thinking about pseudo-inverses is via their action upon vectors. TBC

Consider the Moore-Penrose pseudo-inverse of \(\mathrm{K}\), which we write \(\mathrm{K}^+\). The famous way of constructing it is by taking an SVD, \(\mathrm{K}=\mathrm{U}_{\mathrm{K}}\mathrm{S}_{\mathrm{K}}\mathrm{V}_{\mathrm{K}}^{\dagger}\), where \(\mathrm{U}_{\mathrm{K}}\) and \(\mathrm{V}_{\mathrm{K}}\) are unitary and \(\mathrm{S}_{\mathrm{K}}\) is diagonal. Then we define \(\mathrm{S}_{\mathrm{K}}^+\) to be the pseudo-inverse of the diagonal matrix of singular values, which we construct by replacing each non-zero entry with its reciprocal and leaving the zero entries at 0. We have sneakily decided that the pseudo-inverse of a diagonal matrix is easy: just reciprocate the non-zero entries. This turns out to do the right thing if you check it, and it does not even sound crazy, but it is not, to me at least, totally obvious. Next, we claim that the pseudo-inverse of the whole thing is \(\mathrm{K}^+=\mathrm{V}_{\mathrm{K}}\mathrm{S}_{\mathrm{K}}^+\mathrm{U}_{\mathrm{K}}^{\dagger}\). If we check the object created by this procedure, we discover that it satisfies the above properties. (Homework problem; the sketch below does the numerical version.)
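
Here is that construction spelled out as a minimal numpy sketch (variable names mine), building \(\mathrm{K}^+\) from the SVD by hand and checking it against `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(4)
K = rng.standard_normal((5, 4)) @ np.diag([1.0, 2.0, 0.0, 3.0])  # rank-deficient

# Thin SVD: K = U @ diag(s) @ Vh, with s the singular values.
U, s, Vh = np.linalg.svd(K, full_matrices=False)

# Pseudo-invert the diagonal factor: reciprocate the non-zero singular
# values (up to a tolerance), leave the rest at zero.
s_plus = np.zeros_like(s)
nonzero = s > 1e-12 * s.max()
s_plus[nonzero] = 1.0 / s[nonzero]

# K+ = V S+ U^dagger (everything here is real, so dagger is plain transpose).
K_plus = Vh.T @ np.diag(s_plus) @ U.T
np.testing.assert_allclose(K_plus, np.linalg.pinv(K), atol=1e-10)
```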

Meta note: In general, proving things about pseudo-inverses by the constructive solution given by the SVD is much more compact than via the algebraic properties, as well as more intuitive, at least for me.

There is a cute special-case result for low-rank matrices.

Drazin inverse

The Drazin inverse is introduced in a mediocre Wikipedia article:

Let \(\mathrm{A}\) be a square matrix. The index of \(\mathrm{A}\) is the least nonnegative integer \(k\) such that \(\operatorname{rank}\left(\mathrm{A}^{k+1}\right)=\operatorname{rank}\left(\mathrm{A}^k\right)\). The Drazin inverse of \(\mathrm{A}\) is the unique matrix \(\mathrm{A}^{\mathrm{D}}\) that satisfies \[ \mathrm{A}^{k+1} \mathrm{A}^{\mathrm{D}}=\mathrm{A}^k, \quad \mathrm{A}^{\mathrm{D}} \mathrm{A} \mathrm{A}^{\mathrm{D}}=\mathrm{A}^{\mathrm{D}}, \quad \mathrm{A} \mathrm{A}^{\mathrm{D}}=\mathrm{A}^{\mathrm{D}} \mathrm{A} . \] It’s not a generalized inverse in the classical sense, since \(\mathrm{A} \mathrm{A}^{\mathrm{D}} \mathrm{A} \neq \mathrm{A}\) in general.

  • If \(\mathrm{A}\) is invertible with inverse \(\mathrm{A}^{-1}\), then \(\mathrm{A}^{\mathrm{D}}=\mathrm{A}^{-1}\).
  • If \(\mathrm{A}\) is a block diagonal matrix

\[ \mathrm{A}=\left[\begin{array}{cc} \mathrm{B} & 0 \\ 0 & \mathrm{N} \end{array}\right] \] where \(\mathrm{B}\) is invertible with inverse \(\mathrm{B}^{-1}\) and \(\mathrm{N}\) is a nilpotent matrix, then \[ \mathrm{A}^D=\left[\begin{array}{cc} \mathrm{B}^{-1} & 0 \\ 0 & 0 \end{array}\right] \]

  • Drazin inversion is invariant under conjugation. If \(\mathrm{A}^{\mathrm{D}}\) is the Drazin inverse of \(\mathrm{A}\), then \(P \mathrm{A}^{\mathrm{D}} P^{-1}\) is the Drazin inverse of \(P \mathrm{A} P^{-1}\).
  • The Drazin inverse of a matrix of index 0 or 1 is called the group inverse or \(\{1,2,5\}\)-inverse and denoted \(\mathrm{A}^{\#}\). The group inverse can be defined, equivalently, by the properties \(\mathrm{A} \mathrm{A}^{\#} \mathrm{A}=\mathrm{A}, \mathrm{A}^{\#} \mathrm{A} \mathrm{A}^{\#}=\mathrm{A}^{\#}\), and \(\mathrm{A} \mathrm{A}^{\#}=\mathrm{A}^{\#} \mathrm{A}\).
  • A projection matrix \(P\), defined as a matrix such that \(P^2=P\), has index 1 (or 0 ) and has Drazin inverse \(P^{\mathrm{D}}=P\).
  • If \(\mathrm{A}\) is a nilpotent matrix (for example a shift matrix), then \(\mathrm{A}^{\mathrm{D}}=0\).
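
None of the above says how to actually compute \(\mathrm{A}^{\mathrm{D}}\). One route that uses only tools we already have is the identity \(\mathrm{A}^{\mathrm{D}}=\mathrm{A}^{k}\left(\mathrm{A}^{2k+1}\right)^{+}\mathrm{A}^{k}\), where \(k\) is the index of \(\mathrm{A}\) (in fact any generalized inverse of \(\mathrm{A}^{2k+1}\) will do; the Moore-Penrose one is convenient). A minimal numpy sketch, with the caveat that forming high matrix powers is numerically delicate:

```python
import numpy as np

def drazin_inverse(A, tol=1e-10):
    """Drazin inverse via A^D = A^k (A^(2k+1))^+ A^k, with k the index of A.

    Only a sketch: high matrix powers lose precision, so this suits small,
    well-scaled matrices.
    """
    n = A.shape[0]
    # Index of A: least k >= 0 with rank(A^(k+1)) == rank(A^k).
    k, Ak = 0, np.eye(n)  # Ak tracks A^k
    while np.linalg.matrix_rank(A @ Ak, tol=tol) != np.linalg.matrix_rank(Ak, tol=tol):
        Ak, k = A @ Ak, k + 1
    return Ak @ np.linalg.pinv(np.linalg.matrix_power(A, 2 * k + 1), rcond=tol) @ Ak

# Sanity checks against the bullet points above: a projection is its own
# Drazin inverse, and a nilpotent shift matrix has Drazin inverse zero.
P = np.array([[1.0, 0.0], [1.0, 0.0]])  # P @ P == P
np.testing.assert_allclose(drazin_inverse(P), P, atol=1e-12)
N = np.diag([1.0, 1.0], k=1)            # 3x3 shift matrix, nilpotent
np.testing.assert_allclose(drazin_inverse(N), np.zeros((3, 3)), atol=1e-12)
```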

Incrementally updating

(Hager 1989; Khan 2008)
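
To give the flavour: the best-known trick in this family is the Sherman-Morrison formula, which turns a rank-one update of \(\mathrm{A}\) into a rank-one update of \(\mathrm{A}^{-1}\): \[ \left(\mathrm{A}+u v^{\top}\right)^{-1}=\mathrm{A}^{-1}-\frac{\mathrm{A}^{-1} u v^{\top} \mathrm{A}^{-1}}{1+v^{\top} \mathrm{A}^{-1} u}. \] A minimal numpy sketch (the function name is mine, not from the cited papers):

```python
import numpy as np

def sherman_morrison(Ainv, u, v):
    """Given A^{-1}, return (A + u v^T)^{-1} in O(n^2) instead of O(n^3).

    Breaks down (denominator near zero) when A + u v^T is singular.
    """
    Au = Ainv @ u
    vA = v @ Ainv
    return Ainv - np.outer(Au, vA) / (1.0 + v @ Au)

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)  # comfortably invertible
u, v = rng.standard_normal(4), rng.standard_normal(4)

np.testing.assert_allclose(
    sherman_morrison(np.linalg.inv(A), u, v),
    np.linalg.inv(A + np.outer(u, v)),
    atol=1e-8,
)
```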

Low-rank

See low-rank matrices.

Nearly-low-rank

See nearly-low-rank matrices.

References

Ameli, Siavash, and Shawn C. Shadden. 2023. “A Singular Woodbury and Pseudo-Determinant Matrix Identities and Application to Gaussian Process Regression.” Applied Mathematics and Computation 452 (April): 128032.
Ben-Israel, Adi, and Thomas N. E. Greville. 2003. Generalized Inverses: Theory and Applications. Springer Science & Business Media.
Bott, R., and R. J. Duffin. 1953. “On the Algebra of Networks.” Transactions of the American Mathematical Society 74 (1): 99–109.
Brand, Matthew. 2006. “Fast Low-Rank Modifications of the Thin Singular Value Decomposition.” Linear Algebra and Its Applications, Special Issue on Large Scale Linear and Nonlinear Eigenvalue Problems, 415 (1): 20–30.
Gao, Jiale, Kezheng Zuo, Honglin Zou, and Zhimei Fu. 2023. “New Characterizations and Representations of the Generalized Bott–Duffin Inverse and Its Applications.” Linear and Multilinear Algebra 71 (6): 1026–43.
Gaul, Lothar, Martin Kögl, and Marcus Wagner. 2003. “The Bott-Duffin Inverse.” In Boundary Element Methods for Engineers and Scientists: An Introductory Course with Advanced Topics, edited by Lothar Gaul, Martin Kögl, and Marcus Wagner, 461–62. Berlin, Heidelberg: Springer.
Golub, Gene H., and Gérard Meurant. 2010. Matrices, Moments and Quadrature with Applications. USA: Princeton University Press.
Hager, William W. 1989. “Updating the Inverse of a Matrix.” SIAM Review 31 (2): 221–39.
Harville, David A. 1976. “Extension of the Gauss-Markov Theorem to Include the Estimation of Random Effects.” The Annals of Statistics 4 (2): 384–95.
———. 1977. “Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems.” Journal of the American Statistical Association 72 (358): 320–38.
Henderson, H. V., and S. R. Searle. 1981. “On Deriving the Inverse of a Sum of Matrices.” SIAM Review 23 (1): 53–60.
Kala, Radoslaw, and Krzysztof Klaczyński. 1994. “Generalized Inverses of a Sum of Matrices.” Sankhyā: The Indian Journal of Statistics, Series A (1961-2002) 56 (3): 458–64.
Khan, Emtiyaz. 2008. “Updating Inverse of a Matrix When a Column Is Added/Removed.”
Liu, Xin-Guo, Wei-Guo Wang, and Yi-Min Wei. 2009. “A Generalization of the Bott–Duffin Inverse and Its Applications.” Numerical Linear Algebra with Applications 16 (3): 173–96.
Miller, Kenneth S. 1981. “On the Inverse of the Sum of Matrices.” Mathematics Magazine 54 (2): 67–72.
Petersen, Kaare Brandt, and Michael Syskind Pedersen. 2012. “The Matrix Cookbook.”
Rao, C. Radhakrishna, and Sujit Kumar Mitra. 1973. “Theory and Application of Constrained Inverse of Matrices.” SIAM Journal on Applied Mathematics 24 (4): 473–88.
Searle, Shayle R. 2014. “Matrix Algebra.” In Wiley StatsRef: Statistics Reference Online. Wiley.
Searle, Shayle R., and Andre I. Khuri. 2017. Matrix Algebra Useful for Statistics. John Wiley & Sons.
Sheng, Xingping. 2012. “An Iterative Algorithm to Compute the Bott-Duffin Inverse and Generalized Bott-Duffin Inverse.” Filomat 26 (4): 769–76.
Wu, Jiabao, Kezheng Zuo, Dragana S. Cvetković-Ilić, and Honglin Zou. 2023. “New Characterizations and Representations of the Bott–Duffin Inverse.” Journal of Mathematics 2023 (August): e7623837.
Yonglin, Chen. 1990. “The Generalized Bott-Duffin Inverse and Its Applications.” Linear Algebra and Its Applications 134 (June): 71–91.
