Interpretations and tricks for matrix square roots.
There are two types of things which are referred to as matrix square roots of a Hermitian matrix $A$: matrices $S$ such that $S S = A$, and matrices $R$ such that $R R^* = A$, when $A$ is positive semi-definite.
Sometimes either will do; other times we care about which. Sometimes we want a particular structure, e.g. lower triangular as in the Cholesky decomposition.
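To make the distinction concrete, here is a minimal numpy/scipy sketch (mine, purely illustrative) contrasting the symmetric principal root with the triangular Cholesky factor:

```python
import numpy as np
from scipy.linalg import cholesky, sqrtm

rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))
A = G @ G.T + 5 * np.eye(5)    # real symmetric positive definite

S = sqrtm(A)                   # principal root: S @ S == A, S symmetric
L = cholesky(A, lower=True)    # Cholesky factor: L @ L.T == A, L lower triangular

assert np.allclose(S @ S, A)
assert np.allclose(L @ L.T, A)
assert not np.allclose(S, L)   # two different objects, both called "square roots"
```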
1 Almost low-rank roots
Square roots of low-rank perturbations of the (scaled) identity are not themselves low rank in general, but at least there exist efficient algorithms for finding them. Fasi, Higham, and Liu (2023):
We consider the problem of computing the square root of a perturbation of the scaled identity matrix, $A = \alpha I_n + UV^*$, where $U$ and $V$ are $n \times k$ matrices with $k \leq n$. This problem arises in various applications, including computer vision and optimization methods for machine learning. We derive a new formula for the $p$th root of $A$ that involves a weighted sum of powers of the $p$th root of the $k \times k$ matrix $\alpha I_k + V^*U$. This formula is particularly attractive for the square root, since the sum has just one term when $p = 2$. We also derive a new class of Newton iterations for computing the square root that exploit the low-rank structure.
Their method works for low-rank-plus-scaled-identity matrices with no eigenvalues on the closed negative real axis.
Theorem: Let
$$A = \alpha I_n + UV^* \in \mathbb{C}^{n \times n},$$
with $U, V \in \mathbb{C}^{n \times k}$, and assume that $V^*U$ is nonsingular. Let $f$ be defined on the spectrum of $A$, and, if $k < n$, let $f$ be defined at $\alpha$. Then
$$f(A) = f(\alpha) I_n + U \, (V^*U)^{-1} \bigl( f(\alpha I_k + V^*U) - f(\alpha) I_k \bigr) V^*.$$
The theorem says two things: that $f(A)$, like $A$, is a perturbation of rank at most $k$ of the identity matrix, and that $f(A)$ can be computed by evaluating $f$ and the inverse at two $k \times k$ matrices.
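To see the theorem in action, here is a quick numerical check (my sketch, not the authors' code): take $f = \exp$ and compare the low-rank formula against a dense `scipy.linalg.expm` reference.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n, k, alpha = 400, 4, 2.0
U = rng.standard_normal((n, k)) / np.sqrt(n)   # scaled so the spectrum stays tame
V = rng.standard_normal((n, k))

B = alpha * np.eye(k) + V.T @ U                # the small k x k matrix
# f(A) = f(alpha) I + U (V^T U)^{-1} (f(B) - f(alpha) I) V^T, here with f = exp
core = np.linalg.solve(V.T @ U, expm(B) - np.exp(alpha) * np.eye(k))
fA = np.exp(alpha) * np.eye(n) + U @ core @ V.T

fA_ref = expm(alpha * np.eye(n) + U @ V.T)     # dense O(n^3) reference
print(np.max(np.abs(fA - fA_ref)))             # agreement to roundoff
```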
I would call this a generalized Woodbury formula, and I think it is pretty cool, which tells you something about my current obsession profile. Anyway, they use it to discover the following:
Let
$$A = \alpha I_n + UV^* \in \mathbb{C}^{n \times n},$$
where $U, V \in \mathbb{C}^{n \times k}$ have full rank, and let the matrix $B = \alpha I_k + V^*U$ have no eigenvalues on $\mathbb{R}^-$. Then for any integer $p \geq 1$,
$$A^{1/p} = \alpha^{1/p} I_n + U \left( \sum_{i=1}^{p} \alpha^{(p-i)/p} \bigl(B^{1/p}\bigr)^{i-1} \right)^{-1} V^*,$$
and in particular
$$A^{1/2} = \alpha^{1/2} I_n + U \bigl( \alpha^{1/2} I_k + B^{1/2} \bigr)^{-1} V^*.$$
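The $p = 2$ case is the punchline: an $n \times n$ square root for the price of one $k \times k$ square root and a $k \times k$ solve. A hedged sketch (mine, not the paper's reference implementation), applied matrix-free so the $n \times n$ root is never formed:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(2)
n, k, alpha = 1000, 5, 4.0
U = rng.standard_normal((n, k)) / np.sqrt(n)
V = rng.standard_normal((n, k))

B = alpha * np.eye(k) + V.T @ U
sqrtB = sqrtm(B)                    # k x k principal square root
# A^{1/2} = sqrt(alpha) I + U (sqrt(alpha) I_k + B^{1/2})^{-1} V^T
core = np.linalg.solve(np.sqrt(alpha) * np.eye(k) + sqrtB, V.T)   # k x n
root = lambda x: np.sqrt(alpha) * x + U @ (core @ x)              # apply A^{1/2}

x = rng.standard_normal(n)
Ax = alpha * x + U @ (V.T @ x)
print(np.max(np.abs(root(root(x)) - Ax)))   # applying the root twice recovers A x
```

There is a companion corollary, which I read as giving the inverse square root: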
Let
$$A = \alpha I_n + UV^* \in \mathbb{C}^{n \times n},$$
where $U, V \in \mathbb{C}^{n \times k}$ have full rank, and let the matrix $B = \alpha I_k + V^*U$ have no eigenvalues on $\mathbb{R}^-$. Then
$$A^{-1/2} = \alpha^{-1/2} \Bigl( I_n - U \bigl( B + \alpha^{1/2} B^{1/2} \bigr)^{-1} V^* \Bigr).$$
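If that reconstruction is right (it is what the theorem yields with $f(z) = z^{-1/2}$), the inverse root costs essentially nothing extra once $B^{1/2}$ is in hand. Continuing the snippet above:

```python
# A^{-1/2} = alpha^{-1/2} (I - U (B + sqrt(alpha) B^{1/2})^{-1} V^T)
inv_core = np.linalg.solve(B + np.sqrt(alpha) * sqrtB, V.T)        # k x n
inv_root = lambda x: (x - U @ (inv_core @ x)) / np.sqrt(alpha)     # apply A^{-1/2}
print(np.max(np.abs(inv_root(root(x)) - x)))   # A^{-1/2} A^{1/2} x == x
```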
They also derive an efficient iteration for computing it, a structured version of the classic Denman-Beavers iteration:
The (scaled) DB iteration is
$$X_{k+1} = \tfrac{1}{2}\bigl(\mu_k X_k + \mu_k^{-1} Y_k^{-1}\bigr), \qquad X_0 = A,$$
$$Y_{k+1} = \tfrac{1}{2}\bigl(\mu_k Y_k + \mu_k^{-1} X_k^{-1}\bigr), \qquad Y_0 = I_n,$$
where the positive scaling parameter $\mu_k$ can be used to accelerate the convergence of the method in its initial steps. The choice $\mu_k = 1$ yields the unscaled DB method, for which $X_k$ and $Y_k$ converge quadratically to $A^{1/2}$ and $A^{-1/2}$, respectively. … for
$$A = \alpha I_n + UV^*,$$
the iterates $X_k$ and $Y_k$ can be written in the form
$$X_k = \alpha_k I_n + U B_k V^*, \qquad Y_k = \beta_k I_n + U C_k V^*,$$
for scalars $\alpha_k, \beta_k$ and matrices $B_k, C_k \in \mathbb{C}^{k \times k}$.
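For orientation, the dense unscaled ($\mu_k = 1$) iteration is only a few lines of numpy; a minimal sketch, ignoring scaling and convergence tests:

```python
import numpy as np

def db_sqrt(A, iters=25):
    """Unscaled Denman-Beavers: X_k -> A^{1/2} and Y_k -> A^{-1/2}."""
    X, Y = A.copy(), np.eye(A.shape[0])
    for _ in range(iters):
        Xinv, Yinv = np.linalg.inv(X), np.linalg.inv(Y)
        X, Y = 0.5 * (X + Yinv), 0.5 * (Y + Xinv)
    return X, Y

rng = np.random.default_rng(3)
G = rng.standard_normal((50, 50))
A = G @ G.T + 50 * np.eye(50)
X, Y = db_sqrt(A)
print(np.max(np.abs(X @ X - A)), np.max(np.abs(X @ Y - np.eye(50))))
```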
The low-rank form gets us a computational speed-up, although of a rather complicated kind. For a start, its constant factor is favourable compared to the naive approach, but it also has a somewhat favourable scaling with the problem size: after a one-off $O(nk^2)$ product to form $V^*U$, each iteration touches only $k \times k$ matrices, at $O(k^3)$ cost, instead of the $O(n^3)$ of a dense DB step.
Anyway, let us suppose $V = U$, so that $A = \alpha I_n + UU^*$ is Hermitian. Writing $W = U^*U$ and using the Woodbury identity to invert the iterates in low-rank form, the Denman-Beavers step becomes
$$\alpha_{k+1} = \tfrac{1}{2}\bigl(\mu_k \alpha_k + \mu_k^{-1} \beta_k^{-1}\bigr), \qquad B_{k+1} = \tfrac{1}{2}\Bigl(\mu_k B_k - \mu_k^{-1} \beta_k^{-1} \bigl(\beta_k I_k + C_k W\bigr)^{-1} C_k\Bigr),$$
$$\beta_{k+1} = \tfrac{1}{2}\bigl(\mu_k \beta_k + \mu_k^{-1} \alpha_k^{-1}\bigr), \qquad C_{k+1} = \tfrac{1}{2}\Bigl(\mu_k C_k - \mu_k^{-1} \alpha_k^{-1} \bigl(\alpha_k I_k + B_k W\bigr)^{-1} B_k\Bigr),$$
with $\alpha_0 = \alpha$, $B_0 = I_k$, $\beta_0 = 1$, $C_0 = 0$.
This looks useful, although note that this is giving us a full-size ($n \times n$) square root, albeit one held in scaled-identity-plus-low-rank form.
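Here is a sketch of that structured iteration under my $V = U$ assumption, carrying only the scalars and the $k \times k$ matrices (derived via Woodbury as above; the paper's own pseudocode may differ in details):

```python
import numpy as np

def structured_db(alpha, U, iters=30, mu=1.0):
    """DB iteration on A = alpha I + U U^T, carried in low-rank form:
    X_k = a I + U Bk U^T and Y_k = b I + U Ck U^T."""
    k = U.shape[1]
    Ik = np.eye(k)
    W = U.T @ U                            # one-off O(n k^2) Gram matrix
    a, b = alpha, 1.0                      # X_0 = A, Y_0 = I
    Bk, Ck = Ik.copy(), np.zeros((k, k))
    for _ in range(iters):
        # Woodbury cores of X_k^{-1} and Y_k^{-1}: only k x k solves
        Xinv_core = -np.linalg.solve(a * Ik + Bk @ W, Bk) / a
        Yinv_core = -np.linalg.solve(b * Ik + Ck @ W, Ck) / b
        a, b, Bk, Ck = (
            0.5 * (mu * a + 1.0 / (mu * b)),
            0.5 * (mu * b + 1.0 / (mu * a)),
            0.5 * (mu * Bk + Yinv_core / mu),
            0.5 * (mu * Ck + Xinv_core / mu),
        )
    return a, Bk, b, Ck

rng = np.random.default_rng(4)
n, k = 2000, 3
U = rng.standard_normal((n, k))
a, Bk, b, Ck = structured_db(2.0, U)

root = lambda x: a * x + U @ (Bk @ (U.T @ x))   # apply A^{1/2} matrix-free
x = rng.standard_normal(n)
Ax = 2.0 * x + U @ (U.T @ x)
print(np.max(np.abs(root(root(x)) - Ax)))       # small residual: root twice = A
```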
2 Diagonal-plus-low-rank Cholesky
Louis Tiao, in Efficient Cholesky decomposition of low-rank updates, summarises Seeger (2004).
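The primitive underneath is the classic $O(n^2)$ rank-one Cholesky update; rank-$k$ updates iterate it column by column. A generic sketch of the rank-one case (the textbook rotation-based routine, not Seeger's exact algorithm):

```python
import numpy as np

def chol_update(L, x):
    """Given lower-triangular L with L L^T = A, return the Cholesky
    factor of A + x x^T in O(n^2), via Givens-style rotations."""
    L, x = L.copy(), x.copy()
    for i in range(x.size):
        r = np.hypot(L[i, i], x[i])
        c, s = r / L[i, i], x[i] / L[i, i]
        L[i, i] = r
        L[i + 1:, i] = (L[i + 1:, i] + s * x[i + 1:]) / c
        x[i + 1:] = c * x[i + 1:] - s * L[i + 1:, i]
    return L

rng = np.random.default_rng(5)
G = rng.standard_normal((6, 6))
A = G @ G.T + 6 * np.eye(6)
x = rng.standard_normal(6)
L1 = chol_update(np.linalg.cholesky(A), x)
assert np.allclose(L1 @ L1.T, A + np.outer(x, x))
```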