Orthonormal and unitary matrices

Energy preserving operators, generalized rotations

In which I think about parameterisations and implementations of finite dimensional energy-preserving operators, a.k.a. matrices. A particular nook in the linear feedback process library, closely related to stability in linear dynamical systems, since every orthonormal matrix is the forward operator of an energy-preserving system, which is an edge case for certain natural types of stability. Also important in random low-dimensional projections.

Uses include maintaining stable gradients in recurrent neural networks (Arjovsky, Shah, and Bengio 2016; Jing et al. 2017; Mhammedi et al. 2017) and efficient normalising flows. (Berg et al. 2018; Hasenclever, Tomczak, and Welling 2017)

Also, parameterising stable Multi-Input-Multi-Output (MIMO) delay networks in signal processing.

There is some terminological work.

Some writers refer to orthogonal matrices (but I prefer that to mean matrices where the columns are not all length 1), and some refer to unitary matrices, which seems to imply the matrix is over the complex field instead of the reals but is basically the same from my perspective.

Finding an orthonormal matrix is equivalent to choosing a finite orthonormal basis, so any way we can parameterise such a basis gives us an orthonormal matrix.

We also might want to consider the implied manifolds of these object, the Stiefel manifold. Formally, the Stiefel manifold \(V_{k, m}\) is the space of \(k\) frames in the \(m\) -dimensional real Euclidean space \(R^{m},\) represented by the set of \(m \times k\) matrices \(X\) such that \(X^{\prime} X=I_{k},\) where \(I_{k}\) is the \(k \times k\) identity matrix. Usually my purposes are served here by \(k=m\). There are some interesting cases in low dimensional projections served by \(k<m\) including \(k=1.\)


Take the QR decomposition

HT Russell Tsuchida for pointing out that the QR decomposition by construction gives you an orthonormal matrix for any square matrix. The construction by Householder reflections is, Wikipedia reckons, \(\mathcal{O}(n^3)\) multiplications for an \(n\times n\) matrix.

I wonder what the distribution of such matrices is for some, say, matrix with independent standard Gaussian entries? Nick Higham has the answer, in his compact introduction to random orthonormal matrices. A uniform, rotation-invariant distribution is given by the Haar measure over the group of orthogonal matrices. He also gives the construction for drawing them by random Householder reflections derived from random standard normal vectors.

Iterative normalising

Have a nearly orthonormal matrix? Berg et al. (2018) gives us a contraction which gets you closer to an orthonormal matrix:

\[ \mathbf{Q}^{(k+1)}=\mathbf{Q}^{(k)}\left(\mathbf{I}+\frac{1}{2}\left(\mathbf{I}-\mathbf{Q}^{(k) \top} \mathbf{Q}^{(k)}\right)\right) \] which reputedly converges if \(\left\|\mathbf{Q}^{(0) \top} \mathbf{Q}^{(0)}-\mathbf{I}\right\|_{2}<1.\) They attribute this to Björck and Bowie (1971) and Kovarik (1970), wherein it is derived from the Newton iteration for solving \(Q^{-1}-\) Here the iterations are clearly \(\mathcal{O}(n^2).\) An \(\mathcal{O}(n)\) option would be nice.

Householder reflections

We can apply successive reflections about hyperplanes, the so called Householder reflections, to an identity matrix to construct a new one.

\[ H(\mathbf{z})=\mathbf{z}-\frac{\mathbf{v} \mathbf{v}^{T}}{\|\mathbf{v}\|^{2}} \mathbf{z} \] 🏗

Givens rotation

One obvious method for constructing unitary matrices is composing Givens rotations, which are atomic rotations about 2 axes. I am not sure of the computational complexity of it, but I suspect that all the trigonometry makes it practically expensive. 🏗

Cayley map

the Cayley map maps the skew-symmetric matrices to the orthogonal matrices of positive determinant, and parameterizing skew-symmetric matrices is easy; just take the upper triangular componenet of some matrix and flip /negate it. NB this still requires a matrix inversion in general, AFAICS.

Parametric sub families

Citing MATLAB, Nick Higham gives the following two parametric families of orthonormal matrices. These are clearly far from covering the whole space of orthonormal matrices.

\[ q_{ij} = \displaystyle\frac{2}{\sqrt{2n+1}}\sin \left(\displaystyle\frac{2ij\pi}{2n+1}\right) \]

\[ q_{ij} = \sqrt{\displaystyle\frac{2}{n}}\cos \left(\displaystyle\frac{(i-1/2)(j-1/2)\pi}{n} \right) \]

Another one: the matrix exponential of a skew-symmetric matrix is orthonormal. If \(A=-A^{T}\) then \[ \left(e^{A}\right)^{-1}=\mathrm{e}^{-A}=\mathrm{e}^{A^{T}}=\left(\mathrm{e}^{A}\right)^{T}. \]


Orthogonal convolutions?

Higher rank

Energy preserving maps.


See random rotations.


Anderson, T. W., I. Olkin, and L. G. Underhill. 1987. “Generation of Random Orthogonal Matrices.” SIAM Journal on Scientific and Statistical Computing 8 (4): 625–29. https://doi.org/10.1137/0908055.
Arjovsky, Martin, Amar Shah, and Yoshua Bengio. 2016. “Unitary Evolution Recurrent Neural Networks.” In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, 1120–28. ICML’16. New York, NY, USA: JMLR.org. http://arxiv.org/abs/1511.06464.
Berg, Rianne van den, Leonard Hasenclever, Jakub M. Tomczak, and Max Welling. 2018. “Sylvester Normalizing Flows for Variational Inference.” In Uai18. http://arxiv.org/abs/1803.05649.
Björck, Å., and C. Bowie. 1971. “An Iterative Algorithm for Computing the Best Estimate of an Orthogonal Matrix.” SIAM Journal on Numerical Analysis 8 (2): 358–64. https://doi.org/10.1137/0708036.
De Sena, Enzo, Huseyin Haciihabiboglu, Zoran Cvetkovic, and Julius O. Smith. 2015. “Efficient Synthesis of Room Acoustics via Scattering Delay Networks.” IEEE/ACM Transactions on Audio, Speech, and Language Processing 23 (9): 1478–92. https://doi.org/10.1109/TASLP.2015.2438547.
Edelman, Alan, and N. Raj Rao. 2005. “Random Matrix Theory.” Acta Numerica 14 (May): 233–97. https://doi.org/10.1017/S0962492904000236.
Hasenclever, Leonard, Jakub M Tomczak, and Max Welling. 2017. “Variational Inference with Orthogonal Normalizing Flows,” 4.
Hendeković, J. 1974. “On Parametrization of Orthogonal and Unitary Matrices with Respect to Their Use in the Description of Molecules.” Chemical Physics Letters 28 (2): 242–45. https://doi.org/10.1016/0009-2614(74)80063-1.
Jarlskog, C. 2005. “A Recursive Parametrization of Unitary Matrices.” Journal of Mathematical Physics 46 (10): 103508. https://doi.org/10.1063/1.2038607.
Jing, Li, Yichen Shen, Tena Dubcek, John Peurifoy, Scott Skirlo, Yann LeCun, Max Tegmark, and Marin Soljačić. 2017. “Tunable Efficient Unitary Neural Networks (EUNN) and Their Application to RNNs.” In PMLR, 1733–41. http://proceedings.mlr.press/v70/jing17a.html.
Kovarik, Zdislav. 1970. “Some Iterative Methods for Improving Orthonormality.” SIAM Journal on Numerical Analysis 7 (3): 386–89. https://doi.org/10.1137/0707031.
Menzer, Fritz, and Christof Faller. 2010. “Unitary Matrix Design for Diffuse Jot Reverberators.” https://www.researchgate.net/publication/230757792_Unitary_Matrix_Design_for_Diffuse_Jot_Reverberators.
Mezzadri, Francesco. 2007. “How to Generate Random Matrices from the Classical Compact Groups” 54 (5): 13.
Mhammedi, Zakaria, Andrew Hellicar, Ashfaqur Rahman, and James Bailey. 2017. “Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections.” In PMLR, 2401–9. http://proceedings.mlr.press/v70/mhammedi17a.html.
Regalia, P., and M. Sanjit. 1989. “Kronecker Products, Unitary Matrices and Signal Processing Applications.” SIAM Review 31 (4): 586–613. https://doi.org/10.1137/1031127.
Schroeder, Manfred R. 1961. “Improved Quasi-Stereophony and ‘Colorless’ Artificial Reverberation.” The Journal of the Acoustical Society of America 33 (8): 1061–64. https://doi.org/10.1121/1.1908892.
Schroeder, Manfred R., and B. Logan. 1961. ‘Colorless’ Artificial Reverberation.” Audio, IRE Transactions on AU-9 (6): 209–14. https://doi.org/10.1109/TAU.1961.1166351.
Tilma, Todd, and E C G Sudarshan. 2002. “Generalized Euler Angle Paramterization for SU(N).” Journal of Physics A: Mathematical and General 35 (48): 10467–501. https://doi.org/10.1088/0305-4470/35/48/316.
Valimaki, v., and T. I. Laakso. 2012. “Fractional Delay Filters-Design and Applications.” In Nonuniform Sampling: Theory and Practice, edited by Farokh Marvasti. Springer Science & Business Media.

No comments yet. Why not leave one?

GitHub-flavored Markdown & a sane subset of HTML is supported.