Orthonormal and unitary matrices

Energy preserving operators

In which I think about parameterisations and implementations of finite dimensional energy-preserving operators, a.k.a. matrices. Orthonormal bases viewed from the other side, if you will. A particular nook in the the linear feedback process library.

Uses include maintaining stable gradients in recurrent neural networks (Arjovsky, Shah, and Bengio 2016; Jing et al. 2017; Mhammedi et al. 2017) and efficient normalising flows. (Berg et al. 2018; Hasenclever, Tomczak, and Welling 2017)

Also, parameterising stable Multi-Input-Multi-Output (MIMO) delay networks in signal processing.

There is some terminological work. Some writers refer to orthogonal matrices (but I prefer that to mean matrices where the columns are not all length 1), and some refer to unitary matrices, which seems to imply the matrix is over the complex field instead of the reals but is basically the same from my perspective.


Citing MATLAB, Nick Higham gives the following two parametric orthonormal matrices.

\[ q_{ij} = \displaystyle\frac{2}{\sqrt{2n+1}}\sin \left(\displaystyle\frac{2ij\pi}{2n+1}\right) \]

\[ q_{ij} = \sqrt{\displaystyle\frac{2}{n}}\cos \left(\displaystyle\frac{(i-1/2)(j-1/2)\pi}{n} \right) \]

This is equivalent to choosing a finite orthonormal basis, so any way we can parameterise such a basis gives us an orthonormal matrix.

Another one: the matrix exponential of a skew-symmetric matrices is orthogonal. If \(A=-A^{T}\) then

\[ \left(e^{A}\right)^{-1}=\mathrm{e}^{-A}=\mathrm{e}^{A^{T}}=\left(\mathrm{e}^{A}\right)^{T} \]

Iterative normalising

Have a nearly orthonormal matrix? Berg et al. (2018) gives us a contraction which gets you closer to an orthonormal matrix:

\[ \mathbf{Q}^{(k+1)}=\mathbf{Q}^{(k)}\left(\mathbf{I}+\frac{1}{2}\left(\mathbf{I}-\mathbf{Q}^{(k) \top} \mathbf{Q}^{(k)}\right)\right) \] which reputedly converges if

\(\left\|\mathbf{Q}^{(0) \top} \mathbf{Q}^{(0)}-\mathbf{I}\right\|_{2}<1\)

(attributed to (Björck and Bowie 1971; Kovarik 1970))

Householder reflections

We can apply successive reflections about hyperplanes, the so called Householder reflections, to an identity matrix to construct a new one.

\[ H(\mathbf{z})=\mathbf{z}-\frac{\mathbf{v} \mathbf{v}^{T}}{\|\mathbf{v}\|^{2}} \mathbf{z} \] 🏗

Givens rotation

One obvious method for constructing unitary matrices is composing Givens rotations. 🏗


Nick Higham has a compact introduction to random orthonormal matrices and especially the Haar measure, which is a distribution over such matrices with natural invariances.


Anderson, T. W., I. Olkin, and L. G. Underhill. 1987. “Generation of Random Orthogonal Matrices.” SIAM Journal on Scientific and Statistical Computing 8 (4): 625–29. https://doi.org/10.1137/0908055.
Arjovsky, Martin, Amar Shah, and Yoshua Bengio. 2016. “Unitary Evolution Recurrent Neural Networks.” In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, 1120–28. ICML’16. New York, NY, USA: JMLR.org. http://arxiv.org/abs/1511.06464.
Berg, Rianne van den, Leonard Hasenclever, Jakub M. Tomczak, and Max Welling. 2018. “Sylvester Normalizing Flows for Variational Inference.” In Uai18. http://arxiv.org/abs/1803.05649.
Björck, Å., and C. Bowie. 1971. “An Iterative Algorithm for Computing the Best Estimate of an Orthogonal Matrix.” SIAM Journal on Numerical Analysis 8 (2): 358–64. https://doi.org/10.1137/0708036.
De Sena, Enzo, Huseyin Haciihabiboglu, Zoran Cvetkovic, and Julius O. Smith. 2015. “Efficient Synthesis of Room Acoustics via Scattering Delay Networks.” IEEE/ACM Transactions on Audio, Speech, and Language Processing 23 (9): 1478–92. https://doi.org/10.1109/TASLP.2015.2438547.
Edelman, Alan, and N. Raj Rao. 2005. “Random Matrix Theory.” Acta Numerica 14 (May): 233–97. https://doi.org/10.1017/S0962492904000236.
Hasenclever, Leonard, Jakub M Tomczak, and Max Welling. 2017. “Variational Inference with Orthogonal Normalizing Flows,” 4.
Hendeković, J. 1974. “On Parametrization of Orthogonal and Unitary Matrices with Respect to Their Use in the Description of Molecules.” Chemical Physics Letters 28 (2): 242–45. https://doi.org/10.1016/0009-2614(74)80063-1.
Jarlskog, C. 2005. “A Recursive Parametrization of Unitary Matrices.” Journal of Mathematical Physics 46 (10): 103508. https://doi.org/10.1063/1.2038607.
Jing, Li, Yichen Shen, Tena Dubcek, John Peurifoy, Scott Skirlo, Yann LeCun, Max Tegmark, and Marin Soljačić. 2017. “Tunable Efficient Unitary Neural Networks (EUNN) and Their Application to RNNs.” In PMLR, 1733–41. http://proceedings.mlr.press/v70/jing17a.html.
Kovarik, Zdislav. 1970. “Some Iterative Methods for Improving Orthonormality.” SIAM Journal on Numerical Analysis 7 (3): 386–89. https://doi.org/10.1137/0707031.
Menzer, Fritz, and Christof Faller. 2010. “Unitary Matrix Design for Diffuse Jot Reverberators.” https://www.researchgate.net/publication/230757792_Unitary_Matrix_Design_for_Diffuse_Jot_Reverberators.
Mezzadri, Francesco. 2007. “How to Generate Random Matrices from the Classical Compact Groups 54 (5): 13.
Mhammedi, Zakaria, Andrew Hellicar, Ashfaqur Rahman, and James Bailey. 2017. “Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections.” In PMLR, 2401–9. http://proceedings.mlr.press/v70/mhammedi17a.html.
Regalia, P., and M. Sanjit. 1989. “Kronecker Products, Unitary Matrices and Signal Processing Applications.” SIAM Review 31 (4): 586–613. https://doi.org/10.1137/1031127.
Schroeder, Manfred R. 1961. “Improved Quasi-Stereophony and Colorless Artificial Reverberation.” The Journal of the Acoustical Society of America 33 (8): 1061–64. https://doi.org/10.1121/1.1908892.
Schroeder, Manfred R., and B. Logan. 1961. “"Colorless" Artificial Reverberation.” Audio, IRE Transactions on AU-9 (6): 209–14. https://doi.org/10.1109/TAU.1961.1166351.
Tilma, Todd, and E C G Sudarshan. 2002. “Generalized Euler Angle Paramterization for SU(N).” Journal of Physics A: Mathematical and General 35 (48): 10467–501. https://doi.org/10.1088/0305-4470/35/48/316.
Valimaki, v., and T. I. Laakso. 2012. “Fractional Delay Filters-Design and Applications.” In Nonuniform Sampling: Theory and Practice, edited by Farokh Marvasti. Springer Science & Business Media. http://books.google.com?id=n3fgBwAAQBAJ.