Fun with determinants

Especially Jacobian determinants



Petersen and Pedersen (2012) note the standard identities:

Let \(\mathbf{A}\) be an \(n \times n\) matrix.
\[
\begin{aligned}
\operatorname{det}(\mathbf{A}) &=\prod_{i} \lambda_{i}, \quad \lambda_{i}=\operatorname{eig}(\mathbf{A}) \\
\operatorname{det}(c \mathbf{A}) &=c^{n} \operatorname{det}(\mathbf{A}), \quad \text{if } \mathbf{A} \in \mathbb{R}^{n \times n} \\
\operatorname{det}\left(\mathbf{A}^{T}\right) &=\operatorname{det}(\mathbf{A}) \\
\operatorname{det}(\mathbf{A} \mathbf{B}) &=\operatorname{det}(\mathbf{A}) \operatorname{det}(\mathbf{B}) \\
\operatorname{det}\left(\mathbf{A}^{-1}\right) &=1 / \operatorname{det}(\mathbf{A}) \\
\operatorname{det}\left(\mathbf{A}^{n}\right) &=\operatorname{det}(\mathbf{A})^{n} \\
\operatorname{det}\left(\mathbf{I}+\mathbf{u} \mathbf{v}^{T}\right) &=1+\mathbf{u}^{T} \mathbf{v}
\end{aligned}
\]

For \(n=2\):
\[
\operatorname{det}(\mathbf{I}+\mathbf{A})=1+\operatorname{det}(\mathbf{A})+\operatorname{Tr}(\mathbf{A})
\]

For \(n=3\):
\[
\operatorname{det}(\mathbf{I}+\mathbf{A})=1+\operatorname{det}(\mathbf{A})+\operatorname{Tr}(\mathbf{A})+\tfrac{1}{2} \operatorname{Tr}(\mathbf{A})^{2}-\tfrac{1}{2} \operatorname{Tr}\left(\mathbf{A}^{2}\right)
\]

For \(n=4\):
\[
\begin{aligned}
\operatorname{det}(\mathbf{I}+\mathbf{A})={}& 1+\operatorname{det}(\mathbf{A})+\operatorname{Tr}(\mathbf{A}) \\
&+\tfrac{1}{2}\operatorname{Tr}(\mathbf{A})^{2}-\tfrac{1}{2} \operatorname{Tr}\left(\mathbf{A}^{2}\right) \\
&+\tfrac{1}{6} \operatorname{Tr}(\mathbf{A})^{3}-\tfrac{1}{2} \operatorname{Tr}(\mathbf{A}) \operatorname{Tr}\left(\mathbf{A}^{2}\right)+\tfrac{1}{3} \operatorname{Tr}\left(\mathbf{A}^{3}\right)
\end{aligned}
\]

For small \(\varepsilon\), the following approximation holds to second order (the \(\operatorname{det}(\mathbf{A})\) contribution enters only at order \(\varepsilon^{n}\), so it drops out):
\[
\operatorname{det}(\mathbf{I}+\varepsilon \mathbf{A}) \cong 1+\varepsilon \operatorname{Tr}(\mathbf{A})+\tfrac{1}{2} \varepsilon^{2} \operatorname{Tr}(\mathbf{A})^{2}-\tfrac{1}{2} \varepsilon^{2} \operatorname{Tr}\left(\mathbf{A}^{2}\right)
\]
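As a quick sanity check, here is a minimal NumPy sketch (mine, not from the Matrix Cookbook) that verifies a few of these identities numerically on a random matrix; the seed and matrix size are incidental:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
u, v = rng.standard_normal(n), rng.standard_normal(n)

# det(A) is the product of the eigenvalues (complex in general; the product is real)
assert np.isclose(np.linalg.det(A), np.prod(np.linalg.eigvals(A)).real)

# det(cA) = c^n det(A)
c = 2.5
assert np.isclose(np.linalg.det(c * A), c**n * np.linalg.det(A))

# Matrix determinant lemma: det(I + u v^T) = 1 + u^T v
assert np.isclose(np.linalg.det(np.eye(n) + np.outer(u, v)), 1 + u @ v)

# Second-order expansion: det(I + eps A) ≈ 1 + eps Tr(A) + (eps^2/2)(Tr(A)^2 - Tr(A^2))
eps = 1e-3
trA = np.trace(A)
approx = 1 + eps * trA + 0.5 * eps**2 * (trA**2 - np.trace(A @ A))
assert np.isclose(np.linalg.det(np.eye(n) + eps * A), approx)
```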

For a block matrix, provided the relevant diagonal block is invertible, we have the Schur complement identity
\[
\operatorname{det}\left(\begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}\right)
=\operatorname{det}\left(\mathbf{A}_{22}\right) \operatorname{det}\left(\mathbf{A}_{11}-\mathbf{A}_{12} \mathbf{A}_{22}^{-1} \mathbf{A}_{21}\right)
=\operatorname{det}\left(\mathbf{A}_{11}\right) \operatorname{det}\left(\mathbf{A}_{22}-\mathbf{A}_{21} \mathbf{A}_{11}^{-1} \mathbf{A}_{12}\right)
\]
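The block identity is easy to check numerically as well. A minimal sketch, assuming the diagonal blocks are invertible (almost surely true for Gaussian random blocks):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 3, 2
A11 = rng.standard_normal((n1, n1))
A12 = rng.standard_normal((n1, n2))
A21 = rng.standard_normal((n2, n1))
A22 = rng.standard_normal((n2, n2))
M = np.block([[A11, A12], [A21, A22]])

# Schur complements of the two diagonal blocks
schur_of_A22 = A11 - A12 @ np.linalg.solve(A22, A21)  # A11 - A12 A22^{-1} A21
schur_of_A11 = A22 - A21 @ np.linalg.solve(A11, A12)  # A22 - A21 A11^{-1} A12

assert np.isclose(np.linalg.det(M), np.linalg.det(A22) * np.linalg.det(schur_of_A22))
assert np.isclose(np.linalg.det(M), np.linalg.det(A11) * np.linalg.det(schur_of_A11))
```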

References

Axler, Sheldon. 1995. “Down with Determinants!” The American Mathematical Monthly 102 (2): 139–54. https://doi.org/10.2307/2975348.
———. 2014. Linear Algebra Done Right. New York: Springer. http://dx.doi.org/10.1007/978-3-319-11080-6.
Berg, Rianne van den, Leonard Hasenclever, Jakub M. Tomczak, and Max Welling. 2018. “Sylvester Normalizing Flows for Variational Inference.” In UAI 2018. http://arxiv.org/abs/1803.05649.
Figurnov, Mikhail, Shakir Mohamed, and Andriy Mnih. 2018. “Implicit Reparameterization Gradients.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 441–52. Curran Associates, Inc. http://papers.nips.cc/paper/7326-implicit-reparameterization-gradients.pdf.
Grathwohl, Will, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. 2018. “FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models.” arXiv:1810.01367 [cs, Stat], October. http://arxiv.org/abs/1810.01367.
Huang, Chin-Wei, David Krueger, Alexandre Lacoste, and Aaron Courville. 2018. “Neural Autoregressive Flows.” arXiv:1804.00779 [cs, Stat], April. http://arxiv.org/abs/1804.00779.
Jankowiak, Martin, and Fritz Obermeyer. 2018. “Pathwise Derivatives Beyond the Reparameterization Trick.” In International Conference on Machine Learning, 2235–44. http://proceedings.mlr.press/v80/jankowiak18a.html.
Kingma, Diederik P., Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. “Improving Variational Inference with Inverse Autoregressive Flow.” In Advances in Neural Information Processing Systems 29. Curran Associates, Inc. http://arxiv.org/abs/1606.04934.
Kingma, Diederik P., Tim Salimans, and Max Welling. 2015. “Variational Dropout and the Local Reparameterization Trick.” In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, 2575–83. NIPS’15. Cambridge, MA, USA: MIT Press. http://arxiv.org/abs/1506.02557.
Kingma, Diederik P., and Max Welling. 2014. “Auto-Encoding Variational Bayes.” In ICLR 2014 Conference. http://arxiv.org/abs/1312.6114.
Louizos, Christos, and Max Welling. 2017. “Multiplicative Normalizing Flows for Variational Bayesian Neural Networks.” In PMLR, 2218–27. http://proceedings.mlr.press/v70/louizos17a.html.
Massaroli, Stefano, Michael Poli, Michelangelo Bin, Jinkyoo Park, Atsushi Yamashita, and Hajime Asama. 2020. “Stable Neural Flows.” arXiv:2003.08063 [cs, Math, Stat], March. http://arxiv.org/abs/2003.08063.
Minka, Thomas P. 2013. “Old and New Matrix Algebra Useful for Statistics.”
Papamakarios, George, Iain Murray, and Theo Pavlakou. 2017. “Masked Autoregressive Flow for Density Estimation.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 2338–47. Curran Associates, Inc. http://papers.nips.cc/paper/6828-masked-autoregressive-flow-for-density-estimation.pdf.
Papamakarios, George, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. 2019. “Normalizing Flows for Probabilistic Modeling and Inference.” arXiv:1912.02762 [cs, Stat], December. http://arxiv.org/abs/1912.02762.
Petersen, Kaare Brandt, and Michael Syskind Pedersen. 2012. “The Matrix Cookbook.” http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=3274.
Pfau, David, and Danilo Rezende. 2020. “Integrable Nonparametric Flows.”
Ruiz, Francisco J. R., Michalis K. Titsias, and David M. Blei. 2016. “The Generalized Reparameterization Gradient.” In Advances In Neural Information Processing Systems. http://arxiv.org/abs/1610.02287.
Seber, George A. F. 2007. A Matrix Handbook for Statisticians. Wiley.
Spantini, Alessio, Daniele Bigoni, and Youssef Marzouk. 2017. “Inference via Low-Dimensional Couplings.” Journal of Machine Learning Research 19 (66): 2639–709. http://arxiv.org/abs/1703.06131.
