Overparameterization

a.k.a. improper learning

General notes on the technique of increasing the number of slack parameters you have, especially in machine learning. Convex relaxations often hinge upon this.
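To make the slack-parameter idea concrete, here is a minimal sketch of the lifting trick behind several of the convex relaxations cited below, such as those for phase retrieval and blind deconvolution. A measurement that is quadratic in an unknown vector x of length n becomes linear in the lifted matrix X = x x^T. The cost is that the n unknowns become n^2 unknowns. A convex relaxation then drops the nonconvex rank-one constraint on X. The function names are my own illustrative choices, and real-valued vectors are assumed for simplicity. No convex solver is included; the sketch only checks that the lifted form reproduces the original measurement and is linear in X.

```python
import random


def outer(x):
    """Lift a length-n vector x to the n x n matrix X = x x^T."""
    return [[xi * xj for xj in x] for xi in x]


def quadratic_measurement(a, x):
    """Phase-retrieval-style measurement b = <a, x>^2, which is quadratic in x."""
    return sum(ai * xi for ai, xi in zip(a, x)) ** 2


def lifted_measurement(a, X):
    """The same measurement written as <a a^T, X>, which is linear in X."""
    A = outer(a)
    n = len(a)
    return sum(A[i][j] * X[i][j] for i in range(n) for j in range(n))


random.seed(0)
n = 4
x = [random.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical unknown signal
a = [random.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical measurement vector
X = outer(x)                                    # n unknowns lifted to n * n

b_direct = quadratic_measurement(a, x)
b_lifted = lifted_measurement(a, X)
print(n, "unknowns lifted to", n * n)
print(abs(b_direct - b_lifted) < 1e-9)
```

Linearity in X is what makes the lifted problem amenable to convex programming. The price is the extra n^2 - n degrees of freedom, which the rank constraint, or a convex surrogate for it such as a positive-semidefinite constraint plus a trace penalty, has to claw back.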

Zeyuan Allen-Zhu, Yuanzhi Li and Zhao Song argue that the combination of overparameterization and SGD is the secret to how deep learning works.
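Here is a toy illustration of why the optimiser matters as much as the parameter count. It uses a linear model rather than the deep networks Allen-Zhu, Li and Song actually analyse, and full-batch gradient descent rather than SGD. With more parameters than observations, infinitely many weight vectors fit the data exactly. Gradient descent started from zero nonetheless picks out a particular one, the minimum-norm interpolant, because its iterates never leave the row space of the design matrix. The data, step size and helper names are made up for illustration.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def gradient_descent(X, y, lr=0.1, steps=2000):
    """Full-batch gradient descent on 0.5 * ||X w - y||^2, started from w = 0."""
    Xt = transpose(X)
    w = [0.0] * len(X[0])
    for _ in range(steps):
        residual = [p - t for p, t in zip(matvec(X, w), y)]
        grad = matvec(Xt, residual)  # gradient X^T (X w - y)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Two observations, three parameters: the model is overparameterized,
# so the training data can be fit exactly in infinitely many ways.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

w = gradient_descent(X, y)
print(w)  # close to the minimum-norm interpolant [0, 1, 1]
```

Another interpolant, [1, 2, 0], fits the same data exactly but has squared norm 5, against 2 for the solution gradient descent finds. Nothing in the loss prefers the smaller-norm solution; the preference comes entirely from the initialisation and the dynamics.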

RJ Lipton discusses Arno van den Essen’s incidental work on stabilisation methods of polynomials, which relates, AFAICT, to transfer-function-type stability. Does this connect to the overparameterisation of rational transfer function analysis I so enjoyed (Hardt, Ma, and Recht 2016)?

🏗.

Arora, Sanjeev, Nadav Cohen, and Elad Hazan. 2018. “On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization,” February. http://arxiv.org/abs/1802.06509.

Bach, Francis. 2013. “Convex Relaxations of Structured Matrix Factorizations,” September. http://arxiv.org/abs/1309.3117.

Bahmani, Sohail, and Justin Romberg. 2014. “Lifting for Blind Deconvolution in Random Mask Imaging: Identifiability and Convex Relaxation,” December. http://arxiv.org/abs/1501.00046.

———. 2016. “Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation,” October. http://arxiv.org/abs/1610.04210.

Goldstein, Tom, and Christoph Studer. 2016. “PhaseMax: Convex Phase Retrieval via Basis Pursuit,” October. http://arxiv.org/abs/1610.07531.

Hardt, Moritz, Tengyu Ma, and Benjamin Recht. 2016. “Gradient Descent Learns Linear Dynamical Systems,” September. http://arxiv.org/abs/1609.05191.

Hazan, Elad, Karan Singh, and Cyril Zhang. 2017. “Learning Linear Dynamical Systems via Spectral Filtering.” In NIPS. http://arxiv.org/abs/1711.00946.

Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. 2017. “Variational Dropout Sparsifies Deep Neural Networks.” In Proceedings of ICML. http://arxiv.org/abs/1701.05369.

Oliveira, Maurício C. de, and Robert E. Skelton. 2001. “Stability Tests for Constrained Linear Systems.” In Perspectives in Robust Control, 241–57. Lecture Notes in Control and Information Sciences. Springer, London. https://doi.org/10.1007/BFb0110624.

Tropp, J. A. 2006. “Just Relax: Convex Programming Methods for Identifying Sparse Signals in Noise.” IEEE Transactions on Information Theory 52 (3): 1030–51. https://doi.org/10.1109/TIT.2005.864420.

Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2017. “Understanding Deep Learning Requires Rethinking Generalization.” In Proceedings of ICLR. http://arxiv.org/abs/1611.03530.