A classic. Surprisingly deep.
A few non-comprehensive notes on approximating by the arbitrary-but-convenient expedient of minimising the sum of the squared deviations.
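To make the objective concrete, here is a minimal sketch of ordinary linear least squares in Python with NumPy: choose the coefficient vector beta that minimises the sum of squared residuals ||y - X beta||^2. The helper name `fit_least_squares` and the toy data are assumptions for illustration only; `numpy.linalg.lstsq` does the actual work.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return the beta minimising ||y - X @ beta||^2 (hypothetical helper)."""
    beta, _residuals, _rank, _singular_values = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Toy data lying exactly on the line y = 1 + 2 t (illustrative only).
t = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(t), t])  # columns: intercept, slope
y = 1.0 + 2.0 * t

beta = fit_least_squares(X, y)

# At the minimiser the residual is orthogonal to the columns of X
# (the normal equations X^T (y - X beta) = 0).
normal_eq_gap = X.T @ (y - X @ beta)
```

With noisy data the same call returns the best-fitting line in the squared-error sense, and the normal-equations gap stays at zero up to rounding.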
Used in many, many problems, e.g. lasso regression, which adds an ℓ1 penalty to the least-squares objective.
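As a rough illustration of how a penalised least-squares problem such as the lasso can be solved, here is a sketch of cyclic coordinate descent with soft thresholding, in the spirit of the coordinate-descent papers listed below. It assumes the objective (1/(2n)) ||y - X beta||^2 + lam ||beta||_1 and no all-zero columns in X; the names `soft_threshold` and `lasso_cd` are made up for this note, and it is not a substitute for a tested library implementation.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z towards zero by lam, clipping at zero."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))||y - X b||^2 + lam * ||b||_1.

    Assumes every column of X is non-zero. Runs a fixed number of sweeps
    instead of checking convergence, to keep the sketch short.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * x_j^T x_j for each column
    resid = y - X @ beta                   # current residual
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]     # drop coordinate j from the fit
            rho = X[:, j] @ resid / n      # correlation with partial residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]     # put the updated coordinate back
    return beta
```

With `lam=0` this reduces to coordinate descent on plain least squares; once `lam` is at least the largest value of |x_j^T y| / n, every coefficient is thresholded to zero.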
Nonlinear least squares with ceres-solver:
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It can be used to solve Non-linear Least Squares problems with bounds constraints and general unconstrained optimization problems. It is a mature, feature rich, and performant library that has been used in production at Google since 2010.
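For a feel of what a nonlinear least-squares solver does internally, here is a small Levenberg-Marquardt sketch in Python with NumPy. It is not the Ceres API and makes no claim about how Ceres is implemented; it only illustrates the damped Gauss-Newton idea on a toy exponential model. The function names, the damping schedule (halve on success, double on failure) and the data are all assumptions chosen for illustration.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, theta0, n_iter=100, mu=1e-2):
    """Minimise ||residual(theta)||^2 with a basic damped Gauss-Newton loop."""
    theta = np.asarray(theta0, dtype=float)
    r = residual(theta)
    cost = r @ r
    for _ in range(n_iter):
        J = jacobian(theta)
        # Damped normal equations: (J^T J + mu I) step = -J^T r
        A = J.T @ J + mu * np.eye(theta.size)
        step = np.linalg.solve(A, -J.T @ r)
        candidate = theta + step
        r_new = residual(candidate)
        cost_new = r_new @ r_new
        if cost_new < cost:
            # Accept the step and trust the Gauss-Newton model more.
            theta, r, cost = candidate, r_new, cost_new
            mu *= 0.5
        else:
            # Reject the step and increase the damping.
            mu *= 2.0
    return theta

# Toy problem: recover (a, b) in y = a * exp(b * t) from noise-free samples.
t = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(-1.0 * t)

def residual(theta):
    a, b = theta
    return a * np.exp(b * t) - y

def jacobian(theta):
    a, b = theta
    e = np.exp(b * t)
    return np.column_stack([e, a * t * e])  # d/da, d/db of the residual

theta_hat = levenberg_marquardt(residual, jacobian, theta0=[1.5, -0.5])
```

A production solver adds the parts this sketch leaves out: convergence tests, robust loss functions, bounds, sparse linear algebra and automatic differentiation.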
Boyd and Vandenberghe’s Julia Companion to their Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares is a solid introduction to both linear algebra and Julia, focussing especially on least-squares problems.
Bagge Carlson, Fredrik. 2018. “Machine Learning and System Identification for Estimation in Physical Systems.” PhD Thesis TFRT-1122, Lund University. http://lup.lub.lu.se/record/ffb8dc85-ce12-4f75-8f2b-0881e492f6c0.
Bellec, Pierre C., Guillaume Lecué, and Alexandre B. Tsybakov. 2017. “Towards the Study of Least Squares Estimators with Convex Penalty,” January. http://arxiv.org/abs/1701.09120.
Chartrand, R., and Wotao Yin. 2008. “Iteratively Reweighted Algorithms for Compressive Sensing.” In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008, 3869–72. https://doi.org/10.1109/ICASSP.2008.4518498.
Chatla, Suneel Babu, and Galit Shmueli. 2016. “Modeling Big Count Data: An IRLS Framework for CMP Regression and GAM,” October. http://arxiv.org/abs/1610.08244.
Chen, Xiaojun, Dongdong Ge, Zizhuo Wang, and Yinyu Ye. 2012. “Complexity of Unconstrained L2-Lp Minimization.” Mathematical Programming 143 (1-2): 371–83. https://doi.org/10.1007/s10107-012-0613-0.
Flammarion, Nicolas, and Francis Bach. 2017. “Stochastic Composite Least-Squares Regression with Convergence Rate O(1/N),” February. http://arxiv.org/abs/1702.06429.
Friedman, Jerome H. 2002. “Stochastic Gradient Boosting.” Computational Statistics & Data Analysis, Nonlinear Methods and Data Mining, 38 (4): 367–78. https://doi.org/10.1016/S0167-9473(01)00065-2.
Friedman, Jerome, Trevor Hastie, Holger Höfling, and Robert Tibshirani. 2007. “Pathwise Coordinate Optimization.” The Annals of Applied Statistics 1 (2): 302–32. https://doi.org/10.1214/07-AOAS131.
Friedman, Jerome, Trevor Hastie, and Rob Tibshirani. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software 33 (1): 1–22. https://doi.org/10.18637/jss.v033.i01.
Gasso, G., A. Rakotomamonjy, and S. Canu. 2009. “Recovering Sparse Signals with a Certain Family of Nonconvex Penalties and DC Programming.” IEEE Transactions on Signal Processing 57 (12): 4686–98. https://doi.org/10.1109/TSP.2009.2026004.
Karampatziakis, Nikos, and John Langford. 2010. “Online Importance Weight Aware Updates,” November. http://arxiv.org/abs/1011.1576.
Madsen, K., H. B. Nielsen, and O. Tingleff. 2004. “Methods for Non-Linear Least Squares Problems.” http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3215/pdf/imm3215.pdf.
Orr, Mark JL. 1996. “Introduction to Radial Basis Function Networks.” Technical Report, Center for Cognitive Science, University of Edinburgh. http://twyu2.synology.me/htdocs/class_2008_1/nn/Slides/Introduction%20to%20Radial%20Basis%20Function%20Networks%20(1996).pdf.
Portnoy, Stephen, and Roger Koenker. 1997. “The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators.” Statistical Science 12 (4): 279–300. https://doi.org/10.1214/ss/1030037960.
Rhee, Chang-Han, and Peter W. Glynn. 2015. “Unbiased Estimation with Square Root Convergence for SDE Models.” Operations Research 63 (5): 1026–43. https://doi.org/10.1287/opre.2015.1404.
Rosset, Saharon, and Ji Zhu. 2007. “Piecewise Linear Regularized Solution Paths.” The Annals of Statistics 35 (3): 1012–30. https://doi.org/10.1214/009053606000001370.
Transtrum, Mark K, Benjamin B Machta, and James P Sethna. 2011. “The Geometry of Nonlinear Least Squares with Applications to Sloppy Models and Optimization.” Physical Review E 83 (3): 036701. https://doi.org/10.1103/PhysRevE.83.036701.
Yun, Sangwoon, and Kim-Chuan Toh. 2009. “A Coordinate Gradient Descent Method for ℓ1-Regularized Convex Minimization.” Computational Optimization and Applications 48 (2): 273–307. https://doi.org/10.1007/s10589-009-9251-8.