A classic. Surprisingly deep. Somewhat arbitrary.
A few non-comprehensive notes on approximating by the expedient of minimising the sum of the squared deviations (residuals).
As used in many, many problems, e.g. lasso regression.
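To make the idea concrete, here is a minimal sketch in Python with NumPy. It fits ordinary least squares with `np.linalg.lstsq`, then adds an L1 penalty to get a lasso fit by plain cyclic coordinate descent. The function names (`ols`, `lasso_cd`, `lasso_objective`), the synthetic data, and the penalty scaling `(1/2n)||y - Xb||^2 + lam * ||b||_1` are all illustrative assumptions, not taken from any of the sources linked here.

```python
import numpy as np


def ols(X, y):
    """Minimise ||y - X b||^2 with NumPy's least-squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def soft_threshold(z, t):
    """Shrink z towards zero by t, clipping at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_objective(X, y, beta, lam):
    """(1 / 2n) * ||y - X b||^2 + lam * ||b||_1 (assumed scaling)."""
    n = X.shape[0]
    return 0.5 * np.sum((y - X @ beta) ** 2) / n + lam * np.sum(np.abs(beta))


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the lasso objective above."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-column curvature
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of feature j removed.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Synthetic data (hypothetical): sparse true coefficients plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([2.0, 0.0, -1.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)

beta_ols = ols(X, y)
beta_lasso = lasso_cd(X, y, lam=0.1)
print("OLS:  ", np.round(beta_ols, 3))
print("lasso:", np.round(beta_lasso, 3))
```

With `lam=0` the coordinate descent loop solves the same problem as `ols`; increasing `lam` trades squared error for a smaller L1 norm of the coefficients.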
Nonlinear least squares with ceres-solver:
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It can be used to solve Non-linear Least Squares problems with bounds constraints and general unconstrained optimization problems. It is a mature, feature rich, and performant library that has been used in production at Google since 2010.
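For a feel of what a nonlinear least-squares solver does internally, here is a rough Python/NumPy sketch of a damped Gauss-Newton (Levenberg-Marquardt style) iteration. It is not the Ceres API and not how Ceres is implemented; the model `y = exp(m * x + c)`, the function names, the damping schedule, and the synthetic data are assumptions chosen purely for illustration.

```python
import numpy as np


def residuals(theta, x, y):
    """Residual vector r(theta) for the assumed model y = exp(m * x + c)."""
    m, c = theta
    return np.exp(m * x + c) - y


def jacobian(theta, x):
    """Jacobian of the residuals with respect to (m, c)."""
    m, c = theta
    e = np.exp(m * x + c)
    return np.column_stack([x * e, e])


def levenberg_marquardt(theta0, x, y, n_iter=200, mu=1e-2):
    """Minimise 0.5 * ||r(theta)||^2 with a simple damping heuristic."""
    theta = np.asarray(theta0, dtype=float)
    cost = 0.5 * np.sum(residuals(theta, x, y) ** 2)
    for _ in range(n_iter):
        r = residuals(theta, x, y)
        J = jacobian(theta, x)
        # Damped normal equations: (J^T J + mu I) step = -J^T r
        step = np.linalg.solve(J.T @ J + mu * np.eye(theta.size), -(J.T @ r))
        candidate = theta + step
        new_cost = 0.5 * np.sum(residuals(candidate, x, y) ** 2)
        if new_cost < cost:
            # Accept the step and trust the quadratic model more.
            theta, cost = candidate, new_cost
            mu *= 0.5
            if np.linalg.norm(step) < 1e-10:
                break
        else:
            # Reject the step and increase the damping.
            mu *= 2.0
    return theta


# Synthetic data (hypothetical): noisy samples of exp(0.3 * x + 0.1).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)
y = np.exp(0.3 * x + 0.1) + 0.05 * rng.normal(size=x.size)

theta0 = np.array([0.0, 0.0])
theta_hat = levenberg_marquardt(theta0, x, y)
print("estimated (m, c):", np.round(theta_hat, 3))
```

A production solver adds a great deal on top of this (automatic differentiation, robust loss functions, sparse linear algebra, bounds), which is the reason to reach for a library rather than a hand-rolled loop.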
Boyd and Vandenberghe’s Julia Companion to their Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares is a solid introduction to both linear algebra and Julia, focussing especially on least-squares problems.
- Minimal Python iteratively reweighted least squares by A.E. Haynes
- Ricardo Carvalho, Adaptive Lasso: What it is and how to implement in R
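As a rough illustration of the iteratively reweighted least squares idea mentioned in the first link above, here is a small Python/NumPy sketch that approximates least absolute deviations (L1) regression by repeatedly solving weighted least-squares problems. It is not the linked implementation; the function name `irls_lad`, the weight rule `1 / max(|r|, eps)`, and the synthetic data are assumptions made for this example.

```python
import numpy as np


def irls_lad(X, y, n_iter=100, eps=1e-8, tol=1e-10):
    """Approximate argmin_b sum_i |y_i - x_i^T b| by reweighted least squares."""
    # Start from the ordinary least-squares solution.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(n_iter):
        r = y - X @ beta
        # Weights 1/|r_i| turn squared errors into absolute errors;
        # eps guards against division by zero.
        w = 1.0 / np.maximum(np.abs(r), eps)
        sw = np.sqrt(w)
        # Weighted least squares via row scaling.
        new_beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        if np.linalg.norm(new_beta - beta) < tol:
            beta = new_beta
            break
        beta = new_beta
    return beta


# Synthetic data (hypothetical): a line with a few gross outliers.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=100)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=100)
y[:10] += 50.0  # contaminate ten observations

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_lad = irls_lad(X, y)
print("OLS (intercept, slope): ", np.round(beta_ols, 3))
print("IRLS (intercept, slope):", np.round(beta_lad, 3))
```

For an intercept-only design the L1 minimiser is the sample median, which gives a quick sanity check: `irls_lad(np.ones((5, 1)), np.array([1.0, 2.0, 3.0, 4.0, 100.0]))` should land close to 3.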
Bellec, Pierre C., Guillaume Lecué, and Alexandre B. Tsybakov. 2017. “Towards the Study of Least Squares Estimators with Convex Penalty,” January. http://arxiv.org/abs/1701.09120.
Chartrand, R., and Wotao Yin. 2008. “Iteratively Reweighted Algorithms for Compressive Sensing.” In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008, 3869–72. https://doi.org/10.1109/ICASSP.2008.4518498.
Chatla, Suneel Babu, and Galit Shmueli. 2016. “Modeling Big Count Data: An IRLS Framework for CMP Regression and GAM,” October. http://arxiv.org/abs/1610.08244.
Chen, Xiaojun, Dongdong Ge, Zizhuo Wang, and Yinyu Ye. 2012. “Complexity of Unconstrained L_2-L_p.” Mathematical Programming 143 (1-2): 371–83. https://doi.org/10.1007/s10107-012-0613-0.
Flammarion, Nicolas, and Francis Bach. 2017. “Stochastic Composite Least-Squares Regression with Convergence Rate O(1/N),” February. http://arxiv.org/abs/1702.06429.
Friedman, Jerome H. 2002. “Stochastic Gradient Boosting.” Computational Statistics & Data Analysis, Nonlinear Methods and Data Mining, 38 (4): 367–78. https://doi.org/10.1016/S0167-9473(01)00065-2.
Friedman, Jerome, Trevor Hastie, Holger Höfling, and Robert Tibshirani. 2007. “Pathwise Coordinate Optimization.” The Annals of Applied Statistics 1 (2): 302–32. https://doi.org/10.1214/07-AOAS131.
Friedman, Jerome, Trevor Hastie, and Rob Tibshirani. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software 33 (1): 1–22. https://doi.org/10.18637/jss.v033.i01.
Gasso, G., A. Rakotomamonjy, and S. Canu. 2009. “Recovering Sparse Signals with a Certain Family of Nonconvex Penalties and DC Programming.” IEEE Transactions on Signal Processing 57 (12): 4686–98. https://doi.org/10.1109/TSP.2009.2026004.
Karampatziakis, Nikos, and John Langford. 2010. “Online Importance Weight Aware Updates,” November. http://arxiv.org/abs/1011.1576.
Madsen, K., H. B. Nielsen, and O. Tingleff. 2004. “Methods for Non-Linear Least Squares Problems.” http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3215/pdf/imm3215.pdf.
Orr, Mark JL. 1996. “Introduction to Radial Basis Function Networks.” Technical Report, Center for Cognitive Science, University of Edinburgh. http://twyu2.synology.me/htdocs/class_2008_1/nn/Slides/Introduction%20to%20Radial%20Basis%20Function%20Networks%20(1996).pdf.
Portnoy, Stephen, and Roger Koenker. 1997. “The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators.” Statistical Science 12 (4): 279–300. https://doi.org/10.1214/ss/1030037960.
Rhee, Chang-Han, and Peter W. Glynn. 2015. “Unbiased Estimation with Square Root Convergence for SDE Models.” Operations Research 63 (5): 1026–43. https://doi.org/10.1287/opre.2015.1404.
Rosset, Saharon, and Ji Zhu. 2007. “Piecewise Linear Regularized Solution Paths.” The Annals of Statistics 35 (3): 1012–30. https://doi.org/10.1214/009053606000001370.
Yun, Sangwoon, and Kim-Chuan Toh. 2009. “A Coordinate Gradient Descent Method for ℓ 1-Regularized Convex Minimization.” Computational Optimization and Applications 48 (2): 273–307. https://doi.org/10.1007/s10589-009-9251-8.