(Weighted) least squares fits

A classic. Surprisingly deep.

A few non-comprehensive notes on approximating by the arbitrary-but-convenient expedient of minimising the (possibly weighted) sum of squared residuals.
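To make the weighted case concrete, here is a minimal sketch of a linear weighted least squares fit. It assumes a design matrix `X`, targets `y`, and non-negative weights `w`. All of these names, the helper `weighted_least_squares`, and the synthetic data are illustrative assumptions rather than anything prescribed by these notes. The one trick is that scaling each row by the square root of its weight turns the weighted problem into an ordinary one.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Minimise sum_i w_i * (y_i - X[i] @ beta)**2 over beta (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sqrt_w = np.sqrt(np.asarray(w, dtype=float))
    # Row-scaling by sqrt(w_i) reduces the weighted problem to ordinary least squares.
    beta, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return beta

# Synthetic heteroscedastic data (assumed for illustration only).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
true_beta = np.array([1.0, 2.0])
noise_sd = rng.uniform(0.1, 1.0, size=50)
y = X @ true_beta + noise_sd * rng.normal(size=50)

# Inverse-variance weights: noisier observations count for less.
beta_hat = weighted_least_squares(X, y, w=1.0 / noise_sd**2)
```

Inverse-variance weighting is only one common choice. Any non-negative weights work with the same code.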

Used in many, many problems, e.g. lasso regression, which adds an ℓ1 penalty to the least-squares loss.
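As a rough illustration of how the lasso builds on least squares, the sketch below minimises `(1/(2n)) * ||y - X @ beta||**2 + lam * ||beta||_1` by cyclic coordinate descent with soft-thresholding. The objective scaling, the function names, and the fixed iteration count are all assumptions chosen for brevity. This is one of several possible solvers, not a tuned implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t, clipping at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the lasso (hypothetical helper, fixed sweep count)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-column curvature of the squared loss
    resid = y - X @ beta                # running residual, updated in place
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                # an all-zero column cannot affect the fit
            # Correlation of column j with the partial residual
            # (the residual with coordinate j's contribution added back).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_beta_j = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_beta_j)
            beta[j] = new_beta_j
    return beta
```

With `lam=0.0` the sweeps converge to the ordinary least squares solution when `X` has full column rank. Once `lam` exceeds `max(abs(X.T @ y)) / n`, every coefficient is thresholded to zero.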
