(Weighted) least squares fits

A classic. Surprisingly deep.

A few non-comprehensive notes on approximating functions from data by the arbitrary-but-convenient expedient of minimising the sum of the squared deviations between model predictions and observations. The linear algebra of least squares fits seems well-trodden and perennially classic. Least squares turns up in many, many problems, e.g. lasso regression and Gaussian belief propagation.
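Concretely, with design matrix X, responses y and non-negative weights wᵢ, weighted least squares picks β to minimise Σᵢ wᵢ (yᵢ − xᵢᵀβ)². Equivalently, β solves the normal equations XᵀWXβ = XᵀWy with W = diag(w). Below is a minimal NumPy sketch. It assumes a dense X and strictly positive weights, and the helper name `weighted_lstsq` is hypothetical. It rescales each row by √wᵢ and hands the result to an ordinary least squares solver, which avoids forming XᵀWX explicitly and so behaves better when X is ill-conditioned.

```python
import numpy as np

def weighted_lstsq(X, y, w):
    """Solve min_beta sum_i w_i * (y_i - X[i] @ beta)**2.

    Hypothetical helper: scaling rows by sqrt(w) turns the weighted problem
    into an ordinary least squares problem, so we never form X^T W X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sw = np.sqrt(np.asarray(w, dtype=float))
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Toy data: straight line with noise whose spread grows with x.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])      # intercept and slope columns
noise_sd = 0.1 + 0.5 * x                  # heteroscedastic noise level
y = 1.0 + 2.0 * x + rng.normal(scale=noise_sd)
w = 1.0 / noise_sd**2                     # inverse-variance weights

beta_wls = weighted_lstsq(X, y, w)

# Cross-check against the normal equations (X^T W X) beta = X^T W y.
XtW = X.T * w                             # scales column i of X.T by w_i
beta_normal = np.linalg.solve(XtW @ X, XtW @ y)
print(beta_wls, beta_normal)
```

With inverse-variance weights, as here, this is the maximum likelihood estimate under independent Gaussian noise with known variances.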


Nonlinear least squares

See trust region and Levenberg-Marquardt methods in 2nd order optimisation.
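For orientation, here is a bare-bones Levenberg-Marquardt loop. Given residuals r(θ) with Jacobian J, each step solves (JᵀJ + λI)δ = −Jᵀr. Small λ gives roughly a Gauss-Newton step, large λ a short gradient-descent step, and λ is adapted according to whether the step lowered the cost. This is a minimal sketch and not any particular library's implementation. It assumes a hand-written Jacobian, identity damping with a fixed factor-of-10 schedule (Marquardt's variant scales by diag(JᵀJ) instead), and a toy noiseless exponential-decay model. The names `levenberg_marquardt`, `residual` and `jacobian` are hypothetical.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, theta0, lam=1e-3, max_iter=200, gtol=1e-10):
    """Minimise 0.5 * ||residual(theta)||^2 with a simple damping schedule (sketch)."""
    theta = np.asarray(theta0, dtype=float)
    r = residual(theta)
    cost = 0.5 * r @ r
    for _ in range(max_iter):
        J = jacobian(theta)
        g = J.T @ r                       # gradient of the cost
        if np.max(np.abs(g)) < gtol:      # stationary point reached
            break
        A = J.T @ J                       # Gauss-Newton approximation to the Hessian
        # Damped step: (J^T J + lam * I) delta = -J^T r
        delta = np.linalg.solve(A + lam * np.eye(theta.size), -g)
        r_new = residual(theta + delta)
        cost_new = 0.5 * r_new @ r_new
        if cost_new < cost:
            # Accept the step and trust the quadratic model more.
            theta, r, cost = theta + delta, r_new, cost_new
            lam *= 0.1
        else:
            # Reject the step and move towards shorter gradient-descent steps.
            lam *= 10.0
    return theta

# Toy problem: fit y = a * exp(-b * t) to noiseless data.
t = np.linspace(0.0, 4.0, 50)
a_true, b_true = 2.5, 1.3
y = a_true * np.exp(-b_true * t)

def residual(theta):
    a, b = theta
    return a * np.exp(-b * t) - y

def jacobian(theta):
    a, b = theta
    e = np.exp(-b * t)
    # Columns: d r / d a and d r / d b.
    return np.column_stack([e, -a * t * e])

theta_hat = levenberg_marquardt(residual, jacobian, theta0=[1.0, 0.1])
print(theta_hat)
```

The accept/reject rule on λ is a crude stand-in for a trust region: shrinking λ enlarges the region in which the local quadratic model is trusted.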




The JAX toolkit JAXopt includes lots of neat nonlinear least squares tooling.


The KeOps library lets you compute reductions of large arrays whose entries are given by a mathematical formula or a neural network. It combines efficient C++ routines with an automatic differentiation engine and can be used with Python (NumPy, PyTorch), Matlab and R.

It is perfectly suited to the computation of kernel matrix-vector products, K-nearest neighbors queries, N-body interactions, point cloud convolutions and the associated gradients. Crucially, it performs well even when the corresponding kernel or distance matrices do not fit into the RAM or GPU memory. Compared with a PyTorch GPU baseline, KeOps provides a x10-x100 speed-up on a wide range of geometric applications, from kernel methods to geometric deep learning.
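The underlying idea is to reduce over a kernel matrix without ever storing the whole thing. It can be imitated in plain NumPy by processing rows in blocks. This is not KeOps code, which relies on compiled C++ routines. It is a hypothetical `gaussian_kernel_matvec` sketch that assumes a Gaussian kernel k(x, y) = exp(−‖x − y‖² / (2σ²)) and an arbitrary block size. Its only purpose is to show why peak memory then scales with the block size rather than with N × M. In a least-squares setting, this matrix-vector product is the operation an iterative solver such as conjugate gradient calls repeatedly for kernel ridge regression.

```python
import numpy as np

def gaussian_kernel_matvec(x, y, b, sigma=1.0, block=1024):
    """Compute K @ b with K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2)),
    holding at most a (block, M) slab of K in memory at once."""
    out = np.empty((x.shape[0],) + b.shape[1:])
    y_sq = np.sum(y**2, axis=1)
    for start in range(0, x.shape[0], block):
        xb = x[start:start + block]
        # Squared distances for this slab via ||x||^2 - 2 x.y + ||y||^2.
        d2 = np.sum(xb**2, axis=1)[:, None] - 2.0 * (xb @ y.T) + y_sq[None, :]
        np.maximum(d2, 0.0, out=d2)       # clip tiny negative round-off
        out[start:start + block] = np.exp(-d2 / (2.0 * sigma**2)) @ b
    return out

rng = np.random.default_rng(1)
x = rng.normal(size=(5000, 3))
y = rng.normal(size=(4000, 3))
b = rng.normal(size=4000)
a = gaussian_kernel_matvec(x, y, b, sigma=0.5, block=512)
```

Here the largest temporary is a 512 × 4000 slab rather than the full 5000 × 4000 matrix. A dedicated library additionally fuses the kernel evaluation and the reduction, so that even the slab never has to be materialised.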



