M-estimation

July 11, 2016 — February 17, 2022

likelihood
optimization
statistics

Loosely, estimating a quantity by choosing it to be the extremum of a function, or, if it’s well-behaved enough, a zero of its derivative.
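A minimal sketch of the two routes, using a location parameter θ. The first route minimizes Σρ(x_i − θ). The second solves Σψ(x_i − θ) = 0 with ψ = ρ′. The log-cosh loss, the toy data and the hand-rolled golden-section and bisection solvers are my own illustrative assumptions, not anything taken from the references.

```python
import math

def rho(r):
    # log-cosh loss, computed stably: log(cosh r) = |r| + log1p(exp(-2|r|)) - log 2
    a = abs(r)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def psi(r):
    # psi = rho', which for log-cosh is tanh
    return math.tanh(r)

def objective(theta, xs):
    # the function whose minimum defines the estimate
    return sum(rho(x - theta) for x in xs)

def estimating_equation(theta, xs):
    # d(objective)/d(theta) = -sum psi(x - theta); its zero is the same theta
    return sum(psi(x - theta) for x in xs)

def golden_section_min(f, lo, hi, tol=1e-9):
    # golden-section search for the minimum of a unimodal f on [lo, hi]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

def bisect_root(f, lo, hi, tol=1e-12):
    # bisection; assumes f(lo) and f(hi) have opposite signs
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0

xs = [0.1, -0.3, 0.2, 0.05, 8.0]  # toy data with one outlier
lo, hi = min(xs), max(xs)
theta_min = golden_section_min(lambda t: objective(t, xs), lo, hi)
theta_root = bisect_root(lambda t: estimating_equation(t, xs), lo, hi)
```

With a smooth convex ρ like this, the two answers agree to solver tolerance. The outlier at 8.0 also pulls the estimate much less than it pulls the sample mean (1.61).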

Popular in machine learning, where loss-function-based methods are ubiquitous. In statistics, we see this famously in maximum likelihood estimation, robust estimation, and least-squares loss. M-estimation provides a unifying formalism with a convenient large-sample asymptotic theory.
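One instance of that asymptotic theory is the scalar location case. Suppose θ̂ solves Σψ(x_i − θ) = 0. Under the usual regularity conditions, √n(θ̂ − θ₀) is asymptotically normal with variance E[ψ(X − θ₀)²] / E[ψ′(X − θ₀)]², the "sandwich" form. Below is a sketch of the plug-in estimate; the function and argument names are hypothetical. As a check, with ψ(r) = r it reduces to the 1/n-normalized sample variance divided by n.

```python
def sandwich_variance(xs, theta, psi, dpsi):
    """Plug-in asymptotic variance of a scalar location M-estimate theta.

    Estimates E[psi^2] / E[psi']^2 / n using empirical means over residuals x - theta.
    """
    n = len(xs)
    rs = [x - theta for x in xs]
    meat = sum(psi(r) ** 2 for r in rs) / n   # estimate of E[psi(X - theta)^2]
    bread = sum(dpsi(r) for r in rs) / n      # estimate of E[psi'(X - theta)]
    return meat / (bread * bread) / n

# Least squares: psi(r) = r, psi'(r) = 1, and the M-estimate is the sample mean
xs = [1.0, 2.0, 4.0, 7.0]
mean = sum(xs) / len(xs)
v_ls = sandwich_variance(xs, mean, lambda r: r, lambda r: 1.0)
```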

🏗 Discuss influence function motivation.

1 Implied density functions

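Many common loss functions can be read as negative log-densities, so that minimizing the loss is maximum likelihood estimation under the implied density, which is proportional to the exponential of the negative loss. For example, squared loss corresponds to a Gaussian density and absolute loss to a Laplace density.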
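A small check of this idea for the Huber loss. Its implied density is Gaussian in the middle with exponential tails. The sketch assumes unit scale and the tuning constant δ = 1.345, a conventional choice rather than one from this page. It compares a numerical normalizing constant Z = ∫exp(−ρ(r))dr with the closed form √(2π)·erf(δ/√2) + 2e^{−δ²/2}/δ. As δ → ∞ the closed form tends to √(2π), recovering the standard Gaussian that corresponds to squared loss.

```python
import math

def huber_rho(r, delta):
    # quadratic core, linear tails
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * a - 0.5 * delta * delta

def normalizer_numeric(delta, half_width=50.0, n=100_000):
    # trapezoid rule for Z = integral of exp(-rho(r)) over the real line (tails beyond half_width ignored)
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        r = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(-huber_rho(r, delta))
    return total * h

def normalizer_closed_form(delta):
    # Gaussian core on [-delta, delta] plus two exponential tails
    core = math.sqrt(2.0 * math.pi) * math.erf(delta / math.sqrt(2.0))
    tails = 2.0 * math.exp(-0.5 * delta * delta) / delta
    return core + tails

def implied_density(r, delta):
    # the density whose negative log is the Huber loss (up to an additive constant)
    return math.exp(-huber_rho(r, delta)) / normalizer_closed_form(delta)

delta = 1.345
z_num = normalizer_numeric(delta)
z_exact = normalizer_closed_form(delta)
```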

I assume they did not invent this idea, but Davison and Ortiz (2019) point out that a least-squares-compatible model can usually be generalized to any elliptical density. This family includes the densities implied by Huber and many other robust losses as special cases.

2 Robust loss functions

🏗

2.1 Huber loss
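The Huber (1964) loss is quadratic for small residuals and linear for large ones. A minimal sketch of ρ and its derivative ψ follows. The default δ = 1.345 is the common choice giving roughly 95% efficiency at the Gaussian; it is an assumption here, not something from this page.

```python
def huber_rho(r, delta=1.345):
    # quadratic for |r| <= delta, linear beyond; rho and rho' are both continuous at the join
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * a - 0.5 * delta * delta

def huber_psi(r, delta=1.345):
    # psi = rho': the residual clipped to [-delta, delta]
    return max(-delta, min(delta, r))
```

Unlike the squared-loss ψ(r) = r, this ψ is bounded by δ. That bound is what caps the influence any single outlier can have on the estimate.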

2.2 Hampel loss
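Hampel's loss is usually specified through a three-part redescending ψ. It is the identity near zero, then constant, then decreases linearly to zero, so that sufficiently gross outliers get no influence at all. The sketch below uses tuning constants a < b < c; the values 2, 4, 8 are an illustrative assumption. The corresponding ρ is not convex, so fitting with it generally needs a good robust starting value.

```python
def hampel_psi(r, a=2.0, b=4.0, c=8.0):
    # three-part redescending psi with tuning constants 0 < a < b < c
    s = 1.0 if r >= 0 else -1.0
    x = abs(r)
    if x <= a:
        return r                         # identity near zero
    if x <= b:
        return s * a                     # flat
    if x <= c:
        return s * a * (c - x) / (c - b)  # descends linearly to zero at c
    return 0.0                           # gross outliers are ignored
```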

3 Fitting

Discuss representation (and implementation) in terms of weight functions for least-squares loss.
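The usual weight-function trick works as follows. Write w(r) = ψ(r)/r. Then the estimating equations Σψ(r_i/s)x_i = 0 coincide with the normal equations of a weighted least-squares problem, so one can alternate between computing weights and solving weighted least squares (iteratively reweighted least squares, IRLS). Below is a minimal sketch for Huber regression, assuming NumPy. `irls_huber` is a hypothetical helper. Re-estimating the scale s by the normalized median absolute deviation at each step, the OLS starting point, and the stopping rule are all my own assumptions.

```python
import numpy as np

def huber_weight(r, delta=1.345):
    # w(r) = psi(r) / r: 1 inside [-delta, delta], delta / |r| outside
    a = np.abs(r)
    return np.where(a <= delta, 1.0, delta / np.maximum(a, 1e-300))

def irls_huber(X, y, delta=1.345, n_iter=50, tol=1e-10):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        # robust scale: normalized median absolute deviation of the residuals
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        w = huber_weight(r / s, delta)
        sw = np.sqrt(w)
        # weighted least squares via rescaled rows
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

x = np.arange(20, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.1 * np.sin(x)
y[15] += 50.0  # one gross outlier in the response
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_robust = irls_huber(X, y)
```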

4 GM-estimators

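Generalized M-estimators, which additionally downweight observations with high-leverage covariates: Mallows, Schweppe, etc.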
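For concreteness, here are the two classic forms of the estimating equations as I understand them. They apply to regression with residual scale σ and a covariate weight w(x_i) ∈ (0, 1] that shrinks for high-leverage x_i. Mallows applies the weight to the whole ψ term, while Schweppe also rescales the residual inside ψ. Taking w ≡ 1 recovers the ordinary regression M-estimator. The choice of w (for example, one based on a robust Mahalanobis distance of x_i) is left open here.

```latex
\begin{aligned}
\text{Mallows:}\quad & \sum_{i=1}^{n} w(x_i)\,\psi\!\left(\frac{y_i - x_i^\top \beta}{\sigma}\right) x_i = 0,\\
\text{Schweppe:}\quad & \sum_{i=1}^{n} w(x_i)\,\psi\!\left(\frac{y_i - x_i^\top \beta}{\sigma\, w(x_i)}\right) x_i = 0.
\end{aligned}
```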

🏗

5 References

Advani, and Ganguli. 2016. “An Equivalence Between High Dimensional Bayes Optimal Inference and M-Estimation.” In Advances In Neural Information Processing Systems.
Barndorff-Nielsen. 1983. “On a Formula for the Distribution of the Maximum Likelihood Estimator.” Biometrika.
Bühlmann. 2014. “Robust Statistics.” In Selected Works of Peter J. Bickel. Selected Works in Probability and Statistics 13.
DasGupta. 2008. Asymptotic Theory of Statistics and Probability. Springer Texts in Statistics.
Davison, and Ortiz. 2019. “FutureMapping 2: Gaussian Belief Propagation for Spatial AI.” arXiv:1910.14139 [Cs].
Donoho, and Montanari. 2013. “High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing.” arXiv:1310.7320 [Cs, Math, Stat].
Hampel. 1974. “The Influence Curve and Its Role in Robust Estimation.” Journal of the American Statistical Association.
Hampel, Ronchetti, Rousseeuw, et al. 2011. Robust Statistics: The Approach Based on Influence Functions.
Huber. 1964. “Robust Estimation of a Location Parameter.” The Annals of Mathematical Statistics.
Kandasamy, Krishnamurthy, Poczos, et al. 2014. “Influence Functions for Machine Learning: Nonparametric Estimators for Entropies, Divergences and Mutual Informations.” arXiv:1411.4342 [Stat].
Kümmel. 1982. “The Impact of Energy on Industrial Growth.” Energy.
Markatou, Marianthi, Karlis, and Ding. 2021. “Distance-Based Statistical Inference.” Annual Review of Statistics and Its Application.
Markatou, M., and Ronchetti. 1997. “Robust Inference: The Approach Based on Influence Functions.” In Handbook of Statistics. Robust Inference.
Maronna. 1976. “Robust M-Estimators of Multivariate Location and Scatter.” The Annals of Statistics.
Mondal, and Percival. 2010. “M-Estimation of Wavelet Variance.” Annals of the Institute of Statistical Mathematics.
Ortiz, Evans, and Davison. 2021. “A Visual Introduction to Gaussian Belief Propagation.” arXiv:2107.02308 [Cs].
Ronchetti, Elvezio. 1997. “Robust Inference by Influence Functions.” Journal of Statistical Planning and Inference, Robust Statistics and Data Analysis, Part I.
Ronchetti, E. 2000. “Robust Regression Methods and Model Selection.” In Data Segmentation and Model Selection for Computer Vision.
Tharmaratnam, and Claeskens. 2013. “A Comparison of Robust Versions of the AIC Based on M-, S- and MM-Estimators.” Statistics.
van de Geer. 2014. “Worst Possible Sub-Directions in High-Dimensional Models.” In arXiv:1403.7023 [Math, Stat].
Yang, Gallagher, and McMahan. 2019. “A Robust Regression Methodology via M-Estimation.” Communications in Statistics - Theory and Methods.