M-estimation


Loosely, estimating a quantity by choosing it to be the extremum of some function of the data or, if that function is smooth enough, a zero of its derivative.
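In symbols, a minimal statement of the idea. The notation is assumed here: data $x_1,\dots,x_n$, a loss $\rho$, and $\psi$ for its derivative in the parameter $\theta$.

```latex
\hat\theta_n = \operatorname*{arg\,min}_{\theta} \sum_{i=1}^{n} \rho(x_i; \theta),
\qquad\text{or, for differentiable } \rho,\qquad
\sum_{i=1}^{n} \psi(x_i; \hat\theta_n) = 0,
\quad \psi(x; \theta) = \frac{\partial \rho(x; \theta)}{\partial \theta}.
```

For a location parameter, $\rho(x;\theta) = (x-\theta)^2$ gives the sample mean and $\rho(x;\theta) = |x-\theta|$ gives a sample median. The absolute loss is not differentiable at zero, which is one reason the arg min form is the more general definition.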

Popular in machine learning, where loss-function-based methods are ubiquitous. In statistics it appears most famously in maximum likelihood estimation, least-squares estimation and robust estimation, for which M-estimation provides a unifying formalism with a convenient large-sample asymptotic theory.
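The large-sample theory is the usual "sandwich" result. Under regularity conditions, $\sqrt{n}(\hat\theta_n - \theta_0)$ is asymptotically normal with covariance $A^{-1} B A^{-\top}$, where $A = \mathbb{E}[\partial\psi(X;\theta_0)/\partial\theta^\top]$ and $B = \mathbb{E}[\psi(X;\theta_0)\psi(X;\theta_0)^\top]$. Below is a minimal plug-in sketch for a scalar parameter that replaces both expectations with sample averages at $\hat\theta_n$. The function name `sandwich_variance` is hypothetical. The check uses least squares with $\rho(x;\theta) = (x-\theta)^2/2$, where $\psi(x;\theta) = \theta - x$ and the estimate reduces to the (biased) sample variance divided by $n$.

```python
import statistics
from typing import Callable, Sequence


def sandwich_variance(
    psi: Callable[[float, float], float],
    dpsi_dtheta: Callable[[float, float], float],
    data: Sequence[float],
    theta_hat: float,
) -> float:
    """Plug-in estimate of Var(theta_hat) ~= A^{-1} B A^{-1} / n for scalar theta."""
    n = len(data)
    a = sum(dpsi_dtheta(x, theta_hat) for x in data) / n  # estimates A = E[d psi / d theta]
    b = sum(psi(x, theta_hat) ** 2 for x in data) / n      # estimates B = E[psi^2]
    return b / (a * a) / n


# Least squares: rho(x; theta) = (x - theta)^2 / 2, so psi = theta - x and d psi / d theta = 1.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
theta_hat = statistics.mean(data)  # the least-squares M-estimate of location
v = sandwich_variance(lambda x, t: t - x, lambda x, t: 1.0, data, theta_hat)
print(v)  # 0.5, i.e. statistics.pvariance(data) / len(data)
```

For a robust $\psi$ the same function applies unchanged; only the two callables differ.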

🏗 Discuss influence function motivation.
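As a pointer for that discussion: for the M-estimator functional $T(F)$ defined by $\mathbb{E}_F[\psi(X; T(F))] = 0$, with $\psi = \partial\rho/\partial\theta$, the influence function takes the standard form

```latex
\operatorname{IF}(x; T, F) = -A^{-1}\,\psi\bigl(x; T(F)\bigr),
\qquad
A = \mathbb{E}_F\!\left[\frac{\partial \psi(X; \theta)}{\partial \theta^{\top}}\right]_{\theta = T(F)} .
```

So the influence of a single contaminating observation at $x$ is proportional to $\psi(x)$: least squares, with unbounded $\psi$, has unbounded influence, while a bounded $\psi$ caps it. Under regularity conditions, $\mathbb{E}_F[\operatorname{IF}\,\operatorname{IF}^\top]$ is the asymptotic covariance of $\sqrt{n}(\hat\theta_n - \theta_0)$.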

Implied density functions

Common loss functions imply a density, if we consider the estimation problem as a maximum likelihood estimation problem.
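Concretely, if the loss acts on a residual $r$ with fixed scale, minimising $\sum_i \rho(r_i)$ is the same as maximising $\sum_i \log p(r_i)$ for $p(r) = e^{-\rho(r)}/Z$, whenever $Z = \int e^{-\rho(r)}\,dr$ is finite. A minimal numerical sketch (stdlib only; the helper names and the truncated integration range are assumptions) confirms two familiar cases: squared loss $r^2/2$ implies a standard Gaussian ($Z = \sqrt{2\pi}$) and absolute loss $|r|$ implies a Laplace density ($Z = 2$).

```python
import math
from typing import Callable


def normalising_constant(
    rho: Callable[[float], float],
    lo: float = -40.0,
    hi: float = 40.0,
    steps: int = 100_000,
) -> float:
    """Trapezoid-rule estimate of Z = integral of exp(-rho(r)) dr over [lo, hi].

    Truncating to [lo, hi] is an assumption that is harmless only when
    exp(-rho) decays fast enough in the tails.
    """
    h = (hi - lo) / steps
    total = 0.5 * (math.exp(-rho(lo)) + math.exp(-rho(hi)))
    for i in range(1, steps):
        total += math.exp(-rho(lo + i * h))
    return total * h


def implied_density(rho: Callable[[float], float]) -> Callable[[float], float]:
    """Return r -> exp(-rho(r)) / Z, the density for which rho is a negative log-likelihood."""
    z = normalising_constant(rho)
    return lambda r: math.exp(-rho(r)) / z


def squared_loss(r: float) -> float:
    return 0.5 * r * r


print(normalising_constant(squared_loss), math.sqrt(2 * math.pi))  # both about 2.5066
print(normalising_constant(abs))                                   # about 2.0
```

By the same reasoning, Huber's loss implies a density that is Gaussian in the centre with exponential tails. A bounded loss, such as a redescending one, keeps $e^{-\rho}$ bounded away from zero in the tails, so $Z$ is infinite and there is no proper implied density.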

Robust loss functions

🏗

Huber loss
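A minimal sketch of Huber's loss, its derivative $\psi$, and the associated weight $w(r) = \psi(r)/r$, for a residual $r$ assumed to be already divided by a scale estimate. The tuning constant $k = 1.345$ is a conventional choice giving about 95% asymptotic efficiency when the errors are Gaussian; the function names are illustrative.

```python
def huber_rho(r: float, k: float = 1.345) -> float:
    """Huber loss: quadratic for |r| <= k, linear (with matching slope) beyond."""
    a = abs(r)
    return 0.5 * r * r if a <= k else k * a - 0.5 * k * k


def huber_psi(r: float, k: float = 1.345) -> float:
    """Derivative of huber_rho: the residual clipped to [-k, k]."""
    return max(-k, min(k, r))


def huber_weight(r: float, k: float = 1.345) -> float:
    """w(r) = psi(r) / r = min(1, k / |r|), taking w(0) = 1."""
    a = abs(r)
    return 1.0 if a <= k else k / a


for r in (0.5, 2.0, 10.0):
    print(r, huber_rho(r), huber_psi(r), huber_weight(r))
```

$\psi$ is bounded, so a single gross outlier has bounded influence, but it never returns to zero, so outliers still exert a constant pull on the estimate.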

Hampel loss
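Hampel's three-part redescending $\psi$ is linear near zero, then constant, then ramps linearly down to zero, so residuals beyond $c$ get no weight at all. A minimal sketch, with the loss obtained by integrating $\psi$. The breakpoints $a = 2$, $b = 4$, $c = 8$ are illustrative assumptions rather than recommended values, and residuals are assumed to be already divided by a scale estimate.

```python
import math


def hampel_psi(r: float, a: float = 2.0, b: float = 4.0, c: float = 8.0) -> float:
    """Hampel's three-part redescending psi (requires 0 < a <= b < c)."""
    s = math.copysign(1.0, r)
    x = abs(r)
    if x <= a:
        return r                          # least-squares region
    if x <= b:
        return a * s                      # Huber-like clipping
    if x <= c:
        return a * s * (c - x) / (c - b)  # linear descent to zero
    return 0.0                            # gross outliers are ignored


def hampel_rho(r: float, a: float = 2.0, b: float = 4.0, c: float = 8.0) -> float:
    """Loss whose derivative is hampel_psi, normalised so that rho(0) = 0."""
    x = abs(r)
    if x <= a:
        return 0.5 * x * x
    if x <= b:
        return a * x - 0.5 * a * a
    base = a * b - 0.5 * a * a        # rho(b)
    ramp_area = 0.5 * a * (c - b)     # area under psi on [b, c]
    if x <= c:
        return base + ramp_area * (1.0 - ((c - x) / (c - b)) ** 2)
    return base + ramp_area           # bounded: constant beyond c
```

Because `hampel_rho` is bounded and non-convex, the estimating equation can have several roots, so in practice one typically starts the fit from a robust initial estimate such as the median.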

Fitting

Discuss the representation (and implementation) of M-estimators in terms of weight functions applied to a least-squares loss.
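Writing $\psi(r) = w(r)\,r$ turns the estimating equation $\sum_i \psi(r_i) = 0$ into a weighted least-squares normal equation with weights $w(r_i)$. That suggests iteratively reweighted least squares (IRLS): fix the weights at the current estimate, solve the weighted least-squares problem, repeat. Below is a minimal sketch for a Huber estimate of location, where each weighted least-squares step is just a weighted mean. Holding the scale fixed at the normalised median absolute deviation (MAD) is an assumption of this sketch, and the function names are illustrative.

```python
import statistics
from typing import Sequence


def huber_weight(r: float, k: float = 1.345) -> float:
    """w(r) = psi(r) / r for Huber's psi; equals 1 inside [-k, k]."""
    a = abs(r)
    return 1.0 if a <= k else k / a


def huber_location_irls(
    x: Sequence[float], k: float = 1.345, tol: float = 1e-10, max_iter: int = 100
) -> float:
    """Huber M-estimate of location by IRLS, with scale fixed at the normalised MAD."""
    theta = statistics.median(x)  # robust starting point
    mad = statistics.median([abs(xi - theta) for xi in x])
    s = 1.4826 * mad  # MAD rescaled to estimate a Gaussian standard deviation
    if s == 0.0:
        return theta  # degenerate case: at least half the data equal the median
    for _ in range(max_iter):
        w = [huber_weight((xi - theta) / s, k) for xi in x]
        # Weighted least-squares step: for a location parameter, a weighted mean.
        new_theta = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


print(huber_location_irls([1.0, 2.0, 3.0, 4.0, 5.0, 100.0]))  # about 3.598
```

On that data the sample mean is about 19.2. For regression, the same loop applies with a weighted least-squares solve in place of the weighted mean.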

GM-estimators

Mallows, Schweppe etc.
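In linear regression with residuals $r_i = y_i - x_i^\top\beta$ and scale $\sigma$, an ordinary M-estimator bounds the influence of large residuals but not of high-leverage design points $x_i$. GM-estimators add a weight $w(x_i) \in (0, 1]$ that shrinks as $x_i$ becomes more outlying, for instance via a robust Mahalanobis distance. The Mallows and Schweppe forms are usually written as the estimating equations below; the choice of $w$ is left open here.

```latex
\text{Mallows:}\quad \sum_{i=1}^{n} w(x_i)\,\psi\!\left(\frac{r_i}{\sigma}\right) x_i = 0,
\qquad
\text{Schweppe:}\quad \sum_{i=1}^{n} w(x_i)\,\psi\!\left(\frac{r_i}{\sigma\, w(x_i)}\right) x_i = 0.
```

Setting $w \equiv 1$ recovers the ordinary regression M-estimator in both cases.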

🏗
