M-estimation
2016-07-11 — 2022-02-17
Wherein M-estimation is presented as estimation by extremizing loss functions, its ties to loss‑based machine learning and large‑sample asymptotics are noted, and influence functions are sketched.
Loosely: estimating a quantity by choosing it to extremize an objective function or, if the objective is well-behaved enough, to be a zero of its derivative.

Popular in machine learning, where loss-function-based methods are ubiquitous. In statistics we see it famously in maximum likelihood estimation, robust estimation, and least squares. M-estimation provides a unifying formalism for these, with a convenient large-sample asymptotic theory.
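Concretely, in the standard formulation, an M-estimate of a parameter $\theta$ from observations $x_1,\dots,x_n$ solves

$$
\hat\theta = \operatorname*{arg\,min}_{\theta} \sum_{i=1}^{n} \rho(x_i; \theta),
\qquad\text{or, for smooth }\rho,\qquad
\sum_{i=1}^{n} \psi(x_i; \hat\theta) = 0,\quad \psi = \frac{\partial \rho}{\partial \theta}.
$$

Taking $\rho(x;\theta) = -\log f(x;\theta)$ recovers maximum likelihood; taking $\rho(x;\theta) = (x-\theta)^2$ recovers least squares.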
🏗 Discuss influence function motivation.
1 Implied density functions
Many common loss functions imply a density if we read the fitting problem as maximum likelihood estimation: the loss is a negative log-likelihood, up to an additive constant.
I assume they did not invent the idea, but Davison and Ortiz (2019) point out that a least-squares-compatible model can usually be generalized to any elliptical density, which includes the Huber loss and many other robust losses as special cases.
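In symbols, the standard correspondence (not specific to that paper): any loss $\rho$ for which $e^{-\rho}$ is integrable induces a density

$$
f_\rho(x) \propto \exp\{-\rho(x)\},
$$

so squared error corresponds to a Gaussian, absolute error to a Laplace density, and the Huber loss to a density with a Gaussian centre and Laplace tails.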
2 Robust loss functions
🏗
2.1 Huber loss
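For reference, the standard definition, with threshold $k>0$:

$$
\rho_k(r) = \begin{cases} \tfrac{1}{2} r^2, & |r| \le k,\\ k\,|r| - \tfrac{1}{2}k^2, & |r| > k, \end{cases}
$$

quadratic near the origin and linear in the tails, so a single large residual cannot dominate the fit.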
2.2 Hampel loss
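Usually specified via its three-part redescending $\psi$ function with tuning constants $0 < a \le b \le c$ (the standard form, as I recall it):

$$
\psi(r) = \begin{cases} r, & |r| < a,\\ a\,\operatorname{sgn}(r), & a \le |r| < b,\\ a\,\dfrac{c - |r|}{c - b}\,\operatorname{sgn}(r), & b \le |r| < c,\\ 0, & |r| \ge c, \end{cases}
$$

so sufficiently extreme residuals are ignored entirely.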
3 Fitting
Discuss representation (and implementation) in terms of weight functions: for many losses the estimating equations can be solved by repeatedly solving weighted least-squares problems with weights $w(r) = \psi(r)/r$ computed from the current residuals, i.e. iteratively reweighted least squares. A sketch follows.
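A minimal numpy sketch of that idea, assuming a linear model, Huber weights, and a MAD scale estimate; the names `huber_weight` and `irls` are mine, not from any particular library.

```python
import numpy as np

def huber_weight(r, k=1.345):
    """IRLS weight for the Huber loss: w(r) = psi(r)/r."""
    a = np.abs(r)
    return np.where(a <= k, 1.0, k / a)

def irls(X, y, weight_fn=huber_weight, n_iter=50, tol=1e-8):
    """Fit a linear model by iteratively reweighted least squares.

    Each step solves a weighted least-squares problem whose weights come
    from the current residuals, so any M-estimator whose psi-function
    yields a weight w(r) = psi(r)/r can reuse least-squares machinery.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary LS start
    for _ in range(n_iter):
        r = y - X @ beta
        # Standardize residuals with a robust (MAD) scale estimate.
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        w = weight_fn(r / s)
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(Xw.T @ X, Xw.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Usage is just `beta = irls(X, y)` for a design matrix `X` and response vector `y`; swapping in a different weight function changes the implied loss.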
4 GM-estimators
Mallows-type, Schweppe-type, etc.: bounded-influence estimators that also downweight high-leverage design points.
🏗