# (Outlier) robust statistics

November 25, 2014 — January 21, 2022

Terminology note: I mean robust statistics in the sense of Huber, which is, informally, *outlier* robustness.

There are also *robust* estimators in econometrics; there the term means something about good behaviour under heteroskedastic and/or correlated errors. Robust *Bayes* means something about inference that is robust to the choice of prior (which could overlap but is a rather different emphasis).

Outlier robustness is AFAICT more-or-less a frequentist project. Bayesian approaches seem to achieve robustness largely by choosing heavy-tailed priors or heavy-tailed noise distributions where they might have chosen light-tailed ones, e.g. Laplacian distributions instead of Gaussian ones. Such heavy-tailed distributions may have arbitrary parameters, but they are no *more arbitrary than usual* in Bayesian statistics, and so they do not provoke the same need to wash away the guilt that frequentists seem to feel.

One can of course use heavy-tailed noise distributions in frequentist inference as well, and that will buy a kind of robustness. That seems to be unpopular, apparently because it makes frequentist inference as difficult as Bayesian inference.

## 1 Corruption models

- Random (mixture) corruption
- (Adversarial) total variation \(\epsilon\)-corruption.
- Wasserstein corruption models, as seen in “distributionally robust” models. (Does one usually assume adversarial or random corruption here?)
- other?
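
To make the first item concrete, here is a minimal sketch of random mixture corruption (Huber-style \(\epsilon\)-contamination): each observation comes from the clean distribution with probability \(1-\epsilon\) and from a contaminating distribution otherwise. The two Gaussians, the parameter values and the name `sample_contaminated` are illustrative assumptions, nothing canonical.

```python
import random
import statistics


def sample_contaminated(n, eps, rng, clean=(0.0, 1.0), outlier=(50.0, 1.0)):
    """Draw n points from (1 - eps) * N(clean) + eps * N(outlier).

    `clean` and `outlier` are (mean, standard deviation) pairs; both
    Gaussians are arbitrary illustrative choices.
    """
    sample = []
    for _ in range(n):
        mu, sigma = outlier if rng.random() < eps else clean
        sample.append(rng.gauss(mu, sigma))
    return sample


rng = random.Random(0)
xs = sample_contaminated(n=2000, eps=0.1, rng=rng)

# The mean is dragged towards the contamination (roughly eps * 50 here),
# while the median stays close to the clean location of 0.
mean_est = statistics.fmean(xs)
median_est = statistics.median(xs)
print(f"mean={mean_est:.2f} median={median_est:.2f}")
```

An adversarial corruption model differs in that the corrupted points may be chosen after looking at the clean sample, which this sketch does not attempt.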

## 2 M-estimation with robust loss

The one that I, at least, would think of when considering robust estimation.

In M-estimation, instead of hunting a maximum of the likelihood function as you do in maximum likelihood, or a minimum of the sum of squared residuals, as you do in least-squares estimation, you minimise a specifically chosen loss function for those residuals. You may select an objective function more robust to deviations between your model and reality. Credited to Huber (1964).
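
In symbols, with notation chosen here purely for illustration (\(\rho\) for the loss, \(r_i(\theta)\) for the \(i\)-th residual under parameter \(\theta\)), the M-estimator is

\[
\hat{\theta} = \operatorname*{arg\,min}_{\theta} \sum_{i=1}^{n} \rho\bigl(r_i(\theta)\bigr).
\]

Taking \(\rho(r) = r^2\) recovers least squares, and taking \(\rho\) to be the negative log density of the assumed noise recovers maximum likelihood. Robust choices of \(\rho\) grow more slowly than \(r^2\) for large \(|r|\).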

See M-estimation for some details.

AFAICT, the definition of M-estimation includes the possibility that you *could* in principle select a *less* robust loss function than the sum of squares, but I have not seen this in the literature. Generally, some robustified approach is presumed, which penalises outliers less severely than least-squares.

For M-estimation as robust estimation, various complications ensue, such as the difference between noise in your predictors and noise in your responses, whether the “true” model is included in your class, and which of these difficulties you have resolved or not.

Loosely speaking, no, you haven’t solved problems of noise in your predictors, only the problem of noise in your responses.

And the cost is that you now have a loss function with some extra arbitrary parameters which you have to justify, which is anathema to frequentists, who like to claim to be less arbitrary than Bayesians.

### 2.1 Huber loss
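
The Huber loss is quadratic near zero and linear in the tails: \(\rho_\delta(r) = \tfrac{1}{2} r^2\) for \(|r| \le \delta\) and \(\rho_\delta(r) = \delta(|r| - \tfrac{1}{2}\delta)\) otherwise. The following is a minimal sketch of the loss and of a location M-estimate computed by iterative reweighting. The function names are hypothetical, the default \(\delta = 1.345\) is the commonly quoted tuning constant for unit-scale Gaussian noise, and the data are assumed to be on unit scale already.

```python
import statistics


def huber_loss(r, delta=1.345):
    """Huber's rho: quadratic for |r| <= delta, linear beyond."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def huber_location(xs, delta=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted averaging.

    Assumes the data are already on unit scale; in practice one would
    divide residuals by a robust scale estimate such as the MAD.
    """
    mu = statistics.median(xs)  # robust starting point
    for _ in range(max_iter):
        # Weight psi(r) / r: 1 inside the quadratic zone, delta / |r| outside.
        ws = [1.0 if abs(x - mu) <= delta else delta / abs(x - mu) for x in xs]
        new_mu = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


data = [-0.3, 0.1, 0.4, -0.2, 0.0, 0.2, 100.0]
print(huber_location(data), statistics.fmean(data))
```

The single gross outlier still nudges the estimate, because its influence is capped at \(\delta\) rather than removed; the sample mean, by contrast, is pulled most of the way out of the bulk of the data.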

### 2.2 Tukey loss
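
Tukey's biweight (bisquare) loss goes one step further than Huber's: it is bounded, so sufficiently large residuals contribute a constant and have zero influence. With tuning constant \(c\), \(\rho_c(r) = \tfrac{c^2}{6}\bigl(1 - (1 - (r/c)^2)^3\bigr)\) for \(|r| \le c\) and \(\rho_c(r) = \tfrac{c^2}{6}\) otherwise. A minimal sketch follows; the function names are hypothetical and \(c = 4.685\) is the commonly quoted default for unit-scale Gaussian noise.

```python
def tukey_loss(r, c=4.685):
    """Tukey's biweight rho: bounded, constant at c**2 / 6 for |r| > c."""
    if abs(r) > c:
        return c * c / 6.0
    u = 1.0 - (r / c) ** 2
    return (c * c / 6.0) * (1.0 - u ** 3)


def tukey_weight(r, c=4.685):
    """IRLS weight psi(r) / r: zero for |r| > c, so gross outliers are ignored."""
    if abs(r) > c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2


for r in (0.5, 2.0, 10.0):
    print(r, tukey_loss(r), tukey_weight(r))
```

Because this loss is not convex, the minimisation can have several local optima, so the starting point matters.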

## 3 MM-estimation

🏗 Don’t know

## 4 Median-based estimators

Rousseeuw and Yohai’s idea (P. Rousseeuw and Yohai 1984)

Many permutations on the theme here, but it rapidly gets complex. The only members of these families I have looked into are the near-trivial cases of Least Median of Squares and Least Trimmed Squares estimation (P. J. Rousseeuw 1984). More broadly we should also consider S-estimators, which do something with… robust estimation of scale and using this to do robust estimation of location? 🏗
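
For intuition, the one-dimensional location case of Least Median of Squares can be written down directly. This is a minimal sketch under an assumed convention: the "median" of the squared residuals is taken to be the \(h\)-th smallest with \(h = \lfloor n/2 \rfloor + 1\), in which case the minimiser is the midpoint of the shortest interval covering \(h\) of the sorted points. The name `lms_location` is hypothetical.

```python
def lms_location(xs):
    """Least median of squares estimate of location for 1-D data.

    Uses the order-statistic convention h = n // 2 + 1 for the "median"
    of the squared residuals; under it the minimiser is the midpoint of
    the shortest interval covering h sorted points.
    """
    s = sorted(xs)
    n = len(s)
    h = n // 2 + 1
    # Slide a window of h consecutive order statistics and keep the narrowest.
    start = min(range(n - h + 1), key=lambda i: s[i + h - 1] - s[i])
    return 0.5 * (s[start] + s[start + h - 1])


# Three of the nine points are gross outliers; the estimate stays with the bulk.
data = [-0.3, 0.1, 0.4, -0.2, 0.0, 0.2, 100.0, 101.0, 99.0]
print(lms_location(data))
```

The regression versions are much harder to compute, since the objective is non-smooth and non-convex in the coefficients.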

Theil-Sen-(Oja) estimators: Something about medians of inferred regression slopes. 🏗
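
A minimal sketch of the basic Theil-Sen line fit for a single predictor: the slope is the median of the slopes through all pairs of points. Taking the intercept as the median of \(y - \text{slope} \cdot x\) is one common convention and is an assumption here, as is the name `theil_sen`.

```python
import statistics
from itertools import combinations


def theil_sen(xs, ys):
    """Theil-Sen line: median pairwise slope, then median residual intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # skip pairs with tied x, whose slope is undefined
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 5, 7, 9, 11, 100]  # y = 2x + 1 except for the last point
slope, intercept = theil_sen(xs, ys)
print(slope, intercept)
```

Enumerating all pairs costs \(O(n^2)\), so this naive version only suits small samples.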

Tukey median, and why no-one uses it what with it being NP-Hard.

## 5 Others

RANSAC — some kind of randomised outlier detection estimator. 🏗
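
A minimal sketch of the RANSAC idea for fitting a line, assuming the usual recipe: repeatedly fit to a random minimal sample (two points), count the points that agree with that candidate within a threshold, and keep the candidate with the largest consensus set. The name `ransac_line`, the threshold and the iteration count are illustrative assumptions; the customary final least-squares refit on the inliers is omitted.

```python
import random


def ransac_line(points, n_iter=200, threshold=0.5, rng=None):
    """Fit y = a * x + b by RANSAC, returning (a, b, inliers)."""
    rng = rng or random.Random()
    best = (0.0, 0.0, [])
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)  # minimal sample for a line
        if x1 == x2:
            continue  # vertical candidate cannot be written as y = a * x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Consensus set: points whose vertical residual is within the threshold.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= threshold]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best


# Ten points on y = 2x + 1 plus three gross outliers.
pts = [(x, 2 * x + 1) for x in range(10)] + [(2, 30), (5, -20), (7, 50)]
a, b, inliers = ransac_line(pts, rng=random.Random(0))
print(a, b, len(inliers))
```

The threshold plays the role that the tuning constant plays in the robust losses above: it has to be chosen relative to the scale of the inlier noise.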

## 6 Incoming

- relation to penalized regression.
- connection with Lasso.
- Beran’s Hellinger-ball contamination model, which I also don’t yet understand.
- Breakdown point explanation
- Yet Another Math Programming Consultant: Huber regression: different formulations

## 7 References

*Biometrika*.

*Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete*.

*The Annals of Statistics*.

*Journal of the American Statistical Association*.

*Biometrics*.

*Selected Works of Peter J. Bickel*. Selected Works in Probability and Statistics 13.

*Biometrika*.

*Journal of the American Statistical Association*.

*arXiv:1611.02315 [Cs, Math, Stat]*.

*Biometrika*.

*Biometrika*.

*arXiv:1910.14139 [Cs]*.

*arXiv:1604.06443 [Cs, Math, Stat]*.

*arXiv:1703.00893 [Cs, Math, Stat]*.

*A Festschrift for Erich L. Lehmann*.

*The Annals of Statistics*.

*arXiv:1310.7320 [Cs, Math, Stat]*.

*arXiv:1610.03425 [Stat]*.

*Journal of the American Statistical Association*.

*arXiv:1611.05224 [Math, Stat]*.

*The Annals of Statistics*.

*Journal of the American Statistical Association*.

*Robust Statistics: The Approach Based on Influence Functions*.

*Communications in Statistics - Theory and Methods*.

*Journal of Computational and Graphical Statistics*.

*The Annals of Mathematical Statistics*.

*Robust Statistics*. Wiley Series in Probability and Statistics.

*arXiv:1610.01353 [Math, Stat]*.

*Biometrika*.

*Journal of Statistical Planning and Inference*, C.R. Rao 80th Birthday Felicitation Volume, Part IV.

*Information Criteria and Statistical Modeling*. Springer Series in Statistics.

*Proceedings of the National Academy of Sciences*.

*arXiv:1702.05860 [Cs]*.

*Biometrika*.

*Econometric Theory*.

*IEEE Transactions on Signal Processing*.

*Annual Review of Statistics and Its Application*.

*Handbook of Statistics*. Robust Inference.

*The Annals of Statistics*.

*Robust statistics: theory and methods*. Wiley series in probability and statistics.

*Journal of the American Statistical Association*.

*Wiley StatsRef: Statistics Reference Online*.

*Technometrics*.

*Analytica Chimica Acta*.

*arXiv:1311.4115 [Cs, Math]*.

*The Annals of Applied Probability*.

*Statistics & Probability Letters*.

*arXiv:2107.02308 [Cs]*.

*Journal of Statistical Planning and Inference*.

*Statistics & Probability Letters*.

*Journal of Statistical Planning and Inference*, Robust Statistics and Data Analysis, Part I.

*Data Segmentation and Model Selection for Computer Vision*.

*Journal of Econometrics*.

*Journal of the American Statistical Association*.

*Robust Regression and Outlier Detection*. Wiley Series in Probability and Mathematical Statistics.

*Robust and Nonlinear Time Series Analysis*. Lecture Notes in Statistics 26.

*International Statistical Review / Revue Internationale de Statistique*.

*The American Statistician*.

*The American Statistician*.

*Statistics*.

*Henri Theil’s Contributions to Economics and Econometrics*. Advanced Studies in Theoretical and Applied Econometrics 23.

*Journal of Statistical Planning and Inference*.

*Biometrika*.

*IEEE Transactions on Information Theory*.

*Communications in Statistics - Theory and Methods*.

*ICML (3)*.

*Image and Vision Computing*.