(Outlier) robust statistics

The term “robust” is overloaded. In econometrics, robust estimators are ones that behave well under heteroskedastic and/or correlated errors. Robust Bayes concerns inference that is robust to the choice of prior (which could overlap with outlier robustness but is a rather different emphasis).

Outlier robustness is AFAICT more-or-less a frequentist project. Bayesian approaches seem to achieve robustness largely by choosing heavy-tailed priors or heavy-tailed noise distributions where they might have chosen light-tailed ones, e.g. Laplacian distributions instead of Gaussian ones. Such heavy-tailed distributions may have arbitrary parameters, but no more arbitrary than is usual in Bayesian statistics, so Bayesians do not seem to feel the need to wash away the guilt that frequentists do.

One can of course use heavy-tailed noise distributions in frequentist inference as well, and that will buy a kind of robustness. That seems to be unpopular, because it makes frequentist inference as difficult as Bayesian inference.
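
To make the heavy-tailed-noise point concrete, here is a minimal sketch for the simplest case, estimating a location parameter with the scale held fixed at 1 (my simplifying assumption). The maximum likelihood estimate under Gaussian noise is the sample mean, while under Laplacian noise it is the sample median. The function names and the crude grid search are mine, chosen for transparency rather than efficiency.

```python
import math
from statistics import mean, median

def gaussian_nll(mu, data, sigma=1.0):
    """Negative log-likelihood of location mu under Gaussian noise."""
    const = len(data) * math.log(sigma * math.sqrt(2 * math.pi))
    return const + sum(0.5 * ((x - mu) / sigma) ** 2 for x in data)

def laplace_nll(mu, data, b=1.0):
    """Negative log-likelihood of location mu under Laplacian noise."""
    const = len(data) * math.log(2 * b)
    return const + sum(abs(x - mu) / b for x in data)

def argmin_on_grid(nll, data, lo, hi, steps):
    """Crude grid search for the minimiser of nll over [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda mu: nll(mu, data))

# Six well-behaved observations near 2 and one gross outlier.
data = [1.9, 2.1, 2.0, 2.2, 1.8, 2.05, 50.0]

mu_gaussian = argmin_on_grid(gaussian_nll, data, 0.0, 60.0, 60001)
mu_laplace = argmin_on_grid(laplace_nll, data, 0.0, 60.0, 60001)

print("Gaussian MLE:", mu_gaussian, "sample mean:", mean(data))
print("Laplace MLE:", mu_laplace, "sample median:", median(data))
```

The Gaussian fit is dragged towards the outlier; the Laplacian fit stays with the bulk of the data.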


  • relation to penalized regression.
  • connection with Lasso.
  • Beran’s Hellinger-ball contamination model, which I also don’t yet understand.
  • Breakdown point explanation
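
As a placeholder for the breakdown point explanation, here is a small sketch of the idea. The breakdown point is, loosely, the largest fraction of the sample that can be replaced by arbitrary values before the estimate can be dragged arbitrarily far. The sample mean is ruined by a single bad point, whereas the median survives until about half the sample is corrupted. The data and the helper `contaminate` are made up for illustration.

```python
from statistics import mean, median

def contaminate(sample, n_bad, bad_value):
    """Replace the first n_bad observations with an arbitrary bad value."""
    return [bad_value] * n_bad + list(sample[n_bad:])

# Ten clean observations scattered around 10 (illustrative data).
clean = [9.8, 10.1, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0, 9.9, 10.1]

for n_bad in (0, 1, 4, 5):
    corrupted = contaminate(clean, n_bad, bad_value=1e6)
    print(n_bad, "bad points:",
          "mean =", mean(corrupted),
          "median =", median(corrupted))
```

With one bad point the mean is already useless. The median stays within the range of the clean data with four bad points out of ten and only fails at five.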

Corruption models

  • Random (mixture) corruption
  • (Adversarial) total variation \(\epsilon\)-corruption.
  • Wasserstein corruption models (does one usually assume adversarial or random corruption here?), as seen in “distributionally robust” models.
  • other?
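
For concreteness, here is roughly how I understand the first three models to be formalised. The notation is mine rather than from a particular reference: \(P\) is the clean distribution, \(Q\) an arbitrary outlier distribution, \(P'\) the distribution actually sampled from, \(d_{\mathrm{TV}}\) the total variation distance and \(W_p\) a Wasserstein distance.

\[
\begin{aligned}
\text{Random (mixture) corruption:}\quad & X_i \sim (1-\epsilon) P + \epsilon Q, \\
\text{Total variation corruption:}\quad & X_i \sim P' \text{ for some } P' \text{ with } d_{\mathrm{TV}}(P, P') \le \epsilon, \\
\text{Wasserstein corruption:}\quad & X_i \sim P' \text{ for some } P' \text{ with } W_p(P, P') \le \epsilon.
\end{aligned}
\]

The mixture model is a special case of the total variation model, since \(d_{\mathrm{TV}}\big(P, (1-\epsilon)P + \epsilon Q\big) = \epsilon\, d_{\mathrm{TV}}(P, Q) \le \epsilon\). In the adversarial reading, \(P'\) is chosen by an opponent who knows the estimator.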

M-estimation with robust loss

The one that I, at least, would think of when considering robust estimation.

In M-estimation, instead of hunting a maximum of the likelihood function as you do in maximum likelihood, or a minimum of the sum of squared residuals as you do in least-squares estimation, you minimise a specifically chosen loss function of those residuals. You may select an objective function that is more robust to deviations between your model and reality. Credited to Huber (1964).
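
A minimal sketch of what this looks like for the simplest case, a location parameter with the Huber loss, fitted by iteratively reweighted least squares. The function names are mine. The scale is fixed in advance by the rescaled median absolute deviation, and the tuning constant k = 1.345 is the conventional choice giving about 95% efficiency at the Gaussian; both are assumptions of this sketch rather than anything the method forces on you.

```python
from statistics import mean, median

def mad_scale(data):
    """Median absolute deviation, rescaled to match the Gaussian sd."""
    m = median(data)
    return 1.4826 * median([abs(x - m) for x in data])

def huber_weight(r, k):
    """IRLS weight psi(r)/r for the Huber loss: 1 inside [-k, k], k/|r| outside."""
    return 1.0 if abs(r) <= k else k / abs(r)

def huber_location(data, k=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location with a fixed, MAD-based scale."""
    scale = mad_scale(data)
    if scale == 0.0:
        raise ValueError("MAD is zero; cannot standardise residuals")
    mu = median(data)  # robust starting point
    for _ in range(max_iter):
        weights = [huber_weight((x - mu) / scale, k) for x in data]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

# Six well-behaved observations near 2 and one gross outlier.
data = [1.9, 2.1, 2.0, 2.2, 1.8, 2.05, 50.0]
estimate = huber_location(data)
print("mean:", mean(data), "Huber:", estimate, "median:", median(data))
```

The outlier still contributes to the estimate, but its influence is capped, so the result sits close to the bulk of the data instead of near the mean.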

See M-estimation for some details.

AFAICT, the definition of M-estimation includes the possibility that you could in principle select a less-robust loss function than least sum-of-squares or negative log likelihood, but I have not seen this in the literature. Generally, some robustified approach is presumed.

For M-estimation as robust estimation, various complications ensue, such as the difference between noise in your predictors and noise in your responses, whether the “true” model is included in your class, and which of these difficulties you have resolved.

Loosely speaking, no, you haven’t solved problems of noise in your predictors, only the problem of noise in your responses.

And the cost is that you now have a loss function with some extra arbitrary parameters, which is anathema to frequentists, who like to claim to be less arbitrary than Bayesians. You then have to justify why you chose that loss function and its particular parameterisation. There are various procedures to choose these parameters, however.


🏗 Don’t know

Median-based estimators

Rousseeuw and Yohai’s idea. (P. Rousseeuw and Yohai 1984)

Many permutations on the theme here, but it rapidly gets complex. The only members of these families I have looked into are the near-trivial cases of the Least Median of Squares and Least Trimmed Squares estimators (P. J. Rousseeuw 1984). More broadly we should also consider S-estimators, which do something with… robust estimation of scale and using this to do robust estimation of location? 🏗
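
Here is a rough sketch of Least Median of Squares and Least Trimmed Squares for a straight-line fit. Both replace the sum in least squares with something that ignores the worst residuals: the median of the squared residuals, or the sum of the h smallest ones. Note that this sketch only searches over lines through pairs of data points, which is an approximation to the true minimiser and not an exact algorithm. The choice h = n//2 + 1 and the data are assumptions for illustration.

```python
from itertools import combinations
from statistics import median

def squared_residuals(points, slope, intercept):
    return [(y - (slope * x + intercept)) ** 2 for x, y in points]

def lms_criterion(sq_res):
    """Least Median of Squares: median of the squared residuals."""
    return median(sq_res)

def lts_criterion(sq_res, h):
    """Least Trimmed Squares: sum of the h smallest squared residuals."""
    return sum(sorted(sq_res)[:h])

def best_elemental_fit(points, criterion):
    """Approximate minimiser: try every line through two data points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical line, not a function of x
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        score = criterion(squared_residuals(points, slope, intercept))
        if best is None or score < best[0]:
            best = (score, slope, intercept)
    return best[1], best[2]

# Eight points near y = 2x + 1, plus three gross outliers (illustrative data).
noise = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03, -0.02, 0.0]
points = [(float(x), 2.0 * x + 1.0 + e) for x, e in enumerate(noise)]
points += [(8.0, 0.0), (9.0, 0.0), (10.0, 0.0)]

h = len(points) // 2 + 1  # assumed trimming level
lms_fit = best_elemental_fit(points, lms_criterion)
lts_fit = best_elemental_fit(points, lambda sq: lts_criterion(sq, h))
print("LMS (slope, intercept):", lms_fit)
print("LTS (slope, intercept):", lts_fit)
```

Both fits should recover a line close to y = 2x + 1 even though more than a quarter of the points are junk.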

Theil-Sen-(Oja) estimators: Something about medians of inferred regression slopes. 🏗
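
For the one-predictor case, a minimal sketch of the Theil-Sen idea as I understand it: take the median of the slopes through all pairs of points. The intercept rule used here, the median of y minus slope times x, is one common convention and is an assumption of this sketch. The data are noise-free apart from two outliers, purely to make the result easy to check.

```python
from itertools import combinations
from statistics import median

def theil_sen(points):
    """Median of pairwise slopes, then median residual offset as intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x1 != x2
    ]
    slope = median(slopes)
    intercept = median([y - slope * x for x, y in points])
    return slope, intercept

# Nine points exactly on y = 2x + 1, plus two gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(9)]
points += [(9.0, 0.0), (10.0, 0.0)]

slope, intercept = theil_sen(points)
print(slope, intercept)  # 2.0 1.0
```

Of the 55 pairwise slopes, 36 come from pairs of clean points, so the median slope is unaffected by the two outliers.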

Tukey median, and why no-one uses it what with it being NP-Hard.


RANSAC — some kind of randomised outlier detection estimator. 🏗
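
A bare-bones sketch of the RANSAC recipe for a straight-line fit, to pin down what the randomisation is doing. It repeatedly fits a line to a random minimal subset (two points), counts how many points fall within a threshold of that line, keeps the largest such consensus set, and finally refits by ordinary least squares on that set. The iteration count, threshold, seed and data below are arbitrary choices for illustration.

```python
import random

def least_squares(points):
    """Ordinary least-squares line through the given points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

def ransac_line(points, n_iter=200, threshold=1.0, seed=0):
    """Return a least-squares fit on the largest consensus set found."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # cannot define a slope from this pair
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [
            (x, y) for x, y in points
            if abs(y - (slope * x + intercept)) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return least_squares(best_inliers), best_inliers

# Ten points near y = 2x + 1, plus four gross outliers (illustrative data).
noise = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03, -0.02, 0.0, 0.04, -0.05]
points = [(float(x), 2.0 * x + 1.0 + e) for x, e in enumerate(noise)]
points += [(1.0, 20.0), (3.0, -10.0), (6.0, 40.0), (8.0, -5.0)]

(slope, intercept), inliers = ransac_line(points)
print("slope:", slope, "intercept:", intercept, "inliers:", len(inliers))
```

Unlike the loss-based estimators above, this makes a hard inlier/outlier decision, and the threshold plays the role of the arbitrary tuning parameter.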

Barndorff-Nielsen, O. 1983. “On a Formula for the Distribution of the Maximum Likelihood Estimator.” Biometrika 70 (2): 343–65. https://doi.org/10.1093/biomet/70.2.343.

Beran, Rudolf. 1981. “Efficient Robust Estimates in Parametric Models.” Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete 55 (1): 91–108. https://doi.org/10.1007/BF01013463.

———. 1982. “Robust Estimation in Models for Independent Non-Identically Distributed Data.” The Annals of Statistics 10 (2): 415–28. https://doi.org/10.1214/aos/1176345783.

Bickel, P. J. 1975. “One-Step Huber Estimates in the Linear Model.” Journal of the American Statistical Association 70 (350): 428–34. https://doi.org/10.1080/01621459.1975.10479884.

Bondell, Howard D., Arun Krishna, and Sujit K. Ghosh. 2010. “Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models.” Biometrics 66 (4): 1069–77. https://doi.org/10.1111/j.1541-0420.2010.01391.x.

Burman, P., and D. Nolan. 1995. “A General Akaike-Type Criterion for Model Selection in Robust Regression.” Biometrika 82 (4): 877–86. https://doi.org/10.1093/biomet/82.4.877.

Bühlmann, Peter. 2014. “Robust Statistics.” In Selected Works of Peter J. Bickel, edited by Jianqing Fan, Ya’acov Ritov, and C. F. Jeff Wu, 51–98. Selected Works in Probability and Statistics 13. Springer New York. http://link.springer.com/chapter/10.1007/978-1-4614-5544-8_2.

Cantoni, Eva, and Elvezio Ronchetti. 2001. “Robust Inference for Generalized Linear Models.” Journal of the American Statistical Association 96 (455): 1022–30. https://doi.org/10.1198/016214501753209004.

Charikar, Moses, Jacob Steinhardt, and Gregory Valiant. 2016. “Learning from Untrusted Data,” November. http://arxiv.org/abs/1611.02315.

Cox, D. R. 1983. “Some Remarks on Overdispersion.” Biometrika 70 (1): 269–74. https://doi.org/10.1093/biomet/70.1.269.

Czellar, Veronika, and Elvezio Ronchetti. 2010. “Accurate and Robust Tests for Indirect Inference.” Biometrika 97 (3): 621–30. https://doi.org/10.1093/biomet/asq040.

Diakonikolas, Ilias, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. 2016. “Robust Estimators in High Dimensions Without the Computational Intractability,” April. http://arxiv.org/abs/1604.06443.

Diakonikolas, Ilias, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. 2017. “Being Robust (in High Dimensions) Can Be Practical,” March. http://arxiv.org/abs/1703.00893.

Donoho, David L., and Peter J. Huber. 1983. “The Notion of Breakdown Point.” A Festschrift for Erich L. Lehmann, 157–84. https://books.google.ch/books?hl=en&lr=&id=H8QdaAPW3c8C&oi=fnd&pg=PA157&dq=donoho+huber+1983+breakdown+point&ots=I38CG8Bt_-&sig=BKoPX6T8T3r_qwPzmCjWY96PsKI.

Donoho, David L., and Richard C. Liu. 1988. “The "Automatic" Robustness of Minimum Distance Functionals.” The Annals of Statistics 16 (2): 552–86. https://doi.org/10.1214/aos/1176350820.

Donoho, David L., and Andrea Montanari. 2013. “High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing,” October. http://arxiv.org/abs/1310.7320.

Duchi, John, Peter Glynn, and Hongseok Namkoong. 2016. “Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach,” October. http://arxiv.org/abs/1610.03425.

Genton, Marc G, and Elvezio Ronchetti. 2003. “Robust Indirect Inference.” Journal of the American Statistical Association 98 (461): 67–76. https://doi.org/10.1198/016214503388619102.

Ghosh, Abhik, and Ayanendranath Basu. 2016. “General Model Adequacy Tests and Robust Statistical Inference Based on A New Family of Divergences,” November. http://arxiv.org/abs/1611.05224.

Golubev, Grigori K., and Michael Nussbaum. 1990. “A Risk Bound in Sobolev Class Regression.” The Annals of Statistics 18 (2): 758–78. https://doi.org/10.1214/aos/1176347624.

Hampel, Frank R. 1974. “The Influence Curve and Its Role in Robust Estimation.” Journal of the American Statistical Association 69 (346): 383–93. https://doi.org/10.1080/01621459.1974.10482962.

Hampel, Frank R., Elvezio M. Ronchetti, Peter J. Rousseeuw, and Werner A. Stahel. 2011. Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons. http://books.google.com?id=XK3uhrVefXQC.

Holland, Paul W., and Roy E. Welsch. 1977. “Robust Regression Using Iteratively Reweighted Least-Squares.” Communications in Statistics - Theory and Methods 6 (9): 813–27. https://doi.org/10.1080/03610927708827533.

Huber, Peter J. 2009. Robust Statistics. 2nd ed. Wiley Series in Probability and Statistics. Hoboken, N.J: Wiley.

———. 1964. “Robust Estimation of a Location Parameter.” The Annals of Mathematical Statistics 35 (1): 73–101. https://doi.org/10.1214/aoms/1177703732.

Janková, Jana, and Sara van de Geer. 2016. “Confidence Regions for High-Dimensional Generalized Linear Models Under Sparsity,” October. http://arxiv.org/abs/1610.01353.

Konishi, Sadanori, and G. Kitagawa. 2008. Information Criteria and Statistical Modeling. Springer Series in Statistics. New York: Springer.

Konishi, Sadanori, and Genshiro Kitagawa. 1996. “Generalised Information Criteria in Model Selection.” Biometrika 83 (4): 875–90. https://doi.org/10.1093/biomet/83.4.875.

———. 2003. “Asymptotic Theory for Information Criteria in Model Selection—Functional Approach.” Journal of Statistical Planning and Inference, C.R. Rao 80th Birthday Felicitation vol., Part IV, 114 (1–2): 45–61. https://doi.org/10.1016/S0378-3758(02)00462-7.

Krzakala, Florent, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, and Pan Zhang. 2013. “Spectral Redemption in Clustering Sparse Networks.” Proceedings of the National Academy of Sciences 110 (52): 20935–40. https://doi.org/10.1073/pnas.1312486110.

Li, Jerry. 2017. “Robust Sparse Estimation Tasks in High Dimensions,” February. http://arxiv.org/abs/1702.05860.

Lu, W., Y. Goldberg, and J. P. Fine. 2012. “On the Robustness of the Adaptive Lasso to Model Misspecification.” Biometrika 99 (3): 717–31. https://doi.org/10.1093/biomet/ass027.

Machado, José A. F. 1993. “Robust Model Selection and M-Estimation.” Econometric Theory 9 (03): 478–93. https://doi.org/10.1017/S0266466600007775.

Manton, J. H., V. Krishnamurthy, and H. V. Poor. 1998. “James-Stein State Filtering Algorithms.” IEEE Transactions on Signal Processing 46 (9): 2431–47. https://doi.org/10.1109/78.709532.

Markatou, M., and E. Ronchetti. 1997. “Robust Inference: The Approach Based on Influence Functions.” In Handbook of Statistics, 15:49–75. Robust Inference. Elsevier. https://doi.org/10.1016/S0169-7161(97)15005-2.

Maronna, Ricardo A., Douglas Martin, and Víctor J. Yohai. 2006. Robust Statistics: Theory and Methods. Reprinted with corr. Wiley Series in Probability and Statistics. Chichester: Wiley.

Maronna, Ricardo Antonio. 1976. “Robust M-Estimators of Multivariate Location and Scatter.” The Annals of Statistics 4 (1): 51–67. http://ssg.mit.edu/group/ajkim/area_exam/papers/Maronna_1976.pdf.gz.

Maronna, Ricardo A., and Víctor J. Yohai. 2014. “Robust Estimation of Multivariate Location and Scatter.” In Wiley StatsRef: Statistics Reference Online. John Wiley & Sons, Ltd. http://onlinelibrary.wiley.com/doi/10.1002/9781118445112.stat01520.pub2/abstract.

———. 1995. “The Behavior of the Stahel-Donoho Robust Multivariate Estimator.” Journal of the American Statistical Association 90 (429): 330–41. https://doi.org/10.1080/01621459.1995.10476517.

Maronna, Ricardo A., and Ruben H. Zamar. 2002. “Robust Estimates of Location and Dispersion for High-Dimensional Datasets.” Technometrics 44 (4): 307–17. http://amstat.tandfonline.com/doi/abs/10.1198/004017002188618509.

Massart, Desire L., Leonard Kaufman, Peter J. Rousseeuw, and Annick Leroy. 1986. “Least Median of Squares: A Robust Method for Outlier and Model Error Detection in Regression and Calibration.” Analytica Chimica Acta 187 (January): 171–79. https://doi.org/10.1016/S0003-2670(00)82910-4.

Mossel, Elchanan, Joe Neeman, and Allan Sly. 2016. “Belief Propagation, Robust Reconstruction and Optimal Recovery of Block Models.” The Annals of Applied Probability 26 (4): 2211–56. https://doi.org/10.1214/15-AAP1145.

———. 2013. “A Proof of the Block Model Threshold Conjecture,” November. http://arxiv.org/abs/1311.4115.

Oja, Hannu. 1983. “Descriptive Statistics for Multivariate Distributions.” Statistics & Probability Letters 1 (6): 327–32. https://doi.org/10.1016/0167-7152(83)90054-8.

Qian, Guoqi, and Hans R. Künsch. 1998. “On Model Selection via Stochastic Complexity in Robust Linear Regression.” Journal of Statistical Planning and Inference 75 (1): 91–116. https://doi.org/10.1016/S0378-3758(98)00138-4.

Ronchetti, E. 2000. “Robust Regression Methods and Model Selection.” In Data Segmentation and Model Selection for Computer Vision, edited by Alireza Bab-Hadiashar and David Suter, 31–40. Springer New York. https://doi.org/10.1007/978-0-387-21528-0_2.

Ronchetti, Elvezio. 1985. “Robust Model Selection in Regression.” Statistics & Probability Letters 3 (1): 21–23. https://doi.org/10.1016/0167-7152(85)90006-9.

———. 1997. “Robust Inference by Influence Functions.” Journal of Statistical Planning and Inference, Robust Statistics and Data Analysis, Part I, 57 (1): 59–72. https://doi.org/10.1016/S0378-3758(96)00036-5.

Ronchetti, Elvezio, and Fabio Trojani. 2001. “Robust Inference with GMM Estimators.” Journal of Econometrics 101 (1): 37–69. https://doi.org/10.1016/S0304-4076(00)00073-7.

Rousseeuw, Peter J. 1984. “Least Median of Squares Regression.” Journal of the American Statistical Association 79 (388): 871–80. https://doi.org/10.1080/01621459.1984.10477105.

Rousseeuw, Peter J., and Annick M. Leroy. 1987. Robust Regression and Outlier Detection. Wiley Series in Probability and Mathematical Statistics. New York: Wiley.

Rousseeuw, P., and V. Yohai. 1984. “Robust Regression by Means of S-Estimators.” In Robust and Nonlinear Time Series Analysis, edited by Jürgen Franke, Wolfgang Härdle, and Douglas Martin, 256–72. Lecture Notes in Statistics 26. Springer US. https://doi.org/10.1007/978-1-4615-7821-5_15.

Royall, Richard M. 1986. “Model Robust Confidence Intervals Using Maximum Likelihood Estimators.” International Statistical Review / Revue Internationale de Statistique 54 (2): 221–26. https://doi.org/10.2307/1403146.

Stigler, Stephen M. 2010. “The Changing History of Robustness.” The American Statistician 64 (4): 277–81. https://doi.org/10.1198/tast.2010.10159.

Street, James O., Raymond J. Carroll, and David Ruppert. 1988. “A Note on Computing Robust Regression Estimates via Iteratively Reweighted Least Squares.” The American Statistician 42 (2): 152–54. https://doi.org/10.1080/00031305.1988.10475548.

Tharmaratnam, Kukatharmini, and Gerda Claeskens. 2013. “A Comparison of Robust Versions of the AIC Based on M-, S- and MM-Estimators.” Statistics 47 (1): 216–35. https://doi.org/10.1080/02331888.2011.568120.

Theil, Henri. 1992. “A Rank-Invariant Method of Linear and Polynomial Regression Analysis.” In Henri Theil’s Contributions to Economics and Econometrics, edited by Baldev Raj and Johan Koerts, 345–81. Advanced Studies in Theoretical and Applied Econometrics 23. Springer Netherlands. https://doi.org/10.1007/978-94-011-2546-8_20.

Tsou, Tsung-Shan. 2006. “Robust Poisson Regression.” Journal of Statistical Planning and Inference 136 (9): 3173–86. https://doi.org/10.1016/j.jspi.2004.12.008.

Wedderburn, R. W. M. 1974. “Quasi-Likelihood Functions, Generalized Linear Models, and the Gauss—Newton Method.” Biometrika 61 (3): 439–47. https://doi.org/10.1093/biomet/61.3.439.

Xu, H., C. Caramanis, and S. Mannor. 2010. “Robust Regression and Lasso.” IEEE Transactions on Information Theory 56 (7): 3561–74. https://doi.org/10.1109/TIT.2010.2048503.

Yang, Tao, Colin M. Gallagher, and Christopher S. McMahan. 2019. “A Robust Regression Methodology via M-Estimation.” Communications in Statistics - Theory and Methods 48 (5): 1092–1107. https://doi.org/10.1080/03610926.2018.1423698.

Yang, Wenzhuo, and Huan Xu. 2013. “A Unified Robust Regression Model for Lasso-Like Algorithms.” In ICML (3), 585–93. http://www.jmlr.org/proceedings/papers/v28/yang13e.pdf.