# Robust statistics

Techniques to improve the failure modes of your estimates. Surprisingly rarely used despite being fairly straightforward.

This is more-or-less a frequentist project.

Bayesians seem to claim to achieve robustness largely by choosing heavy-tailed priors where they might have chosen light-tailed ones, e.g. Laplacian priors instead of Gaussian ones. Such priors may have arbitrary parameters, but no more arbitrary than is usual in Bayesian statistics, and so they attract less need to rationalise away the guilt.

## TODO

• relation to penalized regression.

• connection with Lasso.

• Beran’s Hellinger-ball contamination model, which I also don’t yet understand.

• Breakdown point explanation

• glm connection.

## Corruption models

• (Adversarial) total variation $$\epsilon$$-corruption.

• Random (mixture) corruption

• other?
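
A rough formalisation of the first two, in notation I am assuming here rather than taking from any one reference: write $$P$$ for the model distribution and $$\epsilon \in [0, 1)$$ for the corruption level. In the mixture model the data are drawn from

$$
P_\epsilon = (1-\epsilon) P + \epsilon Q
$$

for some arbitrary contaminating distribution $$Q$$. In the total variation model the data are drawn from any $$P'$$ with

$$
d_{\mathrm{TV}}(P, P') = \sup_A |P(A) - P'(A)| \le \epsilon .
$$

Since $$d_{\mathrm{TV}}(P, P_\epsilon) = \epsilon\, d_{\mathrm{TV}}(P, Q) \le \epsilon$$, the mixture model is a special case of the total variation one.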

## M-estimation with robust loss

The one that I, at least, would think of when considering robust estimation.

In M-estimation, instead of hunting a maximum of the likelihood function as you do in maximum likelihood, or a minimum of the sum of squared residuals as you do in least-squares estimation, you minimise a specifically chosen loss function of those residuals. You may select an objective function more robust to deviations between your model and reality. Credited to Huber (Hube64).

See M-estimation for the details.
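
In symbols, assuming a regression-style setup with residuals $$r_i(\theta) = y_i - f(x_i; \theta)$$ (my notation, chosen for illustration), an M-estimator is

$$
\hat{\theta} = \operatorname*{arg\,min}_\theta \sum_{i=1}^n \rho\big(r_i(\theta)\big).
$$

Taking $$\rho(r) = r^2$$ recovers least squares. Huber's loss with threshold $$c > 0$$ is quadratic near zero and linear in the tails:

$$
\rho_c(r) = \begin{cases} \tfrac{1}{2} r^2 & |r| \le c, \\ c|r| - \tfrac{1}{2}c^2 & |r| > c. \end{cases}
$$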

Aside: AFAICT, the definition of M-estimation includes the possibility that you could in principle select a less-robust loss function than least sum-of-squares or negative log likelihood, but I have not seen this in the literature. Generally, some robustified approach is presumed.

For M-estimation as robust estimation, various complications ensue, such as the difference between noise in your predictors and noise in your responses, whether the “true” model is included in your class, and which of these difficulties you have resolved.

Loosely speaking, no, you haven’t solved problems of noise in your predictors, only the problem of noise in your responses.

And the cost is that you now have a loss function with some extra arbitrary parameters, which is anathema to frequentists, who like to claim to be less arbitrary than Bayesians. You then have to justify why you chose that loss function and its particular parameterisation. There are various procedures to choose these parameters, based on scale estimation.
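
A minimal sketch of one such procedure, for the simplest case of estimating a location: fix the scale with the median absolute deviation (MAD), then solve the Huber location problem by iteratively reweighted averaging. The function names, the threshold `c=1.345` (a conventional choice, usually quoted as giving roughly 95% efficiency at the Gaussian) and the MAD consistency factor 1.4826 are assumptions I am making for illustration, not something derived in these notes.

```python
import numpy as np


def mad_scale(x):
    """Median absolute deviation, rescaled to match the Gaussian standard deviation."""
    x = np.asarray(x, dtype=float)
    return 1.4826 * np.median(np.abs(x - np.median(x)))


def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location with the scale held fixed at the MAD."""
    x = np.asarray(x, dtype=float)
    scale = mad_scale(x)
    mu = np.median(x)  # robust starting point
    if scale == 0.0:
        # More than half the data coincide; the median is all we can say.
        return mu
    for _ in range(max_iter):
        r = (x - mu) / scale
        # Huber weights: 1 inside the threshold, c / |r| outside it.
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))
        mu_new = np.sum(w * x) / np.sum(w)
        converged = abs(mu_new - mu) < tol * scale
        mu = mu_new
        if converged:
            break
    return mu


# Hypothetical data: 200 clean points around 5, plus 20 gross outliers at 100.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(5.0, 1.0, size=200), np.full(20, 100.0)])
print("mean:", np.mean(data), "huber:", huber_location(data))
```

The arbitrariness complained about above is visible in the signature: `c` and the choice of MAD as the scale estimate are both knobs that someone has to defend.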

🏗 Don’t know

## Median-based estimators

Rousseeuw and Yohai’s idea. (RoYo84)

Many permutations on the theme here, but it rapidly gets complex. The only members of these families I have looked into are the near-trivial cases of Least Median of Squares and Least Trimmed Squares estimation. (Rous84)
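
To make those two criteria concrete, here is a crude sketch for a straight-line fit. It only searches lines through pairs of data points, so it approximates rather than solves either problem, and the function name `robust_line` and the default `h = n // 2 + 1` are my own assumptions. Least Median of Squares minimises the median squared residual; Least Trimmed Squares minimises the sum of the `h` smallest squared residuals.

```python
from itertools import combinations

import numpy as np


def robust_line(x, y, criterion="lms", h=None):
    """Approximate LMS or LTS line fit by searching lines through point pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if h is None:
        h = n // 2 + 1  # assumed default: just over half the sample
    best_value, best_fit = np.inf, None
    for i, j in combinations(range(n), 2):
        if x[i] == x[j]:
            continue  # vertical line, no finite slope
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        sq_resid = (y - (slope * x + intercept)) ** 2
        if criterion == "lms":
            value = np.median(sq_resid)
        elif criterion == "lts":
            value = np.sort(sq_resid)[:h].sum()
        else:
            raise ValueError("criterion must be 'lms' or 'lts'")
        if value < best_value:
            best_value, best_fit = value, (slope, intercept)
    if best_fit is None:
        raise ValueError("need at least two distinct x values")
    return best_fit


# Hypothetical data: a clean line y = 0.5 x + 3 with a quarter of the responses corrupted.
x_demo = np.arange(20.0)
y_demo = 0.5 * x_demo + 3.0
y_demo[15:] = 50.0
print(robust_line(x_demo, y_demo, "lms"), robust_line(x_demo, y_demo, "lts"))
```

The pairwise search costs on the order of n² candidate lines times n residuals each, which is fine for toy data and hopeless beyond it.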

More broadly we should also consider S-estimators, which do something with… robust estimation of scale and using this to do robust estimation of location? 🏗

Theil-Sen-(Oja) estimators: Something about medians of inferred regression slopes. 🏗
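
For simple regression, the Theil-Sen idea is to take the median of the slopes through all pairs of points. A minimal sketch follows; the intercept rule (median of `y - slope * x`) is one common convention that I am assuming, and the function name is mine.

```python
from itertools import combinations

import numpy as np


def theil_sen(x, y):
    """Theil-Sen line: median of pairwise slopes, then a median-based intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with no defined slope
    ]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return slope, intercept


# Hypothetical data: y = 2 x + 1 with one wild response.
x_demo = np.arange(10.0)
y_demo = 2.0 * x_demo + 1.0
y_demo[9] = 100.0
print("Theil-Sen:", theil_sen(x_demo, y_demo))
print("least squares:", np.polyfit(x_demo, y_demo, 1))
```

Here 36 of the 45 pairwise slopes are untouched by the corrupted point, so the median ignores it, while the least-squares slope is dragged upwards.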

Tukey median, and why no-one uses it, what with it being NP-hard.

## Others

RANSAC – some kind of randomised outlier detection estimator. 🏗
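
As I understand the basic recipe: repeatedly fit the model to a random minimal subset, count how many points lie within some tolerance of that fit, keep the largest such consensus set, and refit on it. The sketch below does this for a straight line. The function name, the `threshold` and `n_trials` parameters and the final least-squares refit are assumptions for illustration; real implementations add stopping rules and other refinements.

```python
import numpy as np


def ransac_line(x, y, threshold, n_trials=200, seed=0):
    """Basic RANSAC for a line: random point pairs, largest inlier set, final refit."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_inliers = None
    for _ in range(n_trials):
        # A line is determined by two points: the minimal subset.
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        inliers = np.abs(y - (slope * x + intercept)) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers is None:
        raise ValueError("no usable pair of points was sampled")
    # Ordinary least squares on the consensus set only.
    slope, intercept = np.polyfit(x[best_inliers], y[best_inliers], 1)
    return slope, intercept, best_inliers


# Hypothetical data: y = 0.5 x + 3 with the last five responses corrupted.
x_demo = np.arange(20.0)
y_demo = 0.5 * x_demo + 3.0
y_demo[15:] = 50.0
slope, intercept, inliers = ransac_line(x_demo, y_demo, threshold=0.1)
print(slope, intercept, int(inliers.sum()))
```

Note that `threshold` plays the same role as the tuning constants in robust losses: it encodes a prior guess at the noise scale.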

## References

Barndorff-Nielsen, O. 1983. “On a Formula for the Distribution of the Maximum Likelihood Estimator.” Biometrika 70 (2): 343–65. https://doi.org/10.1093/biomet/70.2.343.

Beran, Rudolf. 1981. “Efficient Robust Estimates in Parametric Models.” Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete 55 (1): 91–108. https://doi.org/10.1007/BF01013463.

———. 1982. “Robust Estimation in Models for Independent Non-Identically Distributed Data.” The Annals of Statistics 10 (2): 415–28. https://doi.org/10.1214/aos/1176345783.

Bickel, P. J. 1975. “One-Step Huber Estimates in the Linear Model.” Journal of the American Statistical Association 70 (350): 428–34. https://doi.org/10.1080/01621459.1975.10479884.

Bondell, Howard D., Arun Krishna, and Sujit K. Ghosh. 2010. “Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models.” Biometrics 66 (4): 1069–77. https://doi.org/10.1111/j.1541-0420.2010.01391.x.

Burman, P., and D. Nolan. 1995. “A General Akaike-Type Criterion for Model Selection in Robust Regression.” Biometrika 82 (4): 877–86. https://doi.org/10.1093/biomet/82.4.877.

Bühlmann, Peter. 2014. “Robust Statistics.” In Selected Works of Peter J. Bickel, edited by Jianqing Fan, Ya’acov Ritov, and C. F. Jeff Wu, 51–98. Selected Works in Probability and Statistics 13. Springer New York. http://link.springer.com/chapter/10.1007/978-1-4614-5544-8_2.

Cantoni, Eva, and Elvezio Ronchetti. 2001. “Robust Inference for Generalized Linear Models.” Journal of the American Statistical Association 96 (455): 1022–30. https://doi.org/10.1198/016214501753209004.

Charikar, Moses, Jacob Steinhardt, and Gregory Valiant. 2016. “Learning from Untrusted Data,” November. http://arxiv.org/abs/1611.02315.

Cox, D. R. 1983. “Some Remarks on Overdispersion.” Biometrika 70 (1): 269–74. https://doi.org/10.1093/biomet/70.1.269.

Czellar, Veronika, and Elvezio Ronchetti. 2010. “Accurate and Robust Tests for Indirect Inference.” Biometrika 97 (3): 621–30. https://doi.org/10.1093/biomet/asq040.

Diakonikolas, Ilias, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. 2016. “Robust Estimators in High Dimensions Without the Computational Intractability,” April. http://arxiv.org/abs/1604.06443.

Diakonikolas, Ilias, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. 2017. “Being Robust (in High Dimensions) Can Be Practical,” March. http://arxiv.org/abs/1703.00893.

Donoho, David L., and Peter J. Huber. 1983. “The Notion of Breakdown Point.” A Festschrift for Erich L. Lehmann, 157–184. https://books.google.ch/books?hl=en&lr=&id=H8QdaAPW3c8C&oi=fnd&pg=PA157&dq=donoho+huber+1983+breakdown+point&ots=I38CG8Bt_-&sig=BKoPX6T8T3r_qwPzmCjWY96PsKI.

Donoho, David L., and Richard C. Liu. 1988. “The "Automatic" Robustness of Minimum Distance Functionals.” The Annals of Statistics 16 (2): 552–86. https://doi.org/10.1214/aos/1176350820.

Donoho, David L., and Andrea Montanari. 2013. “High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing,” October. http://arxiv.org/abs/1310.7320.

Duchi, John, Peter Glynn, and Hongseok Namkoong. 2016. “Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach,” October. http://arxiv.org/abs/1610.03425.

Genton, Marc G, and Elvezio Ronchetti. 2003. “Robust Indirect Inference.” Journal of the American Statistical Association 98 (461): 67–76. https://doi.org/10.1198/016214503388619102.

Ghosh, Abhik, and Ayanendranath Basu. 2016. “General Model Adequacy Tests and Robust Statistical Inference Based on A New Family of Divergences,” November. http://arxiv.org/abs/1611.05224.

Golubev, Grigori K., and Michael Nussbaum. 1990. “A Risk Bound in Sobolev Class Regression.” The Annals of Statistics 18 (2): 758–78. https://doi.org/10.1214/aos/1176347624.

Hampel, Frank R. 1974. “The Influence Curve and Its Role in Robust Estimation.” Journal of the American Statistical Association 69 (346): 383–93. https://doi.org/10.1080/01621459.1974.10482962.

Hampel, Frank R., Elvezio M. Ronchetti, Peter J. Rousseeuw, and Werner A. Stahel. 2011. Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons. http://books.google.com?id=XK3uhrVefXQC.

Huber, Peter J. 1964. “Robust Estimation of a Location Parameter.” The Annals of Mathematical Statistics 35 (1): 73–101. https://doi.org/10.1214/aoms/1177703732.

———. 2009. Robust Statistics. 2nd ed. Wiley Series in Probability and Statistics. Hoboken, N.J: Wiley.

Janková, Jana, and Sara van de Geer. 2016. “Confidence Regions for High-Dimensional Generalized Linear Models Under Sparsity,” October. http://arxiv.org/abs/1610.01353.

Konishi, Sadanori, and G. Kitagawa. 2008. Information Criteria and Statistical Modeling. Springer Series in Statistics. New York: Springer.

Konishi, Sadanori, and Genshiro Kitagawa. 1996. “Generalised Information Criteria in Model Selection.” Biometrika 83 (4): 875–90. https://doi.org/10.1093/biomet/83.4.875.

———. 2003. “Asymptotic Theory for Information Criteria in Model Selection—Functional Approach.” Journal of Statistical Planning and Inference, C.R. Rao 80th Birthday Felicitation vol., Part IV, 114 (1–2): 45–61. https://doi.org/10.1016/S0378-3758(02)00462-7.

Krzakala, Florent, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, and Pan Zhang. 2013. “Spectral Redemption in Clustering Sparse Networks.” Proceedings of the National Academy of Sciences 110 (52): 20935–40. https://doi.org/10.1073/pnas.1312486110.

Li, Jerry. 2017. “Robust Sparse Estimation Tasks in High Dimensions,” February. http://arxiv.org/abs/1702.05860.

Lu, W., Y. Goldberg, and J. P. Fine. 2012. “On the Robustness of the Adaptive Lasso to Model Misspecification.” Biometrika 99 (3): 717–31. https://doi.org/10.1093/biomet/ass027.

Machado, José A.F. 1993. “Robust Model Selection and M-Estimation.” Econometric Theory 9 (03): 478–93. https://doi.org/10.1017/S0266466600007775.

Manton, J. H., V. Krishnamurthy, and H. V. Poor. 1998. “James-Stein State Filtering Algorithms.” IEEE Transactions on Signal Processing 46 (9): 2431–47. https://doi.org/10.1109/78.709532.

Markatou, M., and E. Ronchetti. 1997. “Robust Inference: The Approach Based on Influence Functions.” In Handbook of Statistics, 15:49–75. Robust Inference. Elsevier. https://doi.org/10.1016/S0169-7161(97)15005-2.

Maronna, Ricardo A., Douglas Martin, and Víctor J. Yohai. 2006. Robust Statistics: Theory and Methods. Reprinted with corr. Wiley Series in Probability and Statistics. Chichester: Wiley.

Maronna, Ricardo Antonio. 1976. “Robust M-Estimators of Multivariate Location and Scatter.” The Annals of Statistics 4 (1): 51–67. http://ssg.mit.edu/group/ajkim/area_exam/papers/Maronna_1976.pdf.gz.

Maronna, Ricardo A., and Víctor J. Yohai. 1995. “The Behavior of the Stahel-Donoho Robust Multivariate Estimator.” Journal of the American Statistical Association 90 (429): 330–41. https://doi.org/10.1080/01621459.1995.10476517.

———. 2014. “Robust Estimation of Multivariate Location and Scatter.” In Wiley StatsRef: Statistics Reference Online. John Wiley & Sons, Ltd. http://onlinelibrary.wiley.com/doi/10.1002/9781118445112.stat01520.pub2/abstract.

Maronna, Ricardo A., and Ruben H. Zamar. 2002. “Robust Estimates of Location and Dispersion for High-Dimensional Datasets.” Technometrics 44 (4): 307–17. http://amstat.tandfonline.com/doi/abs/10.1198/004017002188618509.

Massart, Desire L., Leonard Kaufman, Peter J. Rousseeuw, and Annick Leroy. 1986. “Least Median of Squares: A Robust Method for Outlier and Model Error Detection in Regression and Calibration.” Analytica Chimica Acta 187 (January): 171–79. https://doi.org/10.1016/S0003-2670(00)82910-4.

Mossel, Elchanan, Joe Neeman, and Allan Sly. 2013. “A Proof of the Block Model Threshold Conjecture,” November. http://arxiv.org/abs/1311.4115.

———. 2016. “Belief Propagation, Robust Reconstruction and Optimal Recovery of Block Models.” The Annals of Applied Probability 26 (4): 2211–56. https://doi.org/10.1214/15-AAP1145.

Oja, Hannu. 1983. “Descriptive Statistics for Multivariate Distributions.” Statistics & Probability Letters 1 (6): 327–32. https://doi.org/10.1016/0167-7152(83)90054-8.

Qian, Guoqi, and Hans R. Künsch. 1998. “On Model Selection via Stochastic Complexity in Robust Linear Regression.” Journal of Statistical Planning and Inference 75 (1): 91–116. https://doi.org/10.1016/S0378-3758(98)00138-4.

Ronchetti, E. 2000. “Robust Regression Methods and Model Selection.” In Data Segmentation and Model Selection for Computer Vision, edited by Alireza Bab-Hadiashar and David Suter, 31–40. Springer New York. https://doi.org/10.1007/978-0-387-21528-0_2.

Ronchetti, Elvezio. 1985. “Robust Model Selection in Regression.” Statistics & Probability Letters 3 (1): 21–23. https://doi.org/10.1016/0167-7152(85)90006-9.

———. 1997. “Robust Inference by Influence Functions.” Journal of Statistical Planning and Inference, Robust Statistics and Data Analysis, Part I, 57 (1): 59–72. https://doi.org/10.1016/S0378-3758(96)00036-5.

Ronchetti, Elvezio, and Fabio Trojani. 2001. “Robust Inference with GMM Estimators.” Journal of Econometrics 101 (1): 37–69. https://doi.org/10.1016/S0304-4076(00)00073-7.

Rousseeuw, Peter J. 1984. “Least Median of Squares Regression.” Journal of the American Statistical Association 79 (388): 871–80. https://doi.org/10.1080/01621459.1984.10477105.

Rousseeuw, Peter J., and Annick M. Leroy. 1987. Robust Regression and Outlier Detection. Wiley Series in Probability and Mathematical Statistics. New York: Wiley.

Rousseeuw, P., and V. Yohai. 1984. “Robust Regression by Means of S-Estimators.” In Robust and Nonlinear Time Series Analysis, edited by Jürgen Franke, Wolfgang Härdle, and Douglas Martin, 256–72. Lecture Notes in Statistics 26. Springer US. https://doi.org/10.1007/978-1-4615-7821-5_15.

Royall, Richard M. 1986. “Model Robust Confidence Intervals Using Maximum Likelihood Estimators.” International Statistical Review / Revue Internationale de Statistique 54 (2): 221–26. https://doi.org/10.2307/1403146.

Stigler, Stephen M. 2010. “The Changing History of Robustness.” The American Statistician 64 (4): 277–81. https://doi.org/10.1198/tast.2010.10159.

Tharmaratnam, Kukatharmini, and Gerda Claeskens. 2013. “A Comparison of Robust Versions of the AIC Based on M-, S- and MM-Estimators.” Statistics 47 (1): 216–35. https://doi.org/10.1080/02331888.2011.568120.

Theil, Henri. 1992. “A Rank-Invariant Method of Linear and Polynomial Regression Analysis.” In Henri Theil’s Contributions to Economics and Econometrics, edited by Baldev Raj and Johan Koerts, 345–81. Advanced Studies in Theoretical and Applied Econometrics 23. Springer Netherlands. https://doi.org/10.1007/978-94-011-2546-8_20.

Tsou, Tsung-Shan. 2006. “Robust Poisson Regression.” Journal of Statistical Planning and Inference 136 (9): 3173–86. https://doi.org/10.1016/j.jspi.2004.12.008.

Wedderburn, R. W. M. 1974. “Quasi-Likelihood Functions, Generalized Linear Models, and the Gauss—Newton Method.” Biometrika 61 (3): 439–47. https://doi.org/10.1093/biomet/61.3.439.

Xu, H., C. Caramanis, and S. Mannor. 2010. “Robust Regression and Lasso.” IEEE Transactions on Information Theory 56 (7): 3561–74. https://doi.org/10.1109/TIT.2010.2048503.

Yang, Wenzhuo, and Huan Xu. 2013. “A Unified Robust Regression Model for Lasso-Like Algorithms.” In ICML (3), 585–93. http://www.jmlr.org/proceedings/papers/v28/yang13e.pdf.