# Stability (in learning)

May 25, 2016 — October 5, 2016

estimator distribution
model selection
statistics
uncertainty

If your estimate is robust to deleting a data point, it is a stable estimate. This implies generalisability, apparently. These statements can be made precise, I am told. Making them precise might give us new ideas for risk bounds, model selection, or connections to optimization.

Supposedly there is also a connection to differential privacy, but since I don’t yet know anything about differential privacy I can’t take that statement any further, except to note that I would like to work it out one day. There is also presumably a connection to robust inference, since the setup sounds similar.

scrapbook:

Yu (2013):

Reproducibility is imperative for any scientific discovery. More often than not, modern scientific findings rely on statistical analysis of high-dimensional data. At a minimum, reproducibility manifests itself in stability of statistical results relative to “reasonable” perturbations to data and to the model used. Jackknife, bootstrap, and cross-validation are based on perturbations to data, while robust statistics methods deal with perturbations to models.
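The jackknife mentioned above is the simplest such data perturbation: delete each point in turn and recompute the estimate. A minimal sketch for the sample mean (all names here are my own illustration, not from Yu's paper); the maximum shift under deletion is one crude stability diagnostic, and the same replicates yield the jackknife standard error:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=200)
n = len(x)

# Leave-one-out ("jackknife") replicates of the sample mean:
# delete each point in turn and recompute the estimate.
loo_means = np.array([np.delete(x, i).mean() for i in range(n)])

# Stability of the estimator: how far can deleting one point move it?
max_shift = np.abs(loo_means - x.mean()).max()

# Jackknife estimate of the standard error of the mean.
se_jack = np.sqrt((n - 1) / n * ((loo_means - loo_means.mean()) ** 2).sum())
```

For the sample mean the jackknife standard error coincides exactly with the classical estimate, the sample standard deviation over the square root of the sample size, which makes this a handy sanity check.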

Moritz Hardt, Stability as a foundation of machine learning:

Central to machine learning is our ability to relate how a learning algorithm fares on a sample to its performance on unseen instances. This is called generalization.

In this post, I will describe a purely algorithmic approach to generalization. The property that makes this possible is stability. An algorithm is stable, intuitively speaking, if its output doesn’t change much if we perturb the input sample in a single point. We will see that this property by itself is necessary and sufficient for generalization.

Xu, Caramanis, and Mannor (2012):

We consider two desired properties of learning algorithms: sparsity and algorithmic stability. Both properties are believed to lead to good generalization ability. We show that these two properties are fundamentally at odds with each other: A sparse algorithm cannot be stable and vice versa. Thus, one has to trade off sparsity and stability in designing a learning algorithm. In particular, our general result implies that ℓ1-regularized regression (Lasso) cannot be stable, while ℓ2-regularized regression is known to have strong stability properties and is therefore not sparse.
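One face of this tension is easy to see in the orthonormal-design case, where the Lasso reduces to soft-thresholding the OLS coefficients and ridge to proportional shrinkage. A tiny data perturbation can flip which variables the Lasso selects, while ridge coefficients move continuously and by no more than the perturbation itself. This is only an informal illustration of selection instability, not a reproduction of Xu, Caramanis, and Mannor's formal argument:

```python
import numpy as np

def soft_threshold(z, lam):
    # Lasso solution under an orthonormal design (X'X = I):
    # each coefficient is the soft-thresholded OLS coefficient.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

lam = 1.0
# OLS coefficients, one of them sitting just below the threshold.
z = np.array([3.0, 0.999, -2.0])
z_pert = z + np.array([0.0, 0.002, 0.0])  # a tiny perturbation

lasso, lasso_pert = soft_threshold(z, lam), soft_threshold(z_pert, lam)
ridge, ridge_pert = z / (1 + lam), z_pert / (1 + lam)

# The Lasso's support (the set of selected variables) flips ...
support_changed = bool(np.any((lasso != 0) != (lasso_pert != 0)))
# ... while every ridge coefficient moves by at most the perturbation size.
ridge_shift = np.abs(ridge - ridge_pert).max()
```

The coefficient sitting at 0.999 is zeroed out by the Lasso before the perturbation and selected after it, whereas the corresponding ridge coefficient moves by a thousandth.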

## 1 References

Agarwal, and Niyogi. 2009. In Journal of Machine Learning Research.
Bassily, Nissim, Smith, et al. 2015. arXiv:1511.02513 [Cs].
Bousquet, and Elisseeff. 2001. In Advances in Neural Information Processing Systems.
———. 2002. Journal of Machine Learning Research.
Chandramoorthy, Loukas, Gatmiry, et al. 2022.
Freeman, Yang, and Lynch. 2006. In 2006 45th IEEE Conference on Decision and Control.
Giryes, Sapiro, and Bronstein. 2014. arXiv:1412.5896 [Cs, Math, Stat].
Hardt, Ma, and Recht. 2018. The Journal of Machine Learning Research.
Hardt, Recht, and Singer. 2015. arXiv:1509.01240 [Cs, Math, Stat].
Kutin, and Niyogi. 2002. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence. UAI’02.
Liu, Roeder, and Wasserman. 2010. In Advances in Neural Information Processing Systems 23.
Meinshausen, and Bühlmann. 2010. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Meng, Wang, Chen, et al. 2016. arXiv:1609.08397 [Stat].
Xu, Caramanis, and Mannor. 2012. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Yu. 2013. “Stability.” Bernoulli.
Zhang, Bengio, Hardt, et al. 2017. In Proceedings of ICLR.