If your estimate is robust to deleting a single data point, it is a stable estimate.
Apparently this implies generalisability.
The above statements can be made precise, I am told,
and making them precise might give us new ideas for model selection.
Supposedly there is also a connection to differential privacy,
but since I don’t yet know anything about differential privacy
I can’t take that statement any further,
except to note that I would like to work it out one day.
There is also presumably a connection to robust inference,
since the setup sounds similar.
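To make the deleted-point intuition concrete, here is a minimal sketch in plain numpy (the `loo_sensitivity` helper is my own illustrative name): the sample mean barely moves when any single point is deleted, while the sample maximum can collapse entirely.

```python
import numpy as np

def loo_sensitivity(estimator, data):
    """Largest change in the estimate when any single point is deleted."""
    full = estimator(data)
    return max(abs(estimator(np.delete(data, i)) - full)
               for i in range(len(data)))

# 99 ordinary points and one outlier.
x = np.concatenate([np.zeros(99), [10.0]])

# The mean moves by at most |x_i - mean| per deletion, scaled down by n:
print(loo_sensitivity(np.mean, x))  # 0.1
# The maximum collapses to zero when the outlier is deleted:
print(loo_sensitivity(np.max, x))   # 10.0
```

On this toy sample the mean is two orders of magnitude more stable than the maximum, and the gap only widens as n grows.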
Bin Yu, Stability:
Reproducibility is imperative for any scientific discovery.
More often than not,
modern scientific findings rely on statistical analysis of high-dimensional data.
At a minimum, reproducibility manifests itself in stability of statistical
results relative to “reasonable” perturbations to data and to the model used.
Jackknife, bootstrap, and cross-validation are based on perturbations to data,
while robust statistics methods deal with perturbations to models.
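The jackknife from that list can be sketched in a few lines (plain numpy; `jackknife_se` is an illustrative name): recompute the estimate with each point deleted, then measure the spread of the leave-one-out replicates.

```python
import numpy as np

def jackknife_se(estimator, data):
    """Jackknife standard error: spread of the leave-one-out replicates."""
    n = len(data)
    loo = np.array([estimator(np.delete(data, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# For the sample mean the jackknife reproduces the classical
# standard error s / sqrt(n) exactly:
print(jackknife_se(np.mean, x))              # 0.7071...
print(np.std(x, ddof=1) / np.sqrt(len(x)))   # 0.7071...
```

For estimators without a closed-form variance the same recipe applies unchanged, which is what makes data-perturbation schemes so generic.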
Moritz Hardt, Stability as a foundation of machine learning:
Central to machine learning is our ability to relate
how a learning algorithm fares on a sample
to its performance on unseen instances.
This is called generalization.
In this post,
I will describe a purely algorithmic approach to generalization.
The property that makes this possible is stability.
An algorithm is stable, intuitively speaking,
if its output doesn’t change much
if we perturb the input sample in a single point.
We will see that this property by itself
is necessary and sufficient for generalization.
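Hardt’s single-point perturbation is easy to watch in a toy experiment. A sketch assuming only numpy: fit ridge regression on a sample, corrupt a single response, and observe the estimate move less as the ℓ2 regularisation grows, consistent with the Bousquet–Elisseeff uniform-stability bound, which scales like 1/(λn).

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge (l2-regularised least-squares) estimate."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ np.ones(d) + rng.normal(size=n)

# Perturb the sample in a single point: corrupt one response.
y2 = y.copy()
y2[0] = 50.0

for lam in [0.1, 10.0, 1000.0]:
    shift = np.linalg.norm(ridge(X, y, lam) - ridge(X, y2, lam))
    print(f"lambda={lam:7.1f}  estimate moved by {shift:.5f}")
# Since only y changed, the shift is ||(X'X + lam*I)^{-1} x_0 (y2_0 - y_0)||,
# which shrinks monotonically as lam grows: heavier regularisation,
# more stability.
```

The price of that stability, of course, is bias toward zero; the next quote says the trade-off is even sharper when sparsity is what you want.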
Xu, Caramanis, and Mannor (2012):
We consider two desired properties of learning algorithms: sparsity and algorithmic stability. Both properties are believed to lead to good generalization ability. We show that these two properties are fundamentally at odds with each other: A sparse algorithm cannot be stable and vice versa. Thus, one has to trade off sparsity and stability in designing a learning algorithm. In particular, our general result implies that ℓ1-regularized regression (Lasso) cannot be stable, while ℓ2-regularized regression is known to have strong stability properties and is therefore not sparse.
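The Xu–Caramanis–Mannor tension is easy to provoke with two nearly identical features: the lasso puts all its weight on one of them, and changing a single response migrates the entire weight to the other. A sketch with a hand-rolled coordinate-descent lasso, minimising (1/2n)‖y − Xw‖² + λ‖w‖₁ (illustrative code, not a production solver):

```python
import numpy as np

def lasso_cd(X, y, lam, sweeps=500):
    """Coordinate descent for (1/2n)||y - Xw||^2 + lam * ||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(sweeps):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]       # residual excluding feature j
            rho = X[:, j] @ r / n
            z = X[:, j] @ X[:, j] / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z  # soft-threshold
    return w

base = np.array([1.0, -1.0] * 5)           # ten samples
x2 = base.copy(); x2[0] = 0.9              # a near-duplicate feature
X = np.column_stack([base, x2])

y = base.copy()                            # original sample
y_pert = base.copy(); y_pert[0] = -1.0     # one response flipped

print(lasso_cd(X, y, lam=0.5))       # all weight on feature 0: [0.5, 0]
print(lasso_cd(X, y_pert, lam=0.5))  # all weight on feature 1: [0, 0.316...]
```

A one-point change moves the entire support from one feature to the other, so the output is anything but stable. A ridge fit on the same pair of datasets would instead spread the weight across both correlated features and move only modestly: stable, but not sparse, exactly as the theorem predicts.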
Agarwal, Shivani, and Partha Niyogi. 2009. “Generalization Bounds for Ranking Algorithms via Algorithmic Stability.” Journal of Machine Learning Research 10: 441–74. http://machinelearning.wustl.edu/mlpapers/paper_files/jmlr10_agarwal09a.pdf
Bassily, Raef, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, and Jonathan Ullman. 2015. “Algorithmic Stability for Adaptive Data Analysis.” November 8, 2015. http://arxiv.org/abs/1511.02513
Bousquet, Olivier, and André Elisseeff. 2001. “Algorithmic Stability and Generalization Performance.” In Advances in Neural Information Processing Systems, 13:196–202. MIT Press.
———. 2002. “Stability and Generalization.” Journal of Machine Learning Research 2: 499–526. http://www.jmlr.org/papers/v2/bousquet02a.html
Freeman, R. A., Peng Yang, and K. M. Lynch. 2006. “Stability and Convergence Properties of Dynamic Average Consensus Estimators.” In 2006 45th IEEE Conference on Decision and Control, 338–43. San Diego, CA, USA.
Giryes, Raja, Guillermo Sapiro, and Alex M. Bronstein. 2014. “On the Stability of Deep Networks.” December 18, 2014. http://arxiv.org/abs/1412.5896
Hardt, Moritz, Tengyu Ma, and Benjamin Recht. 2018. “Gradient Descent Learns Linear Dynamical Systems.” The Journal of Machine Learning Research 19 (1): 1025–68. http://arxiv.org/abs/1609.05191
Hardt, Moritz, Benjamin Recht, and Yoram Singer. 2015. “Train Faster, Generalize Better: Stability of Stochastic Gradient Descent.” September 3, 2015. http://arxiv.org/abs/1509.01240
Kutin, Samuel, and Partha Niyogi. 2002. “Almost-Everywhere Algorithmic Stability and Generalization Error.” In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 275–82. UAI ’02. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. http://arxiv.org/abs/1301.0579
Liu, Han, Kathryn Roeder, and Larry Wasserman. 2010. “Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models.” In Advances in Neural Information Processing Systems 23, edited by J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, 1432–40. Curran Associates, Inc. http://papers.nips.cc/paper/3966-stability-approach-to-regularization-selection-stars-for-high-dimensional-graphical-models.pdf
Meinshausen, Nicolai, and Peter Bühlmann. 2010. “Stability Selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 (4): 417–73. https://doi.org/10.1111/j.1467-9868.2010.00740.x
Meng, Qi, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, and Tie-Yan Liu. 2016. “Generalization Error Bounds for Optimization Algorithms via Stability.” http://arxiv.org/abs/1609.08397
Xu, H., C. Caramanis, and S. Mannor. 2012. “Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem.” IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (1): 187–93. https://doi.org/10.1109/TPAMI.2011.177
Yu, Bin. 2013. “Stability.” Bernoulli 19 (4): 1484–1500. https://doi.org/10.3150/13-BEJSP14
Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2017. “Understanding Deep Learning Requires Rethinking Generalization.” In Proceedings of ICLR.