Another thing I won’t have time to blog or fully understand, but will collect a few explanatory blog posts about for emergency cribbing.
Let’s say you wanted to count how many of your online friends were dogs, while respecting the maxim that, on the Internet, nobody should know you’re a dog. To do this, you could ask each friend to answer the question “Are you a dog?” in the following way. Each friend should flip a coin in secret, and answer the question truthfully if the coin came up heads; but, if the coin came up tails, that friend should always say “Yes” regardless. Then you could get a good estimate of the true count from the greater-than-half fraction of your friends that answered “Yes”. However, you still wouldn’t know which of your friends was a dog: each answer “Yes” would most likely be due to that friend’s coin flip coming up tails.
NB a fair coin does work here: if a fraction ŷ of friends answer “Yes”, then ŷ ≈ (1 + p)/2 where p is the true dog fraction, so p̂ = 2ŷ − 1. Weighting the coin just trades privacy against estimator variance.
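A minimal simulation of the coin-flip protocol above, with the de-biasing estimate (the function names are mine, for illustration):

```python
import random

def randomized_response(is_dog: bool) -> bool:
    """Answer 'Are you a dog?' under the protocol: on heads, answer
    truthfully; on tails, always answer Yes."""
    if random.random() < 0.5:   # heads: tell the truth
        return is_dog
    return True                 # tails: say "Yes" regardless

def estimate_dog_fraction(answers) -> float:
    """Invert the noise: P(Yes) = (1 + p)/2, so p = 2 * P(Yes) - 1."""
    yes_fraction = sum(answers) / len(answers)
    return max(0.0, 2 * yes_fraction - 1)

random.seed(1)
friends = [True] * 300 + [False] * 700   # 30% dogs, unknown to the surveyor
answers = [randomized_response(d) for d in friends]
print(estimate_dog_fraction(answers))    # close to 0.3
```

No individual “Yes” identifies a dog, yet the aggregate estimate concentrates around the true fraction as the number of friends grows.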
This has recently become particularly publicly interesting because the US Census Bureau has adopted differential privacy methods to protect the privacy of actual citizens. This has spawned some good layperson’s introductions:
Alexandra Wood et al. have written Differential Privacy: A Primer for a Non-Technical Audience, and Mark Hansen has written an illustrated explanation.
There is a fun paper (Dimitrakakis et al. 2013) arguing that Bayesian posterior sampling has certain differential privacy guarantees.
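A toy sketch of the posterior-sampling idea, assuming a conjugate Beta-Bernoulli model (the function name and prior are my own illustration, not the paper’s construction): instead of releasing the exact count of dogs, release a single draw from the posterior over the dog rate, and the sampling noise itself supplies the privacy guarantee.

```python
import random

def private_dog_rate(data, alpha=1.0, beta=1.0) -> float:
    """Release one draw from the Beta posterior over the dog rate,
    rather than the exact count; the randomness of the draw is what
    masks any single individual's contribution."""
    k = sum(data)            # number of dogs observed
    n = len(data)
    return random.betavariate(alpha + k, beta + n - k)

random.seed(2)
data = [1] * 30 + [0] * 70   # 30 dogs out of 100 friends
print(private_dog_rate(data))
```

Repeated queries each draw a fresh sample, so any one released value is a noisy, privacy-compatible summary rather than the raw statistic.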
Practical: see Google’s differential privacy library for miscellaneous reporting. PPRL (Privacy-Preserving Record Linkage) is an R package for probabilistically linking data sets in an (optionally) privacy-preserving way. Nils Amiet has written a review of several libraries.
Bassily, Raef, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, and Jonathan Ullman. 2015. “Algorithmic Stability for Adaptive Data Analysis,” November. http://arxiv.org/abs/1511.02513.
Bréhier, Charles-Edouard, Maxime Gazeau, Ludovic Goudenège, Tony Lelièvre, and Mathias Rousset. 2015. “Unbiasedness of Some Generalized Adaptive Multilevel Splitting Algorithms,” May. http://arxiv.org/abs/1505.02674.
Dimitrakakis, Christos, Blaine Nelson, Zuhe Zhang, Aikaterini Mitrokotsa, and Benjamin Rubinstein. 2013. “Bayesian Differential Privacy Through Posterior Sampling,” June. http://arxiv.org/abs/1306.1066.
Dwork, Cynthia. 2006. “Differential Privacy.” In Automata, Languages and Programming, Lecture Notes in Computer Science, Vol. 4052. https://www.microsoft.com/en-us/research/publication/differential-privacy/.
Dwork, Cynthia, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. 2015a. “The Reusable Holdout: Preserving Validity in Adaptive Data Analysis.” Science 349 (6248): 636–38. https://doi.org/10.1126/science.aaa9375.
Dwork, Cynthia, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Leon Roth. 2015b. “Preserving Statistical Validity in Adaptive Data Analysis.” In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing - STOC ’15, 117–26. Portland, Oregon, USA: ACM Press. https://doi.org/10.1145/2746539.2746580.
Fanti, Giulia, Vasyl Pihur, and Úlfar Erlingsson. 2015. “Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries,” March. http://arxiv.org/abs/1503.01214.
Jung, Christopher, Katrina Ligett, Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi, and Moshe Shenfeld. 2019. “A New Analysis of Differential Privacy’s Generalization Guarantees,” September. http://arxiv.org/abs/1909.03577.
Wood, Alexandra, Micah Altman, Aaron Bembenek, Mark Bun, James Honaker, Kobbi Nissim, David R O’Brien, and Salil Vadhan. 2019. “Differential Privacy: A Primer for a Non-Technical Audience.” Vanderbilt Journal of Entertainment and Technology Law 21: 68. http://www.jetlaw.org/journal-archives/volume-21/volume-21-issue-1/differential-privacy-a-primer-for-a-non-technical-audience/.
Zhang, Zuhe, Benjamin I. P. Rubinstein, and Christos Dimitrakakis. 2016. “On the Differential Privacy of Bayesian Inference.” In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2365–71. AAAI’16. Phoenix, Arizona: AAAI Press. http://arxiv.org/abs/1512.06992.