The meeting point of differential privacy, accountability, interpretability, the tank-detection story, and clever horses in machine learning. Closely related: are the models what you would call fair?
There is much work here; I understand little of it at the moment, but I keep needing to refer to papers in this area.
- Frequently I need the link to LIME, a neat method that uses penalised regression to produce local model explanations (Ribeiro, Singh, and Guestrin 2016). See their blog post.
- A cousin of LIME, with an efficient variant for tree classifiers, is SHAP.
- The deep dream “activation maximisation” images could arguably be classified as a type of model explanation, e.g. multifaceted feature visualisation (Nguyen, Yosinski, and Clune 2016).
- Belatedly I notice that the Data Skeptic podcast did a whole season on interpretability.
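The local-surrogate idea behind LIME fits in a few lines: perturb the instance, query the black box, weight samples by proximity, and fit a penalised linear model to the weighted samples. A from-scratch toy sketch, not the authors’ `lime` package; the `black_box` model, function names, and kernel settings are all mine:

```python
import numpy as np

# Toy black-box classifier: nonlinear in its two features.
def black_box(X):
    return (X[:, 0] ** 2 + 0.5 * X[:, 1] > 1.0).astype(float)

def lime_sketch(f, x, n_samples=5000, kernel_width=0.75, seed=0):
    """Fit a locally weighted ridge-regression surrogate to f around x."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=1.0, size=(n_samples, x.size))  # perturbations
    y = f(Z)                                                 # query black box
    d = np.linalg.norm(Z - x, axis=1)                        # distance to x
    w = np.exp(-(d ** 2) / kernel_width ** 2)                # proximity kernel
    # Weighted ridge regression: scale rows by sqrt(weight), add intercept.
    A = np.column_stack([np.ones(n_samples), Z - x]) * np.sqrt(w)[:, None]
    b = y * np.sqrt(w)
    lam = 1e-3  # small ridge penalty, as in LIME's penalised fit
    coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
    return coef[1:]  # per-feature local attributions (intercept dropped)

x0 = np.array([1.0, 0.5])
attr = lime_sketch(black_box, x0)
```

Near `x0` the boundary is steepest in the first feature, so the first attribution comes out largest; the surrogate’s coefficients are the explanation.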
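SHAP’s attributions are Shapley values, which for a handful of features can be computed exactly by averaging each feature’s marginal contribution over all coalitions. A toy illustration of that definition, not the `shap` library; `payoff` is a made-up coalition value function:

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values: weighted marginal contributions over coalitions."""
    n = len(features)
    phi = {f: 0.0 for f in features}
    for f in features:
        rest = [g for g in features if g != f]
        for k in range(n):
            for S in combinations(rest, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[f] += weight * (value_fn(set(S) | {f}) - value_fn(set(S)))
    return phi

# Toy payoff: model output when only the features in S are "known".
def payoff(S):
    return (2.0 if "age" in S else 0.0) + (1.0 if "income" in S else 0.0)

print(shapley_values(payoff, ["age", "income"]))
# → {'age': 2.0, 'income': 1.0}
```

Because this payoff is additive, each attribution equals the feature’s own contribution; the exact computation is exponential in the number of features, which is why SHAP’s efficient tree variant matters in practice.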
Aggarwal, Charu C., and Philip S. Yu. 2008. “A General Survey of Privacy-Preserving Data Mining Models and Algorithms.” In Privacy-Preserving Data Mining, edited by Charu C. Aggarwal and Philip S. Yu, 11–52. Advances in Database Systems 34. Springer US. https://doi.org/10.1007/978-0-387-70992-5_2.
Alain, Guillaume, and Yoshua Bengio. 2016. “Understanding Intermediate Layers Using Linear Classifier Probes.” October 5, 2016. http://arxiv.org/abs/1610.01644.
Barocas, Solon, and Andrew D. Selbst. 2016. “Big Data’s Disparate Impact.” SSRN Scholarly Paper ID 2477899. Rochester, NY: Social Science Research Network. https://papers.ssrn.com/abstract=2477899.
Bowyer, Kevin W., Michael King, and Walter Scheirer. 2020. “The Criminality from Face Illusion.” June 6, 2020. http://arxiv.org/abs/2006.03895.
Burrell, Jenna. 2016. “How the Machine ’Thinks’: Understanding Opacity in Machine Learning Algorithms.” Big Data & Society 3 (1): 2053951715622512. https://doi.org/10.1177/2053951715622512.
Chipman, Hugh A., and Hong Gu. 2005. “Interpretable Dimension Reduction.” Journal of Applied Statistics 32 (9): 969–87. https://doi.org/10.1080/02664760500168648.
Dwork, Cynthia, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. “Fairness Through Awareness.” In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 214–26. ITCS ’12. New York, NY, USA: ACM. https://doi.org/10.1145/2090236.2090255.
Feldman, Michael, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. “Certifying and Removing Disparate Impact.” In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 259–68. KDD ’15. New York, NY, USA: ACM. https://doi.org/10.1145/2783258.2783311.
Hardt, Moritz, Eric Price, and Nati Srebro. 2016. “Equality of Opportunity in Supervised Learning.” In Advances in Neural Information Processing Systems, 3315–23. http://papers.nips.cc/paper/6373-equality-of-opportunity-in-supervised-learning.
Hidalgo, César A., Diana Orghian, Jordi Albo Canals, Filipa de Almeida, and Natalia Martín Cantero. 2021. How Humans Judge Machines. Cambridge, Massachusetts: The MIT Press.
Kilbertus, Niki, Mateo Rojas Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. 2017. “Avoiding Discrimination Through Causal Reasoning.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 656–66. Curran Associates, Inc. http://papers.nips.cc/paper/6668-avoiding-discrimination-through-causal-reasoning.pdf.
Kleinberg, Jon, Sendhil Mullainathan, and Manish Raghavan. 2016. “Inherent Trade-Offs in the Fair Determination of Risk Scores,” September. https://arxiv.org/abs/1609.05807v1.
Lash, Michael T., Qihang Lin, W. Nick Street, Jennifer G. Robinson, and Jeffrey Ohlmann. 2016. “Generalized Inverse Classification.” October 5, 2016. http://arxiv.org/abs/1610.01675.
Lipton, Zachary C. 2016. “The Mythos of Model Interpretability.” http://arxiv.org/abs/1606.03490.
Miconi, Thomas. 2017. “The Impossibility of "Fairness": A Generalized Impossibility Result for Decisions,” July. https://arxiv.org/abs/1707.01195.
Moosavi-Dezfooli, Seyed-Mohsen, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. 2016. “Universal Adversarial Perturbations.” http://arxiv.org/abs/1610.08401.
Nguyen, Anh, Jason Yosinski, and Jeff Clune. 2016. “Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned by Each Neuron in Deep Neural Networks.” http://arxiv.org/abs/1602.03616.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44. KDD ’16. New York, NY, USA: ACM. https://doi.org/10.1145/2939672.2939778.
Sweeney, Latanya. 2013. “Discrimination in Online Ad Delivery.” Queue 11 (3): 10:10–10:29. https://doi.org/10.1145/2460276.2460278.
Wisdom, Scott, Thomas Powers, James Pitton, and Les Atlas. 2016. “Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery.” In Advances in Neural Information Processing Systems 29. http://arxiv.org/abs/1611.07252.
Wu, Xiaolin, and Xi Zhang. 2016. “Automated Inference on Criminality Using Face Images.” November 13, 2016. http://arxiv.org/abs/1611.04135.
Zemel, Rich, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. “Learning Fair Representations.” In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 325–33. http://machinelearning.wustl.edu/mlpapers/papers/icml2013_zemel13.