What is the Lizardman constant? It is the problem that about 4% of people will claim on a survey that their president is an alien lizard monster. What does that say about survey data in general?
Dan Simpson does Mr P (multilevel regression and poststratification, MRP).
Attention conservation notice: 2700 words on a new paper on causal inference in social networks, and why it is hard. Instills an attitude of nihilistic skepticism and despair over a technical enterprise you never knew existed, much less cared about …
Time series/longitudinal studies and issues
SDA is a suite of software developed at Berkeley for the web-based analysis of survey data. The Berkeley SDA archive lets you run various kinds of analyses on a number of public datasets, such as the General Social Survey. It also provides consistently-formatted HTML versions of the codebooks for the surveys it hosts. This is very convenient! For the gssr package, I wanted to include material from the codebooks as tibbles or data frames that would be accessible inside an R session. Processing the official codebook from its native PDF state into a data frame is, though technically possible, a rather off-putting prospect. But SDA has done most of the work already by making the pages available in HTML. I scraped the codebook pages from them instead. This post contains the code I used to do that.
Achlioptas, Dimitris, Aaron Clauset, David Kempe, and Cristopher Moore. 2005. “On the Bias of Traceroute Sampling: Or, Power-Law Degree Distributions in Regular Graphs.” In Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, 694–703. STOC ’05. New York, NY, USA: ACM. https://doi.org/10.1145/1060590.1060693.
Bareinboim, Elias, and Judea Pearl. 2016. “Causal Inference and the Data-Fusion Problem.” Proceedings of the National Academy of Sciences 113 (27): 7345–52. https://doi.org/10.1073/pnas.1510507113.
Bareinboim, Elias, Jin Tian, and Judea Pearl. 2014. “Recovering from Selection Bias in Causal and Statistical Inference.” In AAAI, 2410–16. http://ftp.cs.ucla.edu/pub/stat_ser/r425.pdf.
Bond, Robert M., Christopher J. Fariss, Jason J. Jones, Adam D. I. Kramer, Cameron Marlow, Jaime E. Settle, and James H. Fowler. 2012. “A 61-Million-Person Experiment in Social Influence and Political Mobilization.” Nature 489 (7415): 295–98. https://doi.org/10.1038/nature11421.
Broockman, David E., Joshua Kalla, and Jasjeet S. Sekhon. 2016. “The Design of Field Experiments with Survey Outcomes: A Framework for Selecting More Efficient, Robust, and Ethical Designs.” SSRN Scholarly Paper ID 2742869. Rochester, NY: Social Science Research Network. https://papers.ssrn.com/abstract=2742869.
Gao, Yuxiang, Lauren Kennedy, Daniel Simpson, and Andrew Gelman. 2019. “Improving Multilevel Regression and Poststratification with Structured Priors,” August. http://arxiv.org/abs/1908.06716.
Gelman, Andrew. 2007. “Struggles with Survey Weighting and Regression Modeling.” Statistical Science 22 (2): 153–64. https://doi.org/10.1214/088342306000000691.
Gelman, Andrew, and John B. Carlin. 2000. “Poststratification and Weighting Adjustments.” Wiley. http://www.stat.columbia.edu/~gelman/research/published/handbook5.pdf.
Ghitza, Yair, and Andrew Gelman. 2013. “Deep Interactions with MRP: Election Turnout and Voting Patterns Among Small Electoral Subgroups.” American Journal of Political Science 57 (3): 762–76. https://doi.org/10.1111/ajps.12004.
Hart, Einav, Eric VanEpps, and Maurice E. Schweitzer. 2019. “I Didn’t Want to Offend You: The Cost of Avoiding Sensitive Questions.” SSRN Scholarly Paper ID 3437468. Rochester, NY: Social Science Research Network. https://papers.ssrn.com/abstract=3437468.
Kennedy, Edward H., Jacqueline A. Mauro, Michael J. Daniels, Natalie Burns, and Dylan S. Small. 2019. “Handling Missing Data in Instrumental Variable Methods for Causal Inference.” Annual Review of Statistics and Its Application 6 (1): 125–48. https://doi.org/10.1146/annurev-statistics-031017-100353.
Kohler, Ulrich, Frauke Kreuter, and Elizabeth A. Stuart. 2019. “Nonprobability Sampling and Causal Analysis.” Annual Review of Statistics and Its Application 6 (1): 149–72. https://doi.org/10.1146/annurev-statistics-030718-104951.
Kong, Yuqing. 2019. “Dominantly Truthful Multi-Task Peer Prediction with a Constant Number of Tasks,” November. http://arxiv.org/abs/1911.00272.
Krivitsky, Pavel N., and Martina Morris. 2017. “Inference for Social Network Models from Egocentrically Sampled Data, with Application to Understanding Persistent Racial Disparities in HIV Prevalence in the US.” The Annals of Applied Statistics 11 (1): 427–55. https://doi.org/10.1214/16-AOAS1010.
Lerman, Kristina. 2017. “Computational Social Scientist Beware: Simpson’s Paradox in Behavioral Data,” October. http://arxiv.org/abs/1710.08615.
Little, R. J. A. 1993. “Post-Stratification: A Modeler’s Perspective.” Journal of the American Statistical Association 88 (423): 1001–12. https://doi.org/10.1080/01621459.1993.10476368.
Little, Roderick J. A. 1991. “Inference with Survey Weights.” Journal of Official Statistics 7 (4): 405.
Prelec, Dražen, H. Sebastian Seung, and John McCoy. 2017. “A Solution to the Single-Question Crowd Wisdom Problem.” Nature 541 (7638): 532–35. https://doi.org/10.1038/nature21054.
Rubin, Donald B., and Richard P. Waterman. 2006. “Estimating the Causal Effects of Marketing Interventions Using Propensity Score Methodology.” Statistical Science 21 (2): 206–22. https://doi.org/10.1214/088342306000000259.
Shalizi, Cosma Rohilla, and Edward McFowland III. 2016. “Controlling for Latent Homophily in Social Networks Through Inferring Latent Locations,” July. http://arxiv.org/abs/1607.06565.
Shalizi, Cosma Rohilla, and Andrew C. Thomas. 2011. “Homophily and Contagion Are Generically Confounded in Observational Social Network Studies.” Sociological Methods & Research 40 (2): 211–39. https://doi.org/10.1177/0049124111404820.
Yadav, Pranjul, Lisiane Prunelli, Alexander Hoff, Michael Steinbach, Bonnie Westra, Vipin Kumar, and Gyorgy Simon. 2016. “Causal Inference in Observational Data,” November. http://arxiv.org/abs/1611.04660.