Tricks of particular use in modeling survey data: hierarchical models to adjust for issues such as non-random sampling, and the many great difficulties of eliciting human preferences by asking people about them. A grab bag of weird data types and sampling-bias problems.
What is that Lizardman constant? It is the problem that about 4% of people will claim on a survey that their president is an alien lizard monster. What does that say about the data in general? See Survey Chicken for some hopelessness about the entire project of soliciting real information from questionnaires.
Reweighting your data to correct for various types of remediable sampling bias. See post-stratification.
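As a minimal sketch of the reweighting idea, here is post-stratification on a single categorical variable. The toy data, the stratum names and the population shares are all hypothetical, and the population shares are assumed to be known from some external source such as a census. Each respondent is weighted by the ratio of their stratum's population share to its sample share.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """One weight per respondent: population share / sample share of their stratum."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    return [population_shares[s] / (counts[s] / n) for s in sample_strata]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical toy survey in which young respondents are over-represented.
strata = ["young"] * 6 + ["old"] * 4
answers = [1, 1, 1, 1, 0, 0] + [0, 0, 0, 1]  # 1 = agrees with the statement
population_shares = {"young": 0.3, "old": 0.7}  # assumed known, e.g. from a census

weights = poststratification_weights(strata, population_shares)
raw = sum(answers) / len(answers)          # unweighted sample proportion
adjusted = weighted_mean(answers, weights)  # post-stratified estimate

print(f"raw: {raw:.3f}, post-stratified: {adjusted:.3f}")
```

This only corrects bias that is explained by the stratifying variable; if respondents differ from non-respondents within a stratum, the weights do not help.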
Cluster randomized trials
Melanie Bell, Cluster Randomized Trials
Cluster randomized trials (CRTs) are studies where groups of people, rather than individuals, are randomly allocated to intervention or control. While these types of designs can be appropriate and useful for many research settings, care must be taken to correctly design and analyze them. This talk will give an overview of cluster trials, and various methodological research projects on cluster trials that I have undertaken: designing CRTs, the use of GEE with small number of clusters, handling missing data in CRTs, and analysis using mixed models. I will demonstrate methods with an example from a recently completed trial on reducing cardiovascular risk among Mexican diabetics.
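One piece of arithmetic that comes up when designing a CRT is the design effect. The sketch below is not taken from the talk; it uses the standard textbook formula for equal cluster sizes, DEFF = 1 + (m - 1) * ICC, where m is the cluster size and ICC is the intra-cluster correlation. The numbers plugged in are hypothetical.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from cluster randomization, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc


def clusters_per_arm(n_individual_per_arm, cluster_size, icc):
    """Clusters needed per arm, given the sample size an individually
    randomized trial would need per arm."""
    inflated = n_individual_per_arm * design_effect(cluster_size, icc)
    return math.ceil(inflated / cluster_size)


# Hypothetical planning numbers: 20 people per cluster, ICC of 0.05,
# and 100 people per arm required under individual randomization.
deff = design_effect(cluster_size=20, icc=0.05)
k = clusters_per_arm(n_individual_per_arm=100, cluster_size=20, icc=0.05)

print(f"design effect: {deff:.2f}, clusters per arm: {k}")
```

Even a small intra-cluster correlation roughly doubles the required sample here, which is one reason the design and analysis need care.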
Ordinal data are what we usually get from people; think star ratings or Likert scales. Ordinal models are how we handle them.
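To make this concrete, here is a minimal sketch of how one common ordinal model, the proportional-odds (cumulative logit) model, turns a latent score into probabilities for each response category. The cutpoints and the latent scores are hypothetical; in practice they would be estimated from data.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_probs(eta, cutpoints):
    """Category probabilities when P(Y <= k) = logistic(cutpoint_k - eta).

    cutpoints must be increasing; there is one more category than cutpoints.
    """
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


# Hypothetical cutpoints for a 5-point Likert item.
cutpoints = [-2.0, -0.5, 0.5, 2.0]
p_neutral = ordinal_probs(0.0, cutpoints)   # respondent with latent score 0
p_positive = ordinal_probs(1.0, cutpoints)  # respondent with a higher latent score

print([round(p, 3) for p in p_neutral])
print([round(p, 3) for p in p_positive])
```

Raising the latent score shifts probability mass towards the higher categories without assuming that the categories are equally spaced.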
Confounding and observational studies
There is some interesting crossover with clinical trial theory.
It is a commonly held belief that clinical trials, to provide treatment effects that are generalizable to a population, must use a sample that reflects that population’s characteristics. The confusion stems from the fact that if one were interested in estimating an average outcome for patients given treatment A, one would need a random sample from the target population. But clinical trials are not designed to estimate absolutes; they are designed to estimate differences as discussed further here. These differences, when measured on a scale for which treatment differences are allowed mathematically to be constant (e.g., difference in means, odds ratios, hazard ratios), show remarkable constancy as judged by a large number of published forest plots. What would make a treatment estimate (relative efficacy) not be transportable to another population? A requirement for non-generalizability is the existence of interactions with treatment such that the interacting factors have a distribution in the sample that is much different from the distribution in the population.
A related problem is the issue of overlap in observational studies. Researchers are taught that non-overlap makes observational treatment comparisons impossible. This is only true when the characteristic whose distributions don’t overlap between treatment groups interacts with treatment. The purpose of this article is to explore interactions in these contexts.
As a side note, if there is an interaction between treatment and a covariate, standard propensity score analysis will completely miss it.
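The transportability argument quoted above can be illustrated with a small deterministic calculation. This is a sketch under assumed conditions, not the quoted author's analysis: a linear outcome model with a binary covariate X, hypothetical coefficients, and a trial sample where X is much rarer than in the target population.

```python
def mean_outcome(t, x, d, a=1.0, b=2.0, c=0.5):
    """Assumed outcome model: E[Y | T=t, X=x] = a + b*t + c*x + d*t*x."""
    return a + b * t + c * x + d * t * x


def mean_under_treatment(prevalence, d):
    """Average outcome for treated patients when P(X = 1) = prevalence."""
    return (1 - prevalence) * mean_outcome(1, 0, d) + prevalence * mean_outcome(1, 1, d)


def average_treatment_effect(prevalence, d):
    """Treated-minus-control difference, averaged over the distribution of X."""
    effect_x0 = mean_outcome(1, 0, d) - mean_outcome(0, 0, d)
    effect_x1 = mean_outcome(1, 1, d) - mean_outcome(0, 1, d)
    return (1 - prevalence) * effect_x0 + prevalence * effect_x1


trial, population = 0.2, 0.8  # hypothetical prevalence of X in each group

# No interaction (d = 0): absolute means differ, the difference transports.
abs_trial = mean_under_treatment(trial, d=0.0)
abs_pop = mean_under_treatment(population, d=0.0)
ate_trial = average_treatment_effect(trial, d=0.0)
ate_pop = average_treatment_effect(population, d=0.0)

# With a treatment-by-X interaction (d = 1.5): the difference no longer transports.
ate_trial_int = average_treatment_effect(trial, d=1.5)
ate_pop_int = average_treatment_effect(population, d=1.5)

print(abs_trial, abs_pop, ate_trial, ate_pop, ate_trial_int, ate_pop_int)
```

Without the interaction term the covariate shifts the absolute outcome but cancels out of the difference, so the mismatch in X between sample and population is harmless for the treatment effect.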
SDA is a suite of software developed at Berkeley for the web-based analysis of survey data. The Berkeley SDA archive lets you run various kinds of analyses on a number of public datasets, such as the General Social Survey. It also provides consistently-formatted HTML versions of the codebooks for the surveys it hosts. This is very convenient! For the gssr package, I wanted to include material from the codebooks as tibbles or data frames that would be accessible inside an R session. Processing the official codebook from its native PDF state into a data frame is, though technically possible, a rather off-putting prospect. But SDA has done most of the work already by making the pages available in HTML. I scraped the codebook pages from them instead. This post contains the code I used to do that.