Bias and base rates
September 15, 2022 — November 4, 2022
To do: add a contrast between the equity implications of test bias and the equity implications of the base rate.
To do: graphs.
To do: start with examples of the four cases (true positives, true negatives, etc.).
Pet peeve: when we consider the problem of bias, in the sense of unwarranted prejudice, in our assessment of people for, say, jobs, we often present information about bias in particular assessments as if that were sufficient to fix the bias. It is not sufficient. For example, we might examine how much worse the ranking given to women is than that given to men of equivalent qualification.
To be clear, I believe this bias phenomenon is both real and important. I won’t go into the literature here; for a beautifully simple example see [citation], where changing the gender of the name on a CV changes people’s rating of a candidate by about one point on a seven-point scale. This kind of result is pretty common. The problem is too important to do a bad job of fixing it.
However, information about the bias in the assessment is insufficient to tell us the actual magnitude of the inequity, or the appropriate response to it. The reason is what we call the base rate problem. People are reliably pretty bad at remembering to take base rates into account. This has been made famous recently by the Covid pandemic, when we all suddenly learned that if we want to know how likely we are to have Covid based on a rapid antigen test, we have to know not only the chance of a false negative and the chance of a false positive, but also the actual rate of Covid in the population around us. If we only know about the test itself, we do not actually know our own chance of having Covid. Failing to recognise that we also need to know the community Covid rate is what we call the base rate fallacy.
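To make this concrete, here is a minimal sketch of the Covid calculation, with made-up test characteristics (an 80% true positive rate and a 98% true negative rate; both numbers are illustrative, not from any real test):

```python
def p_infected_given_positive(prevalence, sensitivity=0.80, specificity=0.98):
    """Posterior probability of infection after a positive test, by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# The same positive result means very different things at different base rates.
for prevalence in (0.001, 0.01, 0.1):
    print(f"prevalence {prevalence}: P(infected | positive) = "
          f"{p_infected_given_positive(prevalence):.2f}")
# prevalence 0.001: P(infected | positive) = 0.04
# prevalence 0.01: P(infected | positive) = 0.29
# prevalence 0.1: P(infected | positive) = 0.82
```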
The base rate fallacy also arises in questions of equity and bias. Failing to take the base rate into account, or perhaps estimating very different base rates, is one source of disagreement about how to redress inequity.
Let’s walk through this. To keep it simple, I’ll consider a yes-no test: “is this person qualified for the job?” In practice, when we are hiring people to fill a particular position, we usually don’t apply a yes-no test; we rank candidates. If we wanted to be fancy we could treat this as a ranking problem, but I think the simple case captures the basic trouble.
Okay, the simplified model. Imagine that you are a hiring manager: for a given job, you have to interview various candidates, and there is some kind of screening questionnaire. Now, suppose that I am in some kind of in-group; you can imagine middle-class white males. Suppose further that if I assess people for the job I am currently hiring for on the basis of this questionnaire, I identify good candidates with a true positive rate of, say, 80%: if a candidate is truly qualified, there is an 80% chance the questionnaire will identify them as qualified. Suppose the true negative rate is 80% as well: if a candidate is not in fact qualified, there is an 80% chance the questionnaire will say they are not qualified.
We aim to hire candidates that we believe have the best chance of being qualified.
Now, because we are all used to thinking about Covid tests, we know that a positive result on our hiring questionnaire does not tell us that there is an 80% chance the candidate is in fact qualified. To know that, we would need to know what proportion of people applying for jobs like ours are in fact qualified. If only one in ten applicants were truly qualified, then most of the positives we see from the questionnaire would be false positives, even though the true positive rate for any given qualified person is 80%.
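A quick calculation with the numbers above, 80% true positive rate, 80% true negative rate, and a one-in-ten base rate, makes the point:

```python
base_rate = 0.10       # 1 in 10 applicants is truly qualified
tpr, tnr = 0.80, 0.80  # the questionnaire's true positive and true negative rates

p_positive = tpr * base_rate + (1 - tnr) * (1 - base_rate)  # all positives
ppv = tpr * base_rate / p_positive                          # P(qualified | positive)

print(f"P(questionnaire says qualified) = {p_positive:.2f}")  # 0.26
print(f"P(truly qualified | it says so) = {ppv:.2f}")         # 0.31
```

Roughly 69% of the questionnaire’s positives are false positives, even though it catches 80% of truly qualified candidates.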
This much should be pretty standard by now, but there is a twist.
Suppose I now tell you that the rates I quoted above are in fact the rates for an in-group. If the test is assessed by fallible human beings inside my in-group, maybe we are more lenient towards other people in our in-group. Suppose that the 80% true positive, 80% true negative performance is what I achieve when I am marking other people who, like myself, are middle-class white males.
Suppose that the true positive rate for people from some out-group is more like 50%, and the true negative rate more like 90%. That is to say, because of my biases, I will rate people from this out-group as ‘qualified’ with a lower probability than if they were from my own group.
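Here is a sketch of what this bias does to the meaning of a test result, using the illustrative rates above and, for the moment, the same base rate in both groups:

```python
def p_qualified_given_positive(base_rate, tpr, tnr):
    """P(truly qualified | test says qualified), by Bayes' rule."""
    positives = tpr * base_rate + (1 - tnr) * (1 - base_rate)
    return tpr * base_rate / positives

base_rate = 0.10  # assume, for now, 1 in 10 applicants in each group is qualified
print(f"in-group:  {p_qualified_given_positive(base_rate, tpr=0.80, tnr=0.80):.2f}")  # 0.31
print(f"out-group: {p_qualified_given_positive(base_rate, tpr=0.50, tnr=0.90):.2f}")  # 0.36
```

Note where the harm lands: a positive for an out-group candidate is, if anything, slightly more trustworthy, because the biased test is stricter. The injustice is in the misses: half of the qualified out-group candidates are screened out, versus one in five from the in-group.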
Now we might agree that this is an inequitable situation and we should do something to address it; that we should put a hand on the scales, say, and readjust that biased test result to be something closer to fair. How hard should we lean on the scales?
The answer depends on the base rate: on how many people from the in-group are truly qualified, and how many from the out-group are.
I think people often fail to see the importance of this. Once we do see it, we can imagine how people could in fact disagree about what to do.
Suppose, as an extreme example, that literally no job hunter from the out-group is qualified to do this job. Maybe we exist in an apartheid society where people from the out-group were historically denied education, so no one is qualified (which sucks, but doesn’t change whether the person in front of us is qualified). Or maybe, because the larger society has recognised and redressed a historical injustice against the out-group, people from the out-group have been proactively hired and literally everyone qualified from that group already has a job. If the pool of qualified out-group candidates has size zero, then even though the test is biased, all the positives from that out-group will in fact be false positives, and all the negatives will be true negatives.
In this situation, the bias of the test is not contributing to any injustice. That prejudice, though real, is essentially irrelevant: even though the test is contaminated by our bias and unfairly discriminates against this out-group, because of the unfortunate lack of appropriate skills in the out-group it would do us no good to recruit from it anyway. If we want to address the fact that people from the out-group can’t get this kind of job, the appropriate intervention is to offer skills training or apprenticeships, or something like that, so that people from the out-group become capable of doing the job. I think this is the model that people who talk about “pipeline problems” have in mind.
Now, suppose the contrary extreme: people from this out-group are in fact vastly more qualified than people from the in-group for this particular job. Maybe people from the out-group come through some kind of skilled migration program. If 100% of the people in the out-group are actually qualified, then all the negatives are false negatives. In this situation, the injustice of the test is pressing, and also lose-lose: people from the out-group are missing out on jobs, and the organization is missing out on talent.
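To see how much the harm depends on the base rate, we can sweep the out-group base rate between the two extremes, keeping the illustrative biased rates (50% true positive, 90% true negative) from above:

```python
# Sweep the out-group base rate q from one extreme (nobody qualified)
# to the other (everybody qualified), with tpr = 0.50 and tnr = 0.90.
for q in (0.0, 0.1, 0.5, 0.9, 1.0):
    positives = 0.50 * q + 0.10 * (1 - q)  # true positives + false positives
    ppv = 0.50 * q / positives             # P(qualified | positive)
    missed = 0.50 * q                      # qualified but screened out
    print(f"base rate {q:.1f}: P(qualified | positive) = {ppv:.2f}, "
          f"qualified-but-rejected = {missed:.2f}")
```

At a base rate of zero, every positive is a false positive and the bias costs nobody a job they could do; at a base rate of one, every negative is a false negative and the cost, to the candidates and to the organization, is at its maximum.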
In day-to-day existence, we are rarely at such extremes; typically the qualification rates of the two groups lie somewhere in the middle. In that case, we probably want to find out what the true base rates are so that we can calibrate against them. For example, if we wish to correct for an ethnic or gender bias, we should probably calibrate against the rate at which people from that ethnic or gender out-group graduate from programs which qualify them with the appropriate skills, or something like that.
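As a hedged sketch of what “calibrating against the base rate” might look like: score each candidate by the posterior probability that they are qualified, computed from group-specific test rates and group-specific base rate estimates. All the numbers below are hypothetical placeholders; the base rates might come from, say, graduation rates.

```python
def p_qualified(base_rate, tpr, tnr, passed):
    """P(truly qualified | test outcome), by Bayes' rule."""
    if passed:
        return tpr * base_rate / (tpr * base_rate + (1 - tnr) * (1 - base_rate))
    return (1 - tpr) * base_rate / ((1 - tpr) * base_rate + tnr * (1 - base_rate))

# Hypothetical group-specific (base_rate, tpr, tnr) triples.
groups = {"in-group": (0.10, 0.80, 0.80), "out-group": (0.10, 0.50, 0.90)}

for name, (base_rate, tpr, tnr) in groups.items():
    print(f"{name}: P(qualified | pass) = {p_qualified(base_rate, tpr, tnr, True):.2f}, "
          f"P(qualified | fail) = {p_qualified(base_rate, tpr, tnr, False):.2f}")
# in-group:  pass 0.31, fail 0.03
# out-group: pass 0.36, fail 0.06
```

Comparing candidates on these posteriors, rather than on the raw pass/fail outcome, corrects for the biased test and the base rate at once. Notice, for instance, that an out-group candidate who fails the biased test is still twice as likely to be qualified as an in-group candidate who fails.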
This is not to say that we should normalize hiring only from an in-group when it is over-represented. It is rather to say that if we are concerned about bias, and I think we should be, we owe it to ourselves to quantify the bias precisely and to work out the optimal way to address it.
If we use some formula that itself commits the base rate fallacy, then we leave ourselves open to the criticism that we are not truly acting to remedy the inequity, but rather making suboptimal decisions based on insufficient information, at some cost to the organization.