Thursday, June 6, 2019

Predictions for small populations are useless

An article in Psychology Today about gender representation in STEM had the following quote:
"Let's start with a simple fact: Most women do not have the right aptitude to be professors at top STEM departments. This is unfortunate, perhaps, but it’s true. It’s also true, though, that most men don’t have the right aptitude! Only a small minority of people do. The phenomenon we’re trying to explain is not why half the population (men) can do it whereas half the population (women) can’t. Most of the population can’t, and of the tiny fraction who can, some are men and some are women. The only question is: Why is the tiny fraction of men working in STEM fields today somewhat larger than the tiny fraction of women?"
I'm less concerned with where the author goes from there than I am with one specific part, the part where he points out that only a small minority of people work in STEM.

I wonder how much time we waste on forecasts and predictive models for something with such a small sample size that it's probably impossible to distinguish signal from noise.

Let's say the military wanted to identify US citizens at highest risk for joining ISIS. They collect demographic information on everyone who has joined ISIS. They find that Muslim men who immigrated from Aleppo, Syria, aged 17-28, with a college degree, who follow specific Twitter accounts are at highest risk to join.

So what happens when their spy program identifies a person who checks all those boxes? How likely is that person to join ISIS? Still unlikely, since MOST PEOPLE DON'T JOIN ISIS! If the base rate of people who join is 1%, all that statistical data does is move it to 3%. That means 97 out of every 100 people the program flags still never join.
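To put rough numbers on that, here is a small Python sketch of the arithmetic. It uses the made-up 1% and 3% figures from above. The function names and the group of 1,000 flagged people are my own assumptions for illustration, not real data or anyone's actual model.

```python
def implied_likelihood_ratio(base_rate, flagged_rate):
    """How much the profile multiplies the odds of joining."""
    prior_odds = base_rate / (1 - base_rate)
    flagged_odds = flagged_rate / (1 - flagged_rate)
    return flagged_odds / prior_odds


def posterior(base_rate, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = base_rate / (1 - base_rate)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical figures taken from the post, not measurements.
base_rate = 0.01      # chance that a random person joins
flagged_rate = 0.03   # chance that a person matching the profile joins

lr = implied_likelihood_ratio(base_rate, flagged_rate)

# Assumed group size, chosen only to make the counts easy to read.
flagged_people = 1000
expected_joiners = flagged_people * flagged_rate
false_alarms = flagged_people - expected_joiners

# Likelihood ratio the profile would need just to reach a coin flip (50%).
lr_for_coin_flip = implied_likelihood_ratio(base_rate, 0.5)

print(f"likelihood ratio implied by 1% -> 3%: {lr:.2f}")
print(f"of {flagged_people} flagged: {expected_joiners:.0f} join, {false_alarms:.0f} do not")
print(f"likelihood ratio needed for 50%: {lr_for_coin_flip:.0f}")
```

Under these assumptions the profile roughly triples the odds, and about 970 of every 1,000 flagged people are still false alarms. To get a flagged person up to even a coin flip, the profile would have to multiply the odds by about 99, not 3.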
