Friday, July 22, 2022

When That Failed Replication Bling
One of my favorite uses of the Drake meme has disappeared from the internet, or at least is beyond my ability to find it. It went like this: next to the top image, in which Drake looks disgusted, it said "Polls are predictions." Next to the bottom image, in which Drake is endorsing, it said "Polls are snapshots."

It's a good way for people to reframe their thinking on opinion polls and to stop blaming Nate Silver for "being wrong." Polls just tell us how people generally felt at the time they were asked the questions. By the time a poll is released, people may have already changed their minds. (It makes you wonder how many elections would have turned out differently if they had been moved back or ahead a week, since people vote based on how they are feeling at the moment.)

I think about this meme when I think about the replication crisis. You know the story: some scientists tested subjects, made a big discovery, it got cited thousands of times, then someone tried to replicate the study and found no correlation. The most famous example is probably the marshmallow test.

There is always a 20/20 hindsight perspective telling us about problems with the sample size, publication bias, or whatever. But sometimes there is nothing fundamentally wrong with the study.

Reading Robert Putnam's work has made me more aware of how much generations differ from one another. If people's values and mindsets are different depending on when they were born, is it too much to assume that those values will influence how they respond to their environment? And if this is possible, then maybe these studies that fail to replicate aren't bad; they're just a reflection of the subjects' responses at the time of the study.

In other words, maybe psychological experiments aren't explanations of universal human behavior. They are snapshots of human behavior at a particular point in time.

Beware Adoration of the Isolated Expert

In Noise—by Kahneman, Sibony, and Sunstein—the authors look at places where human judgment comes into play (sentencing, home appraisals, etc.) and identify how much statistical noise and bias exists, even among experts. For example, two different home appraisers might appraise the same home and differ by more than $100,000. For the same crime, one judge might give you probation and another might give you a ten-year sentence. That is noise.

They found that one of the easiest ways to reduce noise is to ask several experts and average their responses.
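A toy simulation makes the intuition concrete. All of the numbers here are made up for illustration: a hypothetical "true" home value, hypothetical appraiser noise, and a hypothetical panel of seven experts. The point is just that the average of several noisy estimates lands closer to the truth than any single estimate does.

```python
import random

random.seed(0)

TRUE_VALUE = 500_000   # made-up "true" value of the home
NOISE_SD = 50_000      # made-up spread among individual appraisers
NUM_EXPERTS = 7
NUM_TRIALS = 10_000

def appraisal():
    """One expert's estimate: the true value plus random noise."""
    return random.gauss(TRUE_VALUE, NOISE_SD)

solo_error = 0.0
panel_error = 0.0
for _ in range(NUM_TRIALS):
    panel = [appraisal() for _ in range(NUM_EXPERTS)]
    # Option A: trust a single expert.
    solo_error += abs(panel[0] - TRUE_VALUE)
    # Option B: average the whole panel.
    panel_error += abs(sum(panel) / NUM_EXPERTS - TRUE_VALUE)

print(f"avg error, one expert: ${solo_error / NUM_TRIALS:,.0f}")
print(f"avg error, panel of {NUM_EXPERTS}: ${panel_error / NUM_TRIALS:,.0f}")
```

Run it and the panel's average error comes out roughly sqrt(7) times smaller than the lone expert's, which is the standard statistical result: averaging n independent, unbiased estimates shrinks the noise by a factor of sqrt(n). (It does nothing for shared bias, which is why the book treats bias and noise as separate problems.)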

Knowing how wrong even an expert can be should lower our confidence in the shift from placing our trust in institutions to placing it in individuals. For example, I really like Matt Yglesias’s substack (individual), but I hardly ever read Vox (institution) when he wrote there. I like and trust Tyler Cowen, his blog, and his podcast, but I don’t read Bloomberg editorials. Like many readers my age, I have more loyalty to individual writers than to media institutions. So knowing that even experts I trust will give opinions that are subject to bias and noise, I have to accept that I can be led astray.

I began to envision what a system would look like that attempts to reduce noise by taking the expert opinions of individuals like Cowen and Yglesias and averaging them out. Then I realized I was basically reinventing Metaculus and prediction markets. I used to think the future was leaning into individualism and choosing those who gave the most accurate forecasts as my thought leaders (read the last four paragraphs of my post on the disinformation funnel). But you’re actually better off just looking at the Metaculus or Superforecaster average. In this sense, we might see a return to trust in institutions.

I also realized how careful I have to be when reading someone like Emily Oster. I trust her so much that I am at risk of overweighting her judgments. For example, she recently wrote a blog post critiquing a paper that examined the effects of video games on kids’ IQ scores, which concluded that gaming had a positive effect (don’t tell my son this). She found the research lacking and listed all the reasons why.

I enjoyed it. Like most people, I like reading a good debunking article, especially if it takes shots at people who have more social prestige than me (in this case, academics). But then I was reminded of Noise. Assessing the strength of this paper is really a judgment call. Oster is making a judgment here. And I’m giving too much weight to a single expert’s judgment when I should be averaging it against other expert opinions.

But wait a second! Isn’t that exactly what peer review is for? When something passes peer review, doesn’t that mean a group of subject experts all had to agree this paper is worthy of publication? Oster’s analysis is just one more data point in a collection of opinions I should be seeking. Why does her opinion matter more than those who gave the peer review?

I read something on social media a while back that I’ll never forget: if you want people to like you, sound optimistic. If you want people to think you’re smart, sound pessimistic. Whenever someone criticizes something, especially if it’s something with some consensus among experts, there is a tendency to overweight that person's opinion. We tend to conflate pessimism with intelligence.

Just because someone is critical of consensus doesn’t mean they’re smart. As George Carlin once said, “Most people are completely full of shit and really good at hiding it.”

I still like Oster, but this changes the way I read blog posts like these. I like these types of analyses when the tone is “You may have seen the media or friends on Facebook linking to a study that purports to say X. You should not be worried, because the study failed to replicate, had a small sample size, a marginal p-value, etc.”

Sometimes the paper itself says there isn’t sufficient evidence to draw any hard conclusions, which the media chooses to ignore. So pointing out this stuff is still useful to plebes like me. I just have to be more mindful that I’m not placing one person’s opinion above everyone else’s just because I like the way they think.