I want to do something a little different today and talk about data.
I know, not the sexiest topic in the world. But it’s an important one.
I was in a unique position while working for a social services agency, because I didn’t start in that field. I majored in biology as an undergrad, and after college I worked in the natural sciences for several years. I worked on a number of studies, and even published a paper called “Variation in resource limitation of plant reproduction influences natural selection on floral traits of Asclepias syriaca”, which is very fancy language for the entire summer I spent measuring flowers under a microscope.
One of my required courses as a science major was statistics. It was an interesting class, but what was most interesting was how it made me realize that statistics are incredibly easy to abuse. People are often impressed by them, because, well, they look impressive. But the truth is there are huge limitations, and the way statistics are used can have dangerous repercussions. Data can be used to oppress and abuse. It can be used to reinforce the status quo. It can be used to outright lie. And that’s why we need to understand it.
Coming from a science background meant that I had some pretty horrifying moments in joining social services, when I realized how data was talked about and used in making important decisions. Because it was bad. Really bad.
When I was recruited to join the training team for a local leadership program, I knew I was going to have to talk about data. In order to graduate, each participant had to design, implement, and complete a project. (To the best of their ability – we did give leeway for existing in a giant bureaucracy that could crush a months-long project in minutes flat). And the last thing I wanted was for them to continue following the agency’s lead when it came to the use of data in project planning and implementation.
My co-trainers were kind and encouraged my focus on data. Logically, I knew part of a one-day session was never going to be enough to change the behaviors of a whole agency, but I had to try.
And with all of the political discourse happening in the news, I feel like I need to talk about it again. Because there is a lot of bad data out there.
Item 1 – who benefits from the data?
In the 1990s, the pharmaceutical company Merck was developing an arthritis drug called Vioxx. They wanted FDA approval, because approval means money. So they engaged in a number of unethical practices to fudge their results in the clinical trials. The worst part is that they were not just hiding unpleasant side effects, but actual deadly ones. The end result? The drug was pulled from the market in 2004, but the health impacts lingered: by 2006, estimates stood at 88,000 Americans having suffered heart attacks from taking the drug, with 38,000 of those events being fatal.
More recently, Boeing has made the news for their 737 Max plane being involved in two crashes. Although there are still ongoing investigations, there is some evidence that the Federal Aviation Administration allowed Boeing to choose their own personnel to conduct safety studies, allowing the manufacturer to hold most of the power in approving their own aircraft. And if your job depends on you finding an aircraft safe enough to go to market in time for an important deadline? It’s going to get approved as safe.
It’s important to understand that this happens a lot in studies. Some of it is deliberate. Some of it is accidental. Some of it is due to unconscious biases. But you have to ask the questions, any time you see a study. Who paid for it? And who benefits?
Item 2 – correlation is not causation
People really struggle with this one. And it can be confusing.
Conveniently, there are always plenty of examples of how this one works. Pretty much any time you pick up a paper, you’ll see some form of this.
Recently, an article was published about a study that found a correlation between men’s cardiovascular health and how many pushups they could do. Simple enough, right? And most follow-up articles you find about it have headlines like this one from USA Today that say “Men who can do 40 push-ups have a lower risk of heart disease”.
Then there’s another type of headline, like this one from the Good News Network: “New Harvard Study Says That Men Can Avoid Heart Problems By Doing a Certain Number of Push-ups”.
Do you see it?
In the first article, they are reporting a correlation. Men who can do a high number of push-ups also have a lower risk of heart disease.
In the second article, they are reporting a causation. Go do push-ups, and you will lower your risk of heart problems.
Both of these are top articles on Google. Both are reporting the same study. Both use the same data. And one is drawing the exact wrong conclusions.
When I was writing on empathy previously, I looked at a number of videos on YouTube. And I watched a particular one that talked about the science of increasing empathy. It’s a well-intentioned piece, but there’s a flaw. At the end, they have an actor pretending to be homeless, and they watch as their study participants donate money. The participants who watched a video with a personal story about homelessness donated more money on average than the participants who watched a video with only statistics. So in the experiment, they confidently concluded that the personal video caused the participants to donate more.
It’s possible that this is the case. But again, we don’t have enough data to know for sure. There definitely seems to be a correlation. But a correlation is not causation. Much more data is needed, with a much bigger group of participants, before you can say something didn’t happen by chance. Maybe the participants were influenced by the video they watched. Maybe the designer of the study, subconsciously wanting a specific result, sorted the participants in specific ways. Or maybe the people were just coincidentally sorted so that those who tend to donate more ended up in one particular group.
And this is the problem with much of pop culture science. It’s meant to make an impact, but it’s limited. This is why studies need to be repeated, with different participants and different scientists.
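To see how a correlation can appear with no causation at all, here’s a minimal, entirely hypothetical simulation (invented numbers, not data from the actual push-up study): an underlying “fitness” factor drives both push-up counts and heart health, while push-ups are given no direct effect on health whatsoever.

```python
import random

random.seed(42)

# Hypothetical confounder: underlying fitness drives BOTH variables.
n = 10_000
fitness = [random.gauss(0, 1) for _ in range(n)]

# Push-up count depends on fitness plus noise.
pushups = [20 + 10 * f + random.gauss(0, 3) for f in fitness]

# Heart-health score depends ONLY on fitness, never on push-ups.
health = [5 * f + random.gauss(0, 3) for f in fitness]

def correlation(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = correlation(pushups, health)
print(f"correlation between push-ups and heart health: {r:.2f}")
```

The correlation comes out strongly positive even though, by construction, doing push-ups does nothing for your heart in this toy world. A headline writer looking only at the correlation would still tell you to drop and give them forty.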
So if you see a really exciting headline, just remember to ask yourself. Did they prove causation? Or are they jumping to conclusions?
Item 3 – getting only part of the picture
There’s a British magician named Derren Brown who once filmed himself flipping a coin and getting ten heads in a row. Something very statistically improbable (a 1 in 1,024 chance for any given ten fair flips), and yet he made it happen in under a minute. Magic!
Only, it wasn’t. Because he was only showing the last minute of what actually happened. And what actually happened was that he filmed himself flipping a coin for over nine hours, until he got the results he wanted.
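If you’re curious just how much footage ends up on the cutting-room floor, here’s a quick simulation of the honest version of the trick: flip a fair coin until ten heads appear in a row, and count how many flips it takes.

```python
import random

random.seed(7)

def flips_until_streak(streak_len=10):
    """Flip a fair coin until `streak_len` heads appear in a row;
    return the total number of flips it took."""
    streak = flips = 0
    while streak < streak_len:
        flips += 1
        if random.random() < 0.5:
            streak += 1
        else:
            streak = 0  # a tails resets the run
    return flips

# Re-run the "nine hours of filming" many times and average.
trials = [flips_until_streak() for _ in range(200)]
print(f"average flips needed: {sum(trials) / len(trials):.0f}")
```

The expected number of flips for a run of ten heads is 2¹¹ − 2 = 2,046, which is why the full version of the stunt takes hours and the broadcast version takes a minute. Show only the last ten flips, and improbable becomes inevitable.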
One of the most popular “health indicators” in our society is the use of the BMI (Body Mass Index). For many years, the BMI has been used to provide a part of the picture when it comes to a person’s health. But it’s not a complete picture.
Do you know where it comes from? The original formula was developed by a Belgian mathematician named Adolphe Quetelet, back in 1835, in an effort to define a “normal” man. So almost 200 years ago, this guy crunched some numbers. And that’s fine, that’s what mathematicians do.
Then, in 1972, a researcher named Ancel Keys popularized the formula under its modern name, the “body mass index”, based on a study of 7,400 men.
Sit with that one for just a moment…
Now, after years of major health organizations promoting BMI numbers as something to aspire to, more recent studies indicate that the BMI may not be the most accurate indicator of health, including for the following groups: Asian people, athletes, women who may be pregnant or nursing, nonpregnant women, and people over 65.
Now, maybe it’s just me, but I think that if you take all women who are pregnant or nursing, and all women who are nonpregnant, then you actually end up with…let me calculate here… all women?
And this isn’t even delving into racial biases when it comes to health studies and data.
In fact, Keys himself didn’t think the BMI should be a diagnostic tool, as there are so many variables in health for each individual. It was intended to show an average for a population, not an aspirational goal for an individual.
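For what it’s worth, the formula itself is almost absurdly simple, which is part of the problem. A quick sketch shows just how little information it actually uses:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Quetelet's index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

# The same number comes out for a bodybuilder and a sedentary person
# of equal weight and height -- the formula knows nothing about
# muscle, bone density, age, sex, or ethnicity.
print(round(bmi(70, 1.75), 1))  # 22.9
```

Two inputs in, one number out. Everything else about a person’s health is invisible to it.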
Caroline Criado Perez recently released an entire book, Invisible Women, on the way science has excluded women from studies and focused almost exclusively on men. Everything from seatbelts to medications can be more dangerous for women because of this bias. You can read an extract here.
Excluding half of the world’s population is not good science. And misusing things like the BMI or designing safety measures based on men’s measurements can cause real damage to real people.
So it’s important to ask. Who’s not being included here? What data might I be missing?
Item 4 – misdirection
This one isn’t about who funded or initiated a study. This is about people taking numbers and misusing the data to prove their own conclusions.
One recent example is the movie Captain Marvel. In the run-up to the release date, a number of people online, mostly men, were deeply critical of the female-led movie and its lead actress, Brie Larson. These men kept talking about what a failure the movie would be, and they did everything they could to present data that supported their position.
After the movie’s opening weekend, the box office on Monday showed a drop of over 70%. Immediately the critics jumped on this number, writing that it proved that the movie would be a flop.
The problem? That kind of drop-off is completely normal for big blockbusters. More people go to the movies on the weekend than on Mondays. It’s a number that only seems shocking if you don’t know any of the context.
This is a strategy you’ll see a lot when it comes to political discussion. And one of the most common ways to misdirect people about data is to use a graph.
I won’t go through every way that graphs are poorly used, although I do highly recommend reading this fantastic breakdown by Ryan McCready.
Some graphs are bad through sheer incompetence, but sadly, a large number of them are manipulated on purpose. Fox News is one example of an organization that consistently misuses public data to draw faulty conclusions. They’ve played with the axes on their charts to make changes over time seem more significant, double-counted data to improve the numbers that matter most to them, and my favorite, made a pie chart with numbers that came to a total of 193%. (For those of you unfamiliar, pie charts go to 100%. You can’t eat 193% of a pie).
This is why it’s important to look critically at any data that is presented to you. You should always be able to go back to the original source and find a match for what is being presented. If you can’t, you’re being misled.
So who’s presenting this data? And what do they have to gain?
We live in a world where we are inundated with bad information. Organizations are run by people and people have agendas. Being able to think critically and question our sources is vital to making good decisions. This is particularly true if you are in a position of leadership. Because you’re not just making decisions for yourself. You’re impacting employees, co-workers, customers, and clients.
And I get it. I’ve been in management. I know how little time and money there is to think about data.
But the cost is far greater if we don’t.