Why You’re Not Data Driven — Part 2

Chad Hahn
Jun 11, 2020

A Whole-Brain Look at the Challenges of Measurement and Motivation

Note: This is Part 2 in a four-part series on the challenges to being data-driven. To understand the purpose of this series and read from the beginning, Click Here.

In Part 1 of this series, I described how the challenges to being data-driven have less to do with choosing the right metrics and gathering the data and more to do with interpreting the results and using them to create the right motivations.

I also established a Whole-Brain-inspired, four-step process for data-driven decision-making — the what, the how, the why, and the who. I believe each of these steps gets successively harder, and in Part 1, I elaborated on the challenges with The What — deciding what to measure. In Part 2, let’s focus on the next step: deciding how best to capture the data needed for the metric.

The How — Finding a Method to the Madness

The most common problem with the “How” is the “garbage in, garbage out” problem: the notion that the accuracy of a metric is only as good as the quality of the underlying data. I don’t expect to have the same culinary experience at Jack in the Box that I have at my beloved In-N-Out, because the quality of the ingredients matters. The same applies when data are the ingredients.

Let’s take this a step further and sharpen what we mean by “quality.” Data has quality when it is both precise and accurate. Charles Wheelan does a great job of differentiating between precision and accuracy in “Naked Statistics.” Precision refers to “the exactitude with which we can express something,” while accuracy is a “measure of whether a figure is broadly consistent with the truth.” Wheelan makes one thing clear: no amount of precision can make up for inaccuracy.¹
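A quick numeric sketch makes the distinction concrete; the thermometer readings below are invented purely for illustration:

```python
import statistics

# Hypothetical thermometer readings; the true temperature is 100.0 degrees.
precise_but_inaccurate = [97.001, 97.002, 97.000, 97.001]  # tightly clustered, but biased
accurate_but_imprecise = [98.0, 103.0, 99.0, 100.0]        # scattered, but centered on the truth

for name, readings in [("precise but inaccurate", precise_but_inaccurate),
                       ("accurate but imprecise", accurate_but_imprecise)]:
    print(f"{name}: mean={statistics.mean(readings):.3f}, "
          f"stdev={statistics.stdev(readings):.3f}")
```

The first set of readings reports three decimal places of exactitude and is still off by three degrees. You need both precision and accuracy when gathering data for your measurements, and there are three considerations to keep in mind: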

Be Mindful of Sample Size

Let’s start by making sure we work with large sample sizes. For certain metrics, this is not a problem — we can measure the entire population. For example, if you want to measure the effectiveness of a digital ad campaign, you can look at the conversion rate of everyone who viewed the ad, not just a specific sample.

But for those metrics that require sampling, it must be a large, representative sample of the necessary population. If it isn’t, you will run into issues. Howard Wainer, Distinguished Research Scientist at the National Board of Medical Examiners, highlights this problem by describing what he calls the most dangerous equation, from the famed mathematician Abraham de Moivre. The equation states that the variation of the sample mean is inversely proportional to the square root of the sample size. In other words, small samples display much larger variation (measured by standard deviation) than large samples.
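In symbols, where σ is the standard deviation of the underlying population and n is the sample size:

```latex
% de Moivre's equation: the standard error of the mean
% shrinks only with the square root of the sample size
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
```

Quadrupling the sample size only halves the variation of the mean, which is why small samples are so treacherous.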

Michael J. Mauboussin gives a great example to illustrate this problem in “The Success Equation.” Policymakers were looking at ways to improve schools, so they studied test scores based on school size. They found that smaller schools were overrepresented among the schools that scored highest. They deduced that smaller schools produce better scores, so the public and private sectors spent billions of dollars creating the infrastructure to support smaller schools. But there was one problem — smaller schools were also overrepresented among the schools that scored lowest. How is this possible? The samples drawn from the population of schools were too small, producing exactly the variation de Moivre’s equation predicts. Ironically, further research showed that after post-secondary education, students from larger schools scored better than those from smaller schools, because larger schools had the financial means and diversity to offer richer curricula.²
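A small simulation shows how this happens even when school size has no effect at all. Everything below (the score distribution, school sizes, and counts) is invented purely to illustrate the mechanism:

```python
import random

random.seed(0)

# Every student draws a score from the same distribution, so no school
# type is "really" better than the other.
def school_mean(n_students):
    return sum(random.gauss(500, 100) for _ in range(n_students)) / n_students

schools = [("small", school_mean(50)) for _ in range(500)]
schools += [("large", school_mean(2000)) for _ in range(500)]
schools.sort(key=lambda s: s[1])

top, bottom = schools[-50:], schools[:50]
print("small schools among the top 50:   ", sum(kind == "small" for kind, _ in top))
print("small schools among the bottom 50:", sum(kind == "small" for kind, _ in bottom))
```

Run it and the small schools dominate both tails of the ranking, exactly as de Moivre’s equation predicts.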

Assumption Sensitivity

There are times when the metrics we care about involve assumptions that factor into our calculations. For example, anyone in sales understands the importance of a weighted pipeline, which is usually calculated using the statistical notion of “expected value” — the sum of every possible outcome’s payoff, each weighted by its probability. The weighted pipeline assumes a probability of closure for each sales stage.
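A minimal sketch of that calculation (the deals, dollar values, and stage probabilities are all hypothetical):

```python
# Hypothetical weighted pipeline: each deal's value is weighted by an
# assumed probability of closing based on its sales stage.
pipeline = [
    {"deal": "Acme",    "value": 100_000, "close_prob": 0.10},  # early stage
    {"deal": "Globex",  "value": 250_000, "close_prob": 0.50},  # mid stage
    {"deal": "Initech", "value":  80_000, "close_prob": 0.90},  # late stage
]

# Expected value: the sum of each payoff times its probability.
weighted = sum(d["value"] * d["close_prob"] for d in pipeline)
print(f"Weighted pipeline: ${weighted:,.0f}")  # $207,000
```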

When our metrics contain assumptions, we must consider their sensitivity to changes in those assumptions. Will small changes to assumed values lead to large changes in outcomes? If the answer is yes, be very careful, because your metric outputs can lead you down the wrong path.
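Sticking with the hypothetical pipeline above, a quick sensitivity sweep shows how much the headline number moves when every assumed close probability shifts by just five points:

```python
# Same hypothetical pipeline as before, as (value, close_prob) pairs.
pipeline = [(100_000, 0.10), (250_000, 0.50), (80_000, 0.90)]

for shift in (-0.05, 0.0, 0.05):
    weighted = sum(value * min(max(prob + shift, 0.0), 1.0)
                   for value, prob in pipeline)
    print(f"probabilities shifted {shift:+.2f}: ${weighted:,.0f}")
```

A five-point nudge in the assumptions swings the forecast from roughly $185,000 to $228,000, a spread of more than 20%. For an example of assumption sensitivity gone badly wrong, look no further than the financial crisis.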

In “Naked Statistics,” Wheelan describes a barometer of risk called Value at Risk (VaR) that most Wall Street firms were using before the housing crisis. The appeal of this model, described by New York Times business writer Joe Nocera, was that it expressed the risk of an entire company in a single dollar figure. However, there was a flaw in the VaR models that was blamed for the onset and severity of the financial crisis — the probabilities of negative outcomes in the models were based on past market movements, particularly those of the previous 30 years, a period when the market outperformed its historical average. The models offered 99% assurance, but they didn’t quantify how bad the 1% scenario would be, because that scenario sat at the “tail end” of the distribution of risk outcomes. When that 1% event happened, Wall Street firms were not prepared.³
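A toy version of a historical-simulation VaR makes the flaw easy to see. This is a sketch under stated assumptions, not any firm’s actual model, and the synthetic return history is deliberately calm, which is exactly the problem described above:

```python
import random

random.seed(1)

# Synthetic "history": ~30 quiet years of daily returns with no crashes.
calm_history = [random.gauss(0.0005, 0.01) for _ in range(252 * 30)]

def one_day_var(returns, portfolio_value, confidence=0.99):
    # Rank historical losses and read off the chosen percentile.
    losses = sorted(-r * portfolio_value for r in returns)
    return losses[int(confidence * len(losses))]

# The figure says "losses exceed this only 1% of the time"; it says
# nothing about how bad that 1% scenario actually is.
print(f"99% one-day VaR on $1M: ${one_day_var(calm_history, 1_000_000):,.0f}")
```

Feed the model a benign past and the tail looks harmless; that is the trap the Wall Street models fell into.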

There’s another aspect to metric assumptions that should be considered — the concept of nonlinearity. In “Fooled by Randomness,” Nassim Nicholas Taleb defines nonlinearity as “a nonlinear effect resulting from a linear force exerted on an object.” Put simply, a small additional input can sometimes cause a disproportionate result. Taleb gives the example of building a sandcastle on a beach in Rio de Janeiro — you can keep adding small amounts of sand, but at some point, one more small addition will collapse your entire fortress. My kids never seem to understand this concept.

If your metric assumptions are off by a small margin, the effect may not show up right away in the quality of your metric output, but eventually, you’ll hit a tipping point, much like the sandcastle. According to Taleb, population models can lead to explosive growth or extinction depending on a very small difference in the assumed population at the start.⁴
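Taleb doesn’t name a specific model, so the sketch below uses the classic logistic map as a stand-in population model. Two starting populations that differ by one part in a million end up in completely different places:

```python
# Logistic-map population model (a stand-in; Taleb doesn't specify one).
# x is the population as a fraction of the environment's capacity.
def population(x, growth=4.0, generations=50):
    for _ in range(generations):
        x = growth * x * (1 - x)
    return x

print(population(0.200000))  # two starting points that differ by 0.000001...
print(population(0.200001))  # ...land nowhere near each other after 50 generations
```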

Don’t be a Bad Person

When we’re defining metrics, we generally know which direction we want that metric to head. CEOs want their stock price to go up; website owners want the number and duration of visits to increase. It is quite natural to want our strategies and tactics to yield the results we desire. But be careful in how you frame your metric, so that you don’t let your desire for an outcome outweigh your desire for the truth.

This type of behavior can be seen in political polling, where the wording used in a question can have a measurable impact on the poll outcome. In “Naked Statistics,” Wheelan talks about politicians using “words that work” in poll questions. For example, voters are more inclined to support “tax relief” over “tax cuts,” or “climate change” over “global warming.” According to Gallup, every year since 2002, over 60% of Americans have said they favor the death penalty for a person convicted of murder — but only when life imprisonment without parole is not offered as an alternative. In a 2006 Gallup poll, support for the death penalty dropped to 47% when life without parole was introduced as an option, with 48% supporting the alternative.⁵

I get it — you’re a businessperson, and we all know there are no politics in business, right? You don’t conduct polls — you just look at hard numbers. But our biases can affect how we capture other metrics, not just polls. According to Wheelan in “Naked Statistics,” the mutual fund industry plays a trick on us called survivorship bias. New funds with returns below their representative market index are quietly closed, and their assets are folded into existing funds that have outperformed their market index, even if that outperformance was due to randomness. That way, the company can advertise that its flagship funds beat their market averages, even if that performance is the chance equivalent of flipping heads three times in a row.⁶
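A toy simulation makes the trick visible. In the sketch below, every fund’s return is pure luck (there is no skill anywhere), yet the surviving funds still look like market-beaters; all the numbers are invented:

```python
import random

random.seed(2)

market_return = 0.07
# 1,000 funds whose returns are random noise around the market average.
funds = [random.gauss(market_return, 0.10) for _ in range(1_000)]

# Survivorship bias: underperformers are quietly closed and disappear.
survivors = [r for r in funds if r > market_return]

print(f"all funds, average return: {sum(funds) / len(funds):.1%}")
print(f"survivors, average return: {sum(survivors) / len(survivors):.1%}")
```

The full population earns roughly the market return on average, but the advertised survivors appear to beat it by several points through selection alone.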

I have one more example from “Naked Statistics” on rigging a metric in favor of a desired outcome, but this one isn’t malicious in nature. It was quite ingenious, especially if you like beer. In 1981, Schlitz broadcast a live taste test pitting its beer against Michelob during halftime of the Super Bowl. All joking aside about the merits of these two beers in a contest of quality, Schlitz’s act was bold. Not only did they make the taste test live, but they also chose 100 Michelob drinkers as their subjects. Schlitz’s statement: even beer drinkers who prefer the other brand will pick us in a blind taste test.

For anyone old enough to have tasted Schlitz beer, this was the equivalent of bringing a Domino’s pizza to an authentic Italian cook-off. But the marketers behind the scheme knew exactly what they were doing. They worked with statisticians who understood that a blind taste test between two comparable beers behaves like a series of independent binomial trials, each with the same probability of success — essentially a coin flip. In other words, Schlitz knew that while their beer may be the Domino’s of libation, in a blind taste test between ANY two beers, on average, half the tasters will pick one beer and half the other.

The fact that Schlitz chose 100 Michelob drinkers for the live taste test was the genius part of the ploy — they calculated they had a 98% chance that at least 40 people would choose their beer in the live blind taste test. In the end, exactly 50% chose Schlitz! The metric was rigged from the beginning, but what Schlitz may have lacked in flavor, they more than made up in ingenuity.⁷
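Their arithmetic is easy to reconstruct with the binomial distribution, treating each of the 100 tasters as an independent 50/50 coin flip:

```python
from math import comb

# P(at least 40 of 100 fair coin flips come up "Schlitz")
p_at_least_40 = sum(comb(100, k) for k in range(40, 101)) / 2**100
print(f"P(at least 40 of 100 pick Schlitz): {p_at_least_40:.1%}")  # about 98%
```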

In light of these three considerations — sample size, assumptions, and desires — we have to remember that how we gather our data is extremely important, and often more difficult than choosing what to measure.

In Part 3 of this series, we’ll take a big leap in complexity. While choosing what to measure and how to measure it can be tough, choosing how to interpret our metrics can be an order of magnitude more challenging.

Bibliography

  1. Naked Statistics, Charles Wheelan — Chapter 6
  2. The Success Equation, Michael J. Mauboussin — Chapter 1
  3. Naked Statistics, Charles Wheelan — Chapter 6
  4. Fooled by Randomness, Nassim Nicholas Taleb — Chapter 10
  5. Naked Statistics, Charles Wheelan — Chapter 10
  6. Naked Statistics, Charles Wheelan — Chapter 7
  7. Naked Statistics, Charles Wheelan — Chapter 5

Written by Chad Hahn

Husband, father of 3 boys, 2 time entrepreneur, tech enthusiast (esp blockchain), yellow Whole Brain thinker and supporter of under-served communities
