Two And A Half Fallacies (Statistics, Probability)

The field of statistics gives rise to a great number of fallacies (and intentional misuse for that matter). One of the most common is the Gambler’s Fallacy. It is the idea that an event can be “due” if it hasn’t appeared against all odds for quite some time.

In August 1913 an almost impossible string of events occurred in a casino in Monte Carlo. The roulette table showed black a record number of twenty-six times in a row. Since the chance for black on a single spin is about 0.474, the probability of this string is 0.474^26 ≈ 3.7 · 10^-9, or about 1 in 270 million. For the casino, this was a lucky day. It profited greatly from players believing that once the table had shown black several times in a row, the probability of another black showing up was impossibly slim. Red was due.

Unfortunately for the players, this logic failed. The chance for black remained at 0.474, no matter what colors had appeared before. Each spin is a complete reset of the game. The same goes for coins. No matter how many times in a row a fair coin has shown heads, the chance of heads on the next toss stays at 0.5. An unlikely string will not alter any probabilities if the events are truly independent.
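The independence claim is easy to check with a small simulation. The following is only a sketch: the function name black_after_streak, the number of spins and the seed are made up for illustration, and the wheel is modeled as nothing more than a random number compared against the 0.474 from the text.

```python
import random

P_BLACK = 0.474  # single-spin chance for black, as used in the text


def black_after_streak(streak_len, n_spins=400_000, seed=1):
    """Estimate the chance of black on spins that directly follow
    at least `streak_len` blacks in a row."""
    rng = random.Random(seed)
    run = 0        # length of the current run of blacks
    followers = 0  # spins that came right after a long enough run
    blacks = 0     # how many of those spins showed black
    for _ in range(n_spins):
        is_black = rng.random() < P_BLACK
        if run >= streak_len:
            followers += 1
            blacks += is_black
        run = run + 1 if is_black else 0
    return blacks / followers


print(f"26 blacks in a row: about 1 in {1 / P_BLACK ** 26:,.0f}")
for k in (0, 3, 5):
    print(f"after {k} blacks in a row: {black_after_streak(k):.3f}")
```

Whatever streak length you pick, the estimate should stay close to 0.474, apart from random noise that grows as long streaks become rare.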

Another common statistical fallacy is “correlation implies causation”. In countries with sound vaccination programmes, cancer rates are significantly elevated, whereas in countries where vaccination hardly takes place, only a few people suffer from cancer. This seems to be a clear case against vaccination: it correlates with (and thus surely somehow must cause) cancer.

However, taking a third variable and additional knowledge about cancer into account produces a very different picture. Cancer is a disease of old age. Because it requires a string of undesired mutations to take place, it is usually not found in young people. It is thus clear that in countries with a higher life expectancy, you will find higher cancer rates. This increased life expectancy is reached via the many different tools of health care, vaccination being an important one of them. So vaccination leads to a higher life expectancy, which in turn leads to elevated rates in diseases of old age (among which is cancer). The real story behind the correlation turned out to be quite different from what could be expected at first.

Another interesting correlation was found by the parody religion FSM (Flying Spaghetti Monster). Deducing causation here would be madness. Over the 18th and 19th centuries, piracy (the one with the boats, not the one with the files and the sharing) slowly died out. At the same time, possibly as part of a natural trend and/or because of increased industrial activity, the global temperature started increasing. If you plot the number of pirates against the global temperature, you find a relatively strong correlation between the two. The more pirates there are, the colder the planet is. Here’s the corresponding formula:

T = 16 – 0.05 · P^0.33

with T being the average global temperature and P the number of pirates. Given enough pirates, we could even freeze Earth: setting T = 0 gives P = 320^(1/0.33), which is somewhere between 30 and 40 million pirates, depending on how the exponent is rounded.

(Figure: number of pirates and global temperature, the Flying Spaghetti Monster correlation)

But of course nobody in their right mind would see causality at work here. Rather, we have two processes, the disappearance of piracy and global warming, that happened to occur at the same time. So you shouldn’t be too surprised that the recent rise of piracy in Somalia didn’t do anything to stop global warming.

As we saw, a correlation between quantities can arise in many ways and does not always imply causation. Sometimes there is a third, unseen variable in the line of causation; other times there are two completely independent processes happening at the same time. So be careful when drawing your conclusions.

Though not a fallacy in the strict sense, combinations of low probability and a high number of trials are also a common cause for incorrect conclusions. We computed that in roulette the odds of showing black twenty-six times in a row are only 1 in 270 million. We might conclude that it is basically impossible for this to happen anywhere.

But considering there are on the order of 3500 casinos worldwide, each playing roughly 100 rounds of roulette per day, we get about 130 million rounds per year. With this large number of trials, it would be foolish not to expect a 1 in 270 million event to occur every now and then. So when faced with a low probability for an event, always take a look at the number of trials. Maybe it’s not as unlikely to happen as suggested by the odds.
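To put rough numbers on this, here is a small back-of-the-envelope sketch. It uses the casino and round counts from the text and makes one simplifying assumption that is not in the original: every round is treated as an independent chance for the streak to happen. The function name yearly_outlook is made up for illustration.

```python
import math

CASINOS = 3500          # rough worldwide count, from the text
ROUNDS_PER_DAY = 100    # rounds per casino per day, from the text
P_STREAK = 0.474 ** 26  # chance of twenty-six blacks in a row


def yearly_outlook(casinos=CASINOS, rounds_per_day=ROUNDS_PER_DAY, p=P_STREAK):
    """Return (trials per year, expected streaks per year,
    chance of at least one streak per year).

    Assumption: each round counts as one independent trial.
    """
    trials = casinos * rounds_per_day * 365
    expected = trials * p
    # 1 - (1 - p)^trials, written so that the tiny p is handled accurately
    at_least_once = -math.expm1(trials * math.log1p(-p))
    return trials, expected, at_least_once


trials, expected, at_least_once = yearly_outlook()
print(f"rounds per year:           {trials:,}")
print(f"expected streaks per year: {expected:.2f}")
print(f"chance of at least one:    {at_least_once:.0%}")
```

Under these assumptions the streak is expected roughly once every two years somewhere in the world, which is a long way from "basically impossible".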

My Fair Game – How To Use the Expected Value

You meet a nice man on the street offering you a game of dice. For a wager of just 2 $, you can win 8 $ when the die shows a six. Sounds good? Let’s say you join in and play 30 rounds. What will be your expected balance after that?

You roll a six with the probability p = 1/6. So of the 30 rounds, you can expect to win 1/6 · 30 = 5 rounds, resulting in a pay-out of 40 $. But winning 5 rounds of course also means that you lost the remaining 25 rounds, resulting in a loss of 50 $. Your expected balance after 30 rounds is thus -10 $. Or in other words: for the player this game results in a loss of 1/3 $ per round.

Let’s make a general formula for just this case. We are offered a game which we win with a probability of p. The pay-out in case of victory is P, the wager is W. We play this game for a number of n rounds.

The expected number of wins is p·n, so the expected total pay-out is p·n·P. The expected number of losses is (1-p)·n, so the expected total loss is (1-p)·n·W.

Now we can set up the formula for the balance. We simply subtract the losses from the pay-out. But while we’re at it, let’s divide both sides by n to get the balance per round. It already includes all the information we need and requires one less variable.

B = p · P – (1-p) · W

This is what we can expect to win (or lose) per round. Let’s check it by using the above example. We had the winning chance p = 1/6, the pay-out P = 8 $ and the wager W = 2 $. So from the formula we get this balance per round:

B = 1/6 · 8 $ – 5/6 · 2 $ = – 1/3 $ per round

Just as we expected. Let’s try another example. I’ll offer you a dice game. If you roll two sixes in a row, you get P = 175 $. The wager is W = 5 $. Quite the deal, isn’t it? Let’s see. Rolling two sixes in a row occurs with a probability of p = 1/36. So the expected balance per round is:

B = 1/36 · 175 $ – 35/36 · 5 $ = 0 $ per round

I offered you a truly fair game. No one can be expected to lose in the long run. Of course if we only play a few rounds, somebody will win and somebody will lose.
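Both examples can be checked with a few lines of code. This is a minimal sketch of the formula B = p · P – (1-p) · W using exact fractions. The helper fair_payout is not from the text: it simply solves B = 0 for P, and both function names are chosen for illustration only.

```python
from fractions import Fraction


def balance_per_round(p, payout, wager):
    """Expected balance per round: B = p * P - (1 - p) * W."""
    return p * payout - (1 - p) * wager


def fair_payout(p, wager):
    """Pay-out that makes the game fair, from solving B = 0 for P."""
    return (1 - p) * wager / p


# Street game: a six wins 8 $, the wager is 2 $
print(balance_per_round(Fraction(1, 6), 8, 2))     # -1/3
# Two sixes in a row win 175 $, the wager is 5 $
print(balance_per_round(Fraction(1, 36), 175, 5))  # 0
# Pay-out the street game would need in order to be fair
print(fair_payout(Fraction(1, 6), 2))              # 10
```

Using Fraction instead of floating point numbers keeps results such as -1/3 and 0 exact, so a fair game really comes out as zero rather than as a tiny rounding error.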

It’s helpful to understand this balance as being sound for a large number of rounds but rather fragile in case of playing only a few rounds. Casinos are host to thousands of rounds per day and thus can predict their gains quite accurately from the balance per round. After a lot of rounds, all the random streaks and significant one-time events hardly impact the total balance anymore. The real balance will converge to the theoretical balance more and more as the number of rounds grows. This is mathematically proven by the Law of Large Numbers. Assuming finite variance, the proof can be done elegantly using Chebyshev’s Inequality.
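Chebyshev’s Inequality also gives a concrete, if rather pessimistic, feel for how fast this happens. For n independent rounds it says that the chance of the average balance missing the theoretical value by at least eps is at most Var / (n · eps^2), where Var is the variance of a single round. The sketch below applies this to the 2 $ dice game from above. The function names and the choice of eps = 1/4 $ are illustrative assumptions.

```python
from fractions import Fraction


def round_variance(p, payout, wager):
    """Variance of the balance of a single round (+payout or -wager)."""
    mean = p * payout - (1 - p) * wager
    second_moment = p * payout ** 2 + (1 - p) * wager ** 2
    return second_moment - mean ** 2


def chebyshev_bound(p, payout, wager, n_rounds, eps):
    """Upper bound on P(|average balance - expected balance| >= eps)."""
    return min(1, round_variance(p, payout, wager) / (n_rounds * eps ** 2))


p = Fraction(1, 6)
for n in (20, 2000, 200_000):
    bound = chebyshev_bound(p, 8, 2, n, Fraction(1, 4))
    print(f"{n:>7} rounds: off by 0.25 $ or more with chance <= {float(bound):.4f}")
```

For a handful of rounds the bound says nothing useful (it is capped at 1), while for thousands of rounds it shrinks in proportion to 1/n. The true probabilities are usually much smaller than this bound.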

The convergence can be easily demonstrated using a computer simulation. We will let the computer, equipped with random numbers, run our dice game for 2000 rounds. After each round the computer calculates the balance per round so far. The picture below shows the difference between the simulated balance per round and our theoretical result of – 1/3 $ per round.
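A simulation of this kind can be sketched in a few lines of Python. This is not the program behind the original picture, only a minimal version under obvious assumptions: a fair die, the 8 $ pay-out and 2 $ wager from the first example, and an arbitrary seed so that the run is repeatable. The function name running_balance is made up for illustration.

```python
import random

THEORY = -1 / 3  # theoretical balance per round for the 2 $ dice game


def running_balance(n_rounds=2000, payout=8, wager=2, seed=42):
    """Play the dice game and return the balance per round after each round."""
    rng = random.Random(seed)
    total = 0
    averages = []
    for i in range(1, n_rounds + 1):
        # a six wins the pay-out, anything else loses the wager
        total += payout if rng.randint(1, 6) == 6 else -wager
        averages.append(total / i)
    return averages


averages = running_balance()
for n in (10, 100, 500, 2000):
    diff = averages[n - 1] - THEORY
    print(f"after {n:>4} rounds: difference to theory = {diff:+.3f} $")
```

The printed differences jump around in the first rounds and typically settle near zero later on. Changing the seed gives a different path, but the same overall behaviour.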


(Liked the excerpt? Get the book “Statistical Snacks” by Metin Bektas here: