# Two And A Half Fallacies (Statistics, Probability)

The field of statistics gives rise to a great number of fallacies (and intentional misuse for that matter). One of the most common is the Gambler’s Fallacy. It is the idea that an event can be “due” if it hasn’t appeared against all odds for quite some time.

In August 1913 an almost impossible string of events occurred in a casino in Monte Carlo. The roulette table showed black a record twenty-six times in a row. Since the chance for black on a single spin is about 0.474, the probability of this streak is 0.474^26 ≈ 3.7 · 10^-9, or about 1 in 270 million. For the casino, this was a lucky day. It profited greatly from players believing that once the table had shown black several times in a row, the probability for another black to show up was impossibly slim. Red was due.
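The arithmetic is easy to check. The following is a minimal sketch that simply raises the single-spin chance quoted above to the 26th power; the variable names are illustrative.

```python
p_black = 0.474  # single-spin chance for black, as quoted in the text
streak = 26      # number of blacks in a row

# Independent spins: the chance of the whole streak is the product
# of the single-spin chances.
p_streak = p_black ** streak
one_in = 1 / p_streak

print(f"probability of the streak: {p_streak:.2e}")
print(f"that is 1 in about {one_in / 1e6:.0f} million")
```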

Unfortunately for the players, this logic failed. The chances for black remained at 0.474, no matter what colors appeared so far. Each spin is a complete reset of the game. The same goes for coins. No matter how many times a coin shows heads, the chance for this event will always stay 0.5. An unlikely string will not alter any probabilities if the events are truly independent.
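A small simulation makes the same point. The sketch below assumes an idealized wheel where every spin shows black with probability 18/38 (about 0.474) regardless of what came before; the function name and the streak length of five are chosen only for illustration. It measures how often black shows up directly after a run of blacks.

```python
import random

P_BLACK = 18 / 38  # about 0.474, assumed single-spin chance for black


def black_rate_after_streak(streak_len, n_spins, seed=0):
    """Fraction of blacks among spins that directly follow a black streak."""
    rng = random.Random(seed)
    run = 0        # current number of consecutive blacks
    followups = 0  # spins that came right after a streak of at least streak_len
    blacks = 0     # how many of those follow-up spins were black
    for _ in range(n_spins):
        is_black = rng.random() < P_BLACK
        if run >= streak_len:
            followups += 1
            blacks += is_black
        run = run + 1 if is_black else 0
    return blacks / followups


rate = black_rate_after_streak(streak_len=5, n_spins=200_000)
print(f"black after 5 blacks in a row: {rate:.3f}")
print(f"black on any single spin:      {P_BLACK:.3f}")
```

Up to random noise, the two numbers agree: a streak does not make red any more likely.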

Another common statistical fallacy is “correlation implies causation”. In countries with sound vaccination programmes, cancer rates are significantly elevated, whereas in countries where vaccination hardly takes place, only a few people suffer from cancer. This seems to be a clear case against vaccination: it correlates with (and thus surely somehow must cause) cancer.

However, taking a third variable and additional knowledge about cancer into account produces a very different picture. Cancer is a disease of old age. Because it requires a string of undesired mutations to take place, it is usually not found in young people. It is thus clear that in countries with a higher life expectancy, you will find higher cancer rates. This increased life expectancy is reached via the many different tools of health care, vaccination being an important one of them. So vaccination leads to a higher life expectancy, which in turn leads to elevated rates in diseases of old age (among which is cancer). The real story behind the correlation turned out to be quite different from what could be expected at first.
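To see how a third variable can produce such a correlation, here is a toy simulation. All numbers in it are made up purely for illustration and are not real health data: vaccination coverage raises life expectancy, and the cancer rate depends on life expectancy only, never on vaccination directly. The helper functions `pearson` and `residuals` are written out by hand for this sketch.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing its best straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(1)
# Synthetic "countries" (invented coefficients, for illustration only).
vaccination = [rng.random() for _ in range(2000)]                # coverage, 0 to 1
life_exp = [50 + 30 * v + rng.gauss(0, 3) for v in vaccination]  # years
cancer = [2 * (age - 50) + rng.gauss(0, 5) for age in life_exp]  # depends on age only

raw = pearson(vaccination, cancer)
adjusted = pearson(vaccination, residuals(cancer, life_exp))
print(f"vaccination vs. cancer:                {raw:.2f}")
print(f"after accounting for life expectancy: {adjusted:.2f}")
```

In this toy world the raw correlation is strong, yet it all but vanishes once life expectancy is taken into account, which is exactly the pattern described above.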

Another interesting correlation was found by the parody religion FSM (Flying Spaghetti Monster). Deducing causation here would be madness. Over the 18th and 19th centuries, piracy (the one with the boats, not the one with the files and the sharing) slowly died out. At the same time, possibly as part of a natural trend and / or because of increased industrial activity, the global temperature started increasing. If you plot the number of pirates and the global temperature in a coordinate system, you find a relatively strong correlation between the two. The more pirates there are, the colder the planet is. Here’s the corresponding formula:

T = 16 – 0.05 · P^0.33

with T being the average global temperature and P the number of pirates. Given enough pirates (somewhere between 30 and 40 million, depending on how the exponent is rounded), we could even freeze Earth. But of course nobody in their right mind would see causality at work here. Rather, we have two processes, the disappearance of piracy and global warming, that happened to occur at the same time. So you shouldn’t be too surprised that the recent rise of piracy in Somalia didn’t do anything to stop global warming.
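The formula can be solved for the number of pirates at which T drops to zero: P = (16 / 0.05)^(1 / 0.33). The sketch below does this numerically, assuming T is measured in degrees Celsius so that “freezing” means T = 0. The function names are made up for this example. Because the result is very sensitive to the exponent, it is evaluated both for 0.33 as printed and for 1/3.

```python
def temperature(pirates, exponent=0.33):
    """Average global temperature T for a given number of pirates P."""
    return 16 - 0.05 * pirates ** exponent


def pirates_to_freeze(exponent=0.33):
    """Solve 16 - 0.05 * P**exponent = 0 for P."""
    return (16 / 0.05) ** (1 / exponent)


for exponent in (0.33, 1 / 3):
    p_freeze = pirates_to_freeze(exponent)
    print(f"exponent {exponent:.4f}: T = 0 at about "
          f"{p_freeze / 1e6:.0f} million pirates")
```

Either way, the number of pirates required is in the tens of millions.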

As we saw, a correlation between quantities can arise in many ways and does not always imply causation. Sometimes there is a third, unseen variable in the line of causation; other times it’s two completely independent processes happening at the same time. So be careful when drawing conclusions.

Though not a fallacy in the strict sense, combinations of low probability and a high number of trials are also a common cause for incorrect conclusions. We computed that in roulette the odds of showing black twenty-six times in a row are only 1 in 270 million. We might conclude that it is basically impossible for this to happen anywhere.

But considering there are on the order of 3500 casinos worldwide, each playing roughly 100 rounds of roulette per day, we get about 130 million rounds per year. With this large number of trials, it would be foolish not to expect a 1 in 270 million event to occur every now and then. So when faced with a low probability for an event, always take a look at the number of trials. Maybe it’s not as unlikely to happen as suggested by the odds.
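A rough back-of-the-envelope estimate shows what this means in practice. The sketch below uses the casino and round counts quoted above and makes one simplifying assumption of its own: every round is treated as the possible start of a twenty-six-black streak, so the expected number of streaks per year is just the number of rounds times the streak probability.

```python
casinos = 3500          # rough worldwide count, as quoted in the text
rounds_per_day = 100    # rough rounds of roulette per casino per day
p_streak = 0.474 ** 26  # chance that 26 given spins in a row all show black

rounds_per_year = casinos * rounds_per_day * 365
# Simplifying assumption: each round may be the start of a streak.
expected_per_year = rounds_per_year * p_streak

print(f"rounds per year: {rounds_per_year:,}")
print(f"expected 26-black streaks per year: {expected_per_year:.2f}")
```

Under these assumptions the expected count comes out at a bit less than one per year, so seeing such a streak somewhere in the world within a few years is nothing out of the ordinary.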

# Statistics: The Multiplication Rule Gently Explained

Multiplication is a surprisingly powerful tool in statistics. It enables us to solve a vast number of problems with relative ease. One thing to remember, though, is that the multiplication rule, which I’ll get to in a bit, only works for independent events. So let’s talk about those first.

When we roll a die, there’s a certain probability that the number six will show. This probability does not depend on what number we rolled before. The events “rolling a three” and “rolling a six” are independent in the sense that the occurrence of one event does not affect the probability of the other.

Let’s look at a card deck. We draw a card and note it. Afterward, we put it back in the deck and mix the cards. Then we draw another one. Does the event “draw an ace” in the first try affect the event “draw a king” in the second try? It does not, because we put the ace back in the deck and mixed the cards. We basically reset our experiment. In such a case, the events “draw an ace” and “draw a king” are independent.

But what if we don’t put the first card back in the deck? Well, when we take the ace out of the deck, the chance of drawing a king will increase from 4 / 52 (4 kings out of 52 cards) to 4 / 51 (4 kings out of 51 cards). If we don’t do the reset, the events “draw an ace” and “draw a king” are in fact dependent. The occurrence of one changes the probability for the other.
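The change in probability can be reproduced by simply counting cards. This is a minimal sketch using a simplified deck in which only aces and kings are distinguished; the `chance` helper is written just for this example.

```python
from fractions import Fraction

# Simplified 52-card deck: only aces and kings matter here.
deck = ["ace"] * 4 + ["king"] * 4 + ["other"] * 44


def chance(card, cards):
    """Probability of drawing the given card from the given pile."""
    return Fraction(cards.count(card), len(cards))


p_king_full = chance("king", deck)  # 4 kings out of 52 cards

remaining = deck.copy()
remaining.remove("ace")             # the first draw was an ace, not put back
p_king_after_ace = chance("king", remaining)  # 4 kings out of 51 cards

# Fraction reduces automatically, so 4/52 is displayed as 1/13.
print(p_king_full, p_king_after_ace)
```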

With this in mind, we can turn to our powerful tool called the multiplication rule. We start with two independent events, A and B. The probabilities of their occurrence are p(A) and p(B), respectively. The multiplication rule states that the probability of both events occurring is simply the product of the probabilities p(A) and p(B). In mathematical terms:

p(A and B) = p(A) · p(B).

A quick look at the dice will make this clear. Let’s take both A and B to be the event “rolling a six”. Obviously they are independent: rolling a six on one try will not change the probability of rolling a six on the following try. So we are allowed to use the multiplication rule here. The probability of rolling a six is 1/6, so p(A) = p(B) = 1/6. Using the multiplication rule, we can calculate the chance of rolling two sixes in a row: p(A and B) = 1/6 · 1/6 = 1/36. Note that if we took A to be “rolling a six” and B to be “rolling a three”, we would arrive at the same result. The chance of rolling two sixes in a row is the same as rolling a six and then a three.
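As a sanity check, the dice result can be verified by listing all 36 equally likely outcomes of two rolls and counting the ones we are interested in. The helper `chance` below is just for this sketch.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
rolls = list(product(faces, repeat=2))  # all 36 ordered pairs (first, second)


def chance(first, second):
    """Probability of rolling `first` and then `second`, by counting outcomes."""
    hits = sum(1 for a, b in rolls if a == first and b == second)
    return Fraction(hits, len(rolls))


p_six = Fraction(1, 6)
print("six then six:       ", chance(6, 6))
print("six then three:     ", chance(6, 3))
print("multiplication rule:", p_six * p_six)
```

All three lines show the same value, 1/36.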

Can we also use this on the deck of cards, even if we don’t reset the experiment? Indeed we can. But we have to take into account that the probabilities change as we go along. In more abstract terms, instead of looking at the general events “draw an ace” and “draw a king”, we need to look at the events A = “draw an ace on the first try” and B = “draw a king with one ace missing”. With the order of the events clearly set, there’s no chance of them interfering. The occurrence of both events, first drawing an ace and then drawing a king with the ace missing, has the probability: p(A and B) = p(A) · p(B) = 4/52 · 4/51 = 16/2652, or about 1 in 166, or 0.6 %.
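The same kind of counting check works for the cards. The sketch below lists every ordered way of drawing two cards without putting the first one back and compares the count with the product 4/52 · 4/51. It again uses a simplified deck in which only aces and kings are distinguished.

```python
from fractions import Fraction
from itertools import permutations

# Simplified 52-card deck: only aces and kings matter here.
deck = ["ace"] * 4 + ["king"] * 4 + ["other"] * 44

# Every ordered pair of two different card positions: 52 * 51 = 2652 draws.
draws = list(permutations(range(len(deck)), 2))
hits = sum(1 for i, j in draws if deck[i] == "ace" and deck[j] == "king")

p_enumerated = Fraction(hits, len(draws))
p_rule = Fraction(4, 52) * Fraction(4, 51)

print(f"{hits} favourable draws out of {len(draws)}")
print(f"by counting: {p_enumerated}, by multiplication: {p_rule}")
print(f"as a percentage: {float(p_rule):.2%}")
```

Counting and multiplying give the same probability, which confirms that the rule still works as long as the second factor is the chance of the second event given that the first one has already happened.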

For examples on how to apply the multiplication rule check out Multiple Choice Tests and Monkeys on Typewriters.