Expected value

My Fair Game – How To Use the Expected Value

You meet a nice man on the street offering you a game of dice. For a wager of just 2 $, you win 8 $ when the die shows a six. Sounds good? Let’s say you join in and play 30 rounds. What will be your expected balance after that?

You roll a six with probability p = 1/6. So of the 30 rounds, you can expect to win 1/6 · 30 = 5 rounds, resulting in a pay-out of 5 · 8 $ = 40 $. But winning 5 rounds of course also means that you lose the remaining 25 rounds, resulting in a loss of 25 · 2 $ = 50 $. Your expected balance after 30 rounds is thus -10 $. In other words: for the player this game results in an expected loss of 1/3 $ per round.

Let’s set up a general formula for this kind of game. We are offered a game which we win with probability p. The pay-out in case of victory is P, the wager is W. We play the game for n rounds.

The expected number of wins is p·n, so the expected total pay-out is p·n·P. The expected number of losses is (1-p)·n, so we can expect to lose (1-p)·n·W in wagers.

Now we can set up the formula for the balance. We simply subtract the expected losses from the expected pay-out, which gives p·n·P – (1-p)·n·W for n rounds. While we’re at it, let’s divide by n to get the balance per round B. It already includes all the information we need and requires one less variable.

B = p · P – (1-p) · W

This is what we can expect to win (or lose) per round. Let’s check it by using the above example. We had the winning chance p = 1/6, the pay-out P = 8 $ and the wager W = 2 $. So from the formula we get this balance per round:

B = 1/6 · 8 $ – 5/6 · 2 $ = – 1/3 $ per round
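
The formula is easy to check numerically. The following is a minimal Python sketch; the function name balance_per_round and its parameter names are chosen here for illustration and do not come from the book. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def balance_per_round(p, payout, wager):
    """Expected balance per round: B = p * P - (1 - p) * W."""
    return p * payout - (1 - p) * wager


# Street game from the text: p = 1/6, P = 8 $, W = 2 $
b = balance_per_round(Fraction(1, 6), 8, 2)
print(b)       # -1/3
print(30 * b)  # -10, the expected balance after 30 rounds
```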

Just as we expected. Let’s try another example. I’ll offer you a dice game. If you roll two sixes in a row, you get P = 175 $. The wager is W = 5 $. Quite the deal, isn’t it? Let’s see. Rolling two sixes in a row occurs with a probability of p = 1/6 · 1/6 = 1/36. So the expected balance per round is:

B = 1/36 · 175 $ – 35/36 · 5 $ = 0 $ per round

I offered you a truly fair game. No one can be expected to lose in the long run. Of course if we only play a few rounds, somebody will win and somebody will lose.
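
Setting B = 0 in the formula and solving for P gives the pay-out that makes a game fair for a given wager: P = (1-p) · W / p. The sketch below applies this rearrangement; fair_payout is a hypothetical helper name used only for this illustration.

```python
from fractions import Fraction


def fair_payout(p, wager):
    """Pay-out P for which p * P - (1 - p) * W equals zero."""
    return (1 - p) * wager / p


# Two sixes in a row: p = 1/36, W = 5 $
print(fair_payout(Fraction(1, 36), 5))  # 175
# The street game (p = 1/6, W = 2 $) would need 10 $ instead of 8 $ to be fair
print(fair_payout(Fraction(1, 6), 2))   # 10
```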

It’s helpful to understand this balance as being sound for a large number of rounds but rather fragile when playing only a few rounds. Casinos host thousands of rounds per day and thus can predict their gains quite accurately from the balance per round. After a lot of rounds, all the random streaks and significant one-time events hardly impact the balance per round anymore. The real balance per round will converge to the theoretical balance per round more and more as the number of rounds grows. This is guaranteed by the Law of Large Numbers. Assuming finite variance, the proof can be done elegantly using Chebyshev’s Inequality.
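
To get a feel for how fast this convergence is, one can evaluate the Chebyshev bound for the street game. With a win of P and a loss of W, the variance of a single round is p · (1-p) · (P+W)², and Chebyshev’s Inequality bounds the probability that the average over n rounds misses B by eps or more by variance / (n · eps²). The function names and the tolerance of 0.5 $ in the sketch below are assumptions made for this illustration.

```python
from fractions import Fraction


def round_variance(p, payout, wager):
    """Variance of the balance of one round (win +P or lose -W)."""
    return p * (1 - p) * (payout + wager) ** 2


def chebyshev_bound(p, payout, wager, n, eps):
    """Upper bound on P(|average balance - B| >= eps) after n rounds."""
    return round_variance(p, payout, wager) / (n * eps ** 2)


p = Fraction(1, 6)
for n in (30, 2000, 100000):
    bound = chebyshev_bound(p, 8, 2, n, Fraction(1, 2))
    print(n, float(bound))
```

For 30 rounds the bound is larger than 1 and therefore says nothing, which matches the fragility of short runs. For 2000 rounds it drops to 1/36, and it keeps shrinking in proportion to 1/n.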

The convergence can be easily demonstrated using a computer simulation. We let the computer, equipped with random numbers, run our first dice game for 2000 rounds. After each round the computer calculates the balance per round so far. The picture below shows the difference between the simulated balance per round and our theoretical result of – 1/3 $ per round.
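
A simulation of this kind might look like the following sketch. It is not the program used for the book’s figure; the function name, the fixed seed, and the use of Python’s random module are assumptions for illustration.

```python
import random


def simulate_dice_game(rounds, payout=8, wager=2, seed=0):
    """Play the street game and return the balance per round after each round."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    running = []
    for i in range(1, rounds + 1):
        won = rng.randint(1, 6) == 6  # a six wins the round
        total += payout if won else -wager
        running.append(total / i)
    return running


theoretical = -1 / 3
history = simulate_dice_game(2000)
for n in (10, 100, 1000, 2000):
    # difference between simulated and theoretical balance per round
    print(n, round(history[n - 1] - theoretical, 3))
```

Changing the seed gives a different path, but the printed differences should generally shrink towards zero as n grows.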


(Liked the excerpt? Get the book “Statistical Snacks” by Metin Bektas here: http://www.amazon.com/Statistical-Snacks-ebook/dp/B00DWJZ9Z2)

Immigrants and Crime – A Statistical Analysis

Assume we are given a country with a population that is 90 % native and 10 % immigrant. As is often the case in the first world, the native population is on average older than the immigrant population.

Let’s look at a certain type of crime, say robberies. Now a statistic shows that of all the robberies in the country, 80 % were committed by natives and 20 % by immigrants. Can we conclude from these numbers that immigrants are more inclined to commit robbery than natives? Many people would do so.

The police keep basic records of all crimes that have been reported. This enables us to take a closer look at the situation. Consider the graph below, which shows the age distribution of people accused of robbery in Canada in 2008. It immediately becomes clear that robbery is for the most part a “young person’s crime”. The rates are significantly elevated for ages 14 – 20 and then decrease with age. Even without crunching the numbers it is clear that the younger a population is, the more robberies will occur.


Let’s go back to our fictional country of 90 % natives and 10 % immigrants, with the immigrant population being younger. Assuming the same inclination to committing robberies for both groups, the immigrant population would contribute more than 10 % to the total amount of robberies for the simple reason that robbery is a crime mainly committed by young people.

Using a simplistic example, we can put this logic to the test. Let’s stick to our numbers of 90 % natives and 10 % immigrants. This time however, we’ll crudely specify an age distribution for both. For the native population the breakdown is:

– 15 % below age 15

– 15 % between age 15 and 25

– 70 % above age 25

For the immigrants we take a slightly different distribution that results in a lower average age:

– 20 % below age 15

– 20 % between age 15 and 25

– 60 % above age 25

We’ll set the total population count to 100 million. Now assume that there’s a crime that is committed solely by people in the age group 15 to 25. Within this age group, 1 in 100,000 will commit this crime over the course of one year, regardless of which population group he or she belongs to. Note that this means that neither group is more inclined towards this crime than the other.

It’s time to crunch the numbers. There are 0.9 · 100 million = 90 million natives. Of these, 0.15 · 90 million = 13.5 million are in the age group 15 to 25. This means we can expect 135 natives to commit this crime during a year.

As for the immigrants, there are 0.1 · 100 million = 10 million in the country, with 0.2 · 10 million = 2 million being in the age group of interest. They will give rise to an expected number of 20 crimes of this kind per year.

In total, we can expect this crime to be committed 155 times, with the immigrants having a share of 20 / 155 = 12.9 %. This is higher than their proportional share of 10 % despite there being no inclination for committing said crime. All that led to this result was the population being younger on average.
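
The same calculation can be written as a short Python sketch. The variable names and the dictionary layout are choices made for this illustration; the numbers are the ones from the example above.

```python
TOTAL_POPULATION = 100_000_000
CRIME_RATE = 1 / 100_000  # per person aged 15 to 25, per year

# group: (share of total population, share of the group aged 15 to 25)
groups = {
    "natives": (0.90, 0.15),
    "immigrants": (0.10, 0.20),
}

expected = {
    name: TOTAL_POPULATION * pop_share * young_share * CRIME_RATE
    for name, (pop_share, young_share) in groups.items()
}
total = sum(expected.values())

for name, crimes in expected.items():
    print(f"{name}: {crimes:.0f} expected crimes, {crimes / total:.1%} of all")
```

Changing the age shares in groups shows how strongly the expected share of crimes reacts to the age structure alone, with the crime rate kept identical for both groups.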

So a larger than proportional share of crime does not, by itself, show that this part of the population is more inclined towards crime. To be able to draw any conclusions, we would need to compare the observed share with the expected value, which can be calculated from the age distribution of the crime and that of the population and can differ quite strongly from the proportional value.
