Math

More Pirates, Less Global Warming … wait, what?

An interesting correlation was found by the parody religion FSM (Flying Spaghetti Monster). Deducing causation here would be madness. Over the 18th and 19th centuries, piracy (the one with the boats, not the one with the files and the sharing) slowly died out. At the same time, possibly as part of a natural trend and/or because of increased industrial activity, the global temperature started increasing. If you plot the number of pirates and the global temperature in a coordinate system, you find a relatively strong correlation between the two. The more pirates there are, the colder the planet is. Here’s the corresponding formula and graph:

 T = 16 – 0.05 · P^0.33

[Graph: average global temperature T versus number of pirates P]

with T being the average global temperature and P the number of pirates. Given enough pirates (about 3.3 million to be specific), we could even freeze Earth. But of course nobody in their right mind would see causality at work here. Rather, we have two processes, the disappearance of piracy and global warming, that happened to occur at the same time. So you shouldn’t be too surprised that the recent rise of piracy in Somalia didn’t do anything to stop global warming.

A tunnel through earth and a surprising result …

Recently I found an interesting problem: A straight tunnel is drilled through the earth (see picture; the tunnel is drawn with two lines) and rails are installed in it. A train travels along the rails without friction, driven only by gravity. How long does it take the train to travel through this earth tunnel of length l?

The calculation shows a surprising result: the travel time is independent of the length l. The time it takes the train to travel through a 1 km tunnel is the same as through a 5000 km tunnel, about 2500 seconds or 42 minutes! Why is that?

Imagine a model train on rails. If you put the rails on flat ground, the train won’t move. The gravitational force is pulling on the train, but not in the direction of travel. If you incline the rails slightly, the train starts to move slowly; if you incline the rails strongly, it rapidly picks up speed.

Now let’s imagine a tunnel through the earth! A 1 km tunnel will only have a slight inclination, and the train would accelerate slowly. It would be a pleasant trip for the entire family. But a 5000 km tunnel would go steeply into the ground, and the train would accelerate at an amazing rate. It would be a hell of a ride! This explains how we always get the same travel time: the 1 km tunnel is short and the velocity would remain low; the 5000 km tunnel is long, but the velocity would become enormous.

Here is how the hell ride through the 5000 km tunnel looks in detail:

The red, monotonically increasing curve shows distance traveled (in km) versus time (in seconds); the blue curve shows velocity (in km/s) versus time. In the center of the tunnel the train reaches its maximum velocity of about 3 km/s, which corresponds to an incredible 6700 mi/h!
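For readers who want to check the numbers, here is a minimal sketch. It rests on assumptions the post does not spell out: an Earth of uniform density (so that gravity inside grows linearly with the distance from the center), a surface gravity of 9.81 m/s² and a radius of 6371 km. Under these assumptions the pull along the tunnel is proportional to the distance from the tunnel's midpoint, and the trip is half of one oscillation. The function names are made up for this illustration.

```python
import math

G_SURFACE = 9.81    # assumed surface gravity in m/s^2
R_EARTH = 6.371e6   # assumed mean Earth radius in m (uniform density)

def analytic_travel_time():
    """Half an oscillation period, pi * sqrt(R / g); the tunnel length drops out."""
    return math.pi * math.sqrt(R_EARTH / G_SURFACE)

def simulate_travel_time(length_m, dt=0.1):
    """Step the train along the tunnel until it comes to rest at the far end."""
    x = -length_m / 2.0   # position along the tunnel, measured from its midpoint
    v = 0.0               # the train starts at rest
    t = 0.0
    while True:
        v += -(G_SURFACE / R_EARTH) * x * dt   # pull towards the midpoint
        x += v * dt
        t += dt
        if v <= 0.0:      # velocity is back to zero: far end reached
            return t

def max_speed(length_m):
    """Speed at the tunnel midpoint in m/s."""
    return math.sqrt(G_SURFACE / R_EARTH) * length_m / 2.0

print(round(analytic_travel_time()))
print(round(simulate_travel_time(1_000)), round(simulate_travel_time(5_000_000)))
print(round(max_speed(5_000_000)))
```

The simulated times for the 1 km and the 5000 km tunnel should agree with each other and with the closed-form value, since the length only scales the motion without changing its timing.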

Typical Per-Page-Prices for Ebooks

I did a little analysis of ebook prices per 100 pages for different categories in the Amazon Kindle store. In each category I looked at the top 12 paid books. This data can help readers to judge prices and authors to set them. Here are the results in increasing order:

Erotica: 1.7 $ per 100 pages (ranging from 1.0 – 3.1 $ per 100 pages)
Sci-Fi and Fantasy: 1.8 $ per 100 pages (ranging from 0.8 – 4.4 $ per 100 pages)
Short Stories: 2.0 $ per 100 pages (ranging from 0.5 – 4.2 $ per 100 pages)
Self-Help: 3.6 $ per 100 pages (ranging from 1.3 – 6.7 $ per 100 pages)
Applied Math: 4.0 $ per 100 pages (ranging from 0.9 – 7.9 $ per 100 pages)
Economy / Business: 7.2 $ per 100 pages (ranging from 3.3 – 17.2 $ per 100 pages)

Typical (and in my opinion fair) prices seem to be 2 $ per 100 pages for fiction and 4 $ per 100 pages for non-fiction. In the special case of business books, prices of 7 $ per 100 pages seem common.

Average Size of Web Pages plus Prediction

Using data from websiteoptimization.com I plotted the development of web page sizes over the years. I also included the exponential fit:
Image

As you can see, the 1/2 MB mark was cracked in 2009 and the 1 MB mark was cracked in 2012. Despite the seemingly random fluctuations, an exponential trend is clearly visible. The exponent 0.3 indicates that the average web page size doubles about every 2.3 years. Assuming this exponential trend continues, we will have these average sizes in the coming years:

2013 – ca. 1600 kb
2014 – ca. 2100 kb
2015 – ca. 2900 kb

So the 2 MB mark will probably be cracked in 2014, and in 2015 we will already be close to the 3 MB mark. Of course the trend is bound to flatten out, but at this point there’s no telling when that will happen.
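The doubling time and the projections can be reproduced with a few lines. This is only a sketch: it assumes the fit has the form size ~ exp(0.3 · t) with t in years, and it starts the projection from the 2013 value of about 1600 kb rather than from the fit's own constant, so the results differ slightly from the rounded figures above.

```python
import math

GROWTH_RATE = 0.3   # exponent from the fit, per year (assumed form: exp(0.3 * t))

def doubling_time(rate):
    """Years needed for the size to double at the given exponential rate."""
    return math.log(2) / rate

def project(size_kb, years_ahead, rate=GROWTH_RATE):
    """Extrapolate a page size by the given number of years."""
    return size_kb * math.exp(rate * years_ahead)

print(round(doubling_time(GROWTH_RATE), 1))
for year in (2014, 2015):
    print(year, round(project(1600, year - 2013)))
```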

If you like more Internet analysis, check out The Internet since 1998 in Numbers.

Missile Accuracy (CEP) – Excerpt from “Statistical Snacks”

An important quantity when comparing missiles is the CEP (Circular Error Probable). It is defined as the radius of the circle in which 50 % of the fired missiles land. The smaller it is, the better the accuracy of the missile. The German V2 rockets, for example, had a CEP of about 17 km. So there was a 50/50 chance of a V2 landing within 17 km of its target. Targeting smaller cities or even complexes was next to impossible with this accuracy; one could only aim for a general area in which it would land rather randomly.

Today’s missiles are significantly more accurate. The latest version of China’s DF-21 has a CEP of about 40 m, allowing the accurate targeting of small complexes or large buildings, while the CEP of the American-made Hellfire is as low as 4 m, enabling precision strikes on small buildings or even tanks.

Assuming the impacts are normally distributed, one can derive a formula for the probability of striking a circular target of radius R using a missile with a given CEP:

p = 1 – exp( -0.41 · R² / CEP² )

This quantity is also called the “single shot kill probability” (SSKP). Let’s include some numerical values. Assume a small complex with the dimensions 100 m by 100 m is targeted with a missile having a CEP of 150 m. Converting the rectangular area into a circle of equal area gives us a radius of about 56 m. Thus the SSKP is:

p = 1 – exp( -0.41 · 56² / 150² ) = 0.056 = 5.6 %

So the chances of hitting the target are relatively low. But the lack of accuracy can be compensated for by firing several missiles in succession. What is the chance of at least one missile hitting the target if ten missiles are fired? First we look at the probability of all missiles missing the target and answer the question from that. One missile misses with probability 0.944; the chance of this event occurring ten times in a row is:

p(all miss) = 0.944^10 = 0.562

Thus the chance of at least one hit is:

p(at least one hit) = 1 – 0.562 = 0.438 = 43.8 %

Still not great considering that a single missile easily costs 10000 $ upwards. How many missiles of this kind must be fired at the complex to have a 90 % chance at a hit? A 90 % chance at a hit means that the chance of all missiles missing is 10 %. So we can turn the above formula for p(all miss) into an equation by inserting p(all miss) = 0.1 and leaving the number of missiles n undetermined:

0.1 = 0.944^n

All that’s left is doing the algebra. Applying the natural logarithm to both sides and solving for n results in:

n = ln(0.1) / ln(0.944) ≈ 40

So forty missiles with a CEP of 150 m are required to have a 90 % chance of hitting the complex. As you can verify by doing the appropriate calculations, three DF-21 missiles would have achieved the same result.
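The whole chain of calculations can be sketched in a few lines. The function names are invented for this illustration, and the constant 0.41 is taken from the formula in the text as-is and kept as a parameter k (choosing k = ln 2 ≈ 0.69 instead would make the hit probability exactly 50 % at R = CEP). The radius is computed from the target area rather than rounded to 56 m, so the intermediate values differ slightly from the rounded ones in the text.

```python
import math

def sskp(target_radius_m, cep_m, k=0.41):
    """Single shot kill probability; k = 0.41 is the constant used in the text."""
    return 1.0 - math.exp(-k * target_radius_m**2 / cep_m**2)

def p_at_least_one_hit(p_single, n):
    """Probability that at least one of n independent shots hits."""
    return 1.0 - (1.0 - p_single) ** n

def missiles_needed(p_single, p_target=0.9):
    """Smallest number of shots giving at least p_target chance of a hit."""
    return math.ceil(math.log(1.0 - p_target) / math.log(1.0 - p_single))

# Circle with the same area as the 100 m by 100 m complex
radius = math.sqrt(100 * 100 / math.pi)

p_150 = sskp(radius, cep_m=150)
print(round(p_150, 3), round(p_at_least_one_hit(p_150, 10), 2), missiles_needed(p_150))

p_df21 = sskp(radius, cep_m=40)   # CEP quoted for the DF-21
print(round(p_df21, 2), missiles_needed(p_df21))
```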

Liked the excerpt? Get the book “Statistical Snacks” by Metin Bektas here: http://www.amazon.com/Statistical-Snacks-ebook/dp/B00DWJZ9Z2. For more excerpts see The Probability of Becoming a Homicide Victim and How To Use the Expected Value.

Smoking – Your (My) Chances of Dying Early from it

I admit that I smoke. And my first attempt to quit after 13 years of a pack a day only lasted one month. Here’s what convinced me to try:

  • 50 % of smokers will die early due to their habit (Source: WHO)
  • On average smokers die 10 years earlier (Source: CDC)
  • Every year about 6 million people die from smoking-related diseases; that is more than one jumbo jet full of people every hour (Source: WHO)

Most sensible people wouldn’t play Russian Roulette, but some take even higher chances at early death with smoking.

http://www.who.int/mediacentre/factsheets/fs339/en/

http://www.cdc.gov/tobacco/data_statistics/fact_sheets/health_effects/tobacco_related_mortality/

 

The good news: If you start smoking in your teens, and quit at …

  • … the age of 30, you get all of the 10 years back; the damage done is almost completely reversible
  • … the age of 40, you get 9 of the 10 years back; the damage done is reversible for the most part
  • … the age of 50, you get 6 of the 10 years back; some of the damages are still reversible
  • … the age of 60, you get 3 of the 10 years back; most damages will remain, but life quality will improve

Since I just hit 30, I’ll be sure to give it another try once my vacation is over. Having too much time is a very bad idea if you want to quit; better to do it when you’re busy.

http://www.netdoctor.co.uk/healthy-living/lung-cancer.htm

http://www.rauchfrei.de/raucherstatistik.htm

Dota 2 Statistics

I analyzed 24 Dota 2 games with a total of 209 players and lengths ranging from 6 to 39 minutes to deduce the distribution of Gold per Minute (GPM) and Experience per Minute (XPM) that players manage to achieve in the game. The data is taken from dotabuff.com and was processed using Origin Pro.

 

Gold per Minute:

  • Average: 270 GPM
  • Ranged between 75 and 712 GPM
  • 5 % (1 in 20) cracked the 500 GPM mark

 

GPM

 

Experience per Minute:

  • Average: 311 XPM
  • Ranged between 9 and 712 XPM
  • 4 % (1 in 25) cracked the 600 XPM mark

XPM

Guns per Capita and Homicides – Is There a Correlation?

Here’s a statistics quickie. A while ago, just after the tragic shooting at Sandy Hook Elementary School, I wanted to produce clear proof that gun ownership and homicide rates are correlated. It seemed logical to me that, give or take statistical fluctuations, the phrase “more guns, more violence” holds true. So I extracted the relevant data for all first world countries from Wikipedia and did the plot. Here’s the picture I got:

[Graph: guns per capita and homicide rate for first world countries]

Maybe you are as surprised as I was. Obviously, there’s no relationship between the two variables: more guns does not mean more violence, and fewer guns does not mean less violence. So whatever the main cause for the violence problem in the US (see the isolated dot in the top right? That’s the US), it can’t be guns. And that’s a liberal European speaking …

Just in case anyone cares, I blame the gang and hip-hop culture. It can’t be guns (see above), but it also can’t be media or mental health or drugs (people in all other first world countries also play shooter games, watch violent movies, have mental problems, buy and sell drugs).

Sources:

http://en.wikipedia.org/wiki/Number_of_guns_per_capita_by_country

http://en.wikipedia.org/wiki/List_of_countries_by_intentional_homicide_rate

My Fair Game – How To Use the Expected Value

You meet a nice man on the street offering you a game of dice. For a wager of just 2 $, you can win 8 $ when the die shows a six. Sounds good? Let’s say you join in and play 30 rounds. What will be your expected balance after that?

You roll a six with the probability p = 1/6. So of the 30 rounds, you can expect to win 1/6 · 30 = 5 rounds, resulting in a pay-out of 40 $. But winning 5 rounds of course also means that you lost the remaining 25 rounds, resulting in a loss of 50 $. Your expected balance after 30 rounds is thus -10 $. Or in other words: for the player this game results in a loss of 1/3 $ per round.

Let’s make a general formula for just this case. We are offered a game which we win with a probability of p. The pay-out in case of victory is P, the wager is W. We play this game for a number of n rounds.

The expected number of wins is p·n, so the expected total pay-out is p·n·P. The expected number of losses is (1-p)·n, so we can expect to lose this amount of money: (1-p)·n·W.

Now we can set up the formula for the balance. We simply subtract the losses from the pay-out. But while we’re at it, let’s divide both sides by n to get the balance per round. It already includes all the information we need and requires one less variable.

B = p · P – (1-p) · W

This is what we can expect to win (or lose) per round. Let’s check it by using the above example. We had the winning chance p = 1/6, the pay-out P = 8 $ and the wager W = 2 $. So from the formula we get this balance per round:

B = 1/6 · 8 $ – 5/6 · 2 $ = – 1/3 $ per round

Just as we expected. Let’s try another example. I’ll offer you a dice game. If you roll two sixes in a row, you get P = 175 $. The wager is W = 5 $. Quite the deal, isn’t it? Let’s see. Rolling two sixes in a row occurs with a probability of p = 1/36. So the expected balance per round is:

B = 1/36 · 175 $ – 35/36 · 5 $ = 0 $ per round

I offered you a truly fair game. No one can be expected to lose in the long run. Of course if we only play a few rounds, somebody will win and somebody will lose.
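Both examples can be checked with a tiny helper that evaluates the balance formula. The function name is made up for this sketch, and exact fractions are used so that no rounding sneaks in.

```python
from fractions import Fraction

def balance_per_round(p_win, payout, wager):
    """Expected balance per round: B = p * P - (1 - p) * W."""
    return p_win * payout - (1 - p_win) * wager

first = balance_per_round(Fraction(1, 6), 8, 2)      # single six: win 8 $, wager 2 $
second = balance_per_round(Fraction(1, 36), 175, 5)  # two sixes: win 175 $, wager 5 $

print(first, second)  # prints: -1/3 0
```

Multiplying the first result by 30 rounds gives back the expected balance of -10 $ from the opening example.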

It’s helpful to understand this balance as being sound for a large number of rounds but rather fragile in case of playing only a few rounds. Casinos are host to thousands of rounds per day and thus can predict their gains quite accurately from the balance per round. After a lot of rounds, all the random streaks and significant one-time events hardly impact the total balance anymore. The real balance will converge to the theoretical balance more and more as the number of rounds grows. This is mathematically proven by the Law of Large Numbers. Assuming finite variance, the proof can be done elegantly using Chebyshev’s Inequality.

The convergence can be easily demonstrated using a computer simulation. We will let the computer, equipped with random numbers, run our dice game for 2000 rounds. After each round the computer calculates the balance per round so far. The below picture shows the difference between the simulated balance per round and our theoretical result of – 1/3 $ per round.

Image
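A simulation along these lines might look as follows. It is a sketch, not the program used for the picture: the function name, the seed and the use of Python's standard random module are choices made for this illustration.

```python
import random

def simulate(rounds, payout=8, wager=2, seed=None):
    """Play the dice game and record the balance per round after every round."""
    rng = random.Random(seed)
    total = 0
    history = []
    for i in range(1, rounds + 1):
        if rng.randint(1, 6) == 6:   # a six wins the pay-out
            total += payout
        else:                        # anything else loses the wager
            total -= wager
        history.append(total / i)
    return history

history = simulate(2000, seed=42)
for n in (10, 100, 1000, 2000):
    # difference between simulated and theoretical balance per round
    print(n, round(history[n - 1] + 1 / 3, 3))
```

The printed differences tend to shrink as the number of rounds grows, although any single run can wander for a while before settling.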


The Probability of Becoming a Homicide Victim

 Each year in the US there are about 5 homicides per 100000 people, so the probability of falling victim to a homicide in a given year is 0.00005 or 1 in 20000. What are the chances of falling victim to a homicide over a lifespan of 70 years?

 Let’s approach this the other way around. The chance of not becoming a homicide victim during one year is p = 0.99995. Using the multiplication rule we can calculate the probability of this event occurring 70 times in a row:

p = 0.99995 · … · 0.99995 = 0.99995^70

Thus the probability of not becoming a homicide victim over the course of 70 years is 0.9965. This of course also means that there’s a 1 – 0.9965 = 0.0035, or 1 in 285, chance of falling victim to a homicide during a life span. In other words: two victims in every jumbo jet full of people. How does this compare to other countries?

 In Germany, the homicide rate is about 0.8 per 100000 people. Doing the same calculation gives us a 1 in 1800 chance of becoming a murder victim, so statistically speaking there’s one victim per small city. At the other end of the scale is Honduras with 92 homicides per 100000 people, which translates into a saddening 1 in 16 chance of becoming a homicide victim over the course of a life and is basically one victim in every family.

 It can get even worse if you live in a particularly crime ridden part of a country. The homicide rate for the city San Pedro Sula in Honduras is about 160 per 100000 people. If this remained constant over time and you never left the city, you’d have a 1 in 9 chance of having your life cut short in a homicide.
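The same calculation for all four rates, as a short sketch. It assumes, like the text, a constant yearly rate over 70 years and independence from year to year; the function name is made up for this illustration. Because it does not round intermediate values, the results can differ by a unit or so from the figures quoted above.

```python
def lifetime_risk(homicides_per_100k, years=70):
    """Chance of becoming a victim at least once over the given number of years."""
    p_safe_one_year = 1.0 - homicides_per_100k / 100_000
    return 1.0 - p_safe_one_year ** years

rates = [("US", 5), ("Germany", 0.8), ("Honduras", 92), ("San Pedro Sula", 160)]
for place, rate in rates:
    risk = lifetime_risk(rate)
    print(f"{place}: 1 in {1 / risk:.0f}")
```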

Liked the excerpt? Get the book “Statistical Snacks” by Metin Bektas here: http://www.amazon.com/Statistical-Snacks-ebook/dp/B00DWJZ9Z2. For more excerpts check out Missile Accuracy (CEP), Immigrants and Crime and Monkeys on Typewriters.

Immigrants and Crime – A Statistical Analysis

Assume we are given a country with a population that is 90 % native and 10 % immigrant. As is often the case in the first world, the native population is on average older than the immigrant population.

Let’s look at a certain type of crime, say robberies. Now a statistic shows that of all the robberies in the country, 80 % have been committed by natives and 20 % by immigrants. Can we conclude from these numbers that the immigrants are more inclined to steal than the natives? Many people would do so.

The police keep basic records of all crimes that have been reported. This enables us to get a closer look at the situation. Consider the graph below, which shows the age distribution of people accused of robbery in Canada in 2008. It immediately becomes clear that it is for the most part a “young person’s crime”. The rates are significantly elevated for ages 14 – 20 and then decrease with age. Even without crunching the numbers it is clear that the younger a population is, the more robberies will occur.

Image

Let’s go back to our fictional country of 90 % natives and 10 % immigrants, with the immigrant population being younger. Assuming the same inclination to committing robberies for both groups, the immigrant population would contribute more than 10 % to the total amount of robberies for the simple reason that robbery is a crime mainly committed by young people.

Using a simplistic example, we can put this logic to the test. Let’s stick to our numbers of 90 % natives and 10 % immigrants. This time however, we’ll crudely specify an age distribution for both. For the native population the breakdown is:

– 15 % below age 15

– 15 % between age 15 and 25

– 70 % above age 25

For the immigrants we take a slightly different distribution that results in a lower average age:

– 20 % below age 15

– 20 % between age 15 and 25

– 60 % above age 25

We’ll set the total population count to 100 million. Now assume that there’s a crime that is committed solely by people in the age group 15 to 25. Within this age group, 1 in 100000 will commit this crime over the course of one year, independently of which population group he or she belongs to. Note that this means that neither of the two groups has a greater inclination towards this crime.

It’s time to crunch the numbers. There are 0.9 · 100 million = 90 million natives. Of these, 0.15 · 90 million = 13.5 million are in the age group 15 to 25. This means we can expect 135 natives to commit this crime during a year.

As for the immigrants, there are 0.1 · 100 million = 10 million in the country, with 0.2 · 10 million = 2 million being in the age group of interest. They will give rise to an expected number of 20 crimes of this kind per year.

In total, we can expect this crime to be committed 155 times, with the immigrants having a share of 20 / 155 = 12.9 %. This is higher than their proportional share of 10 % despite there being no inclination for committing said crime. All that led to this result was the population being younger on average.
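The arithmetic of this toy example fits in a few lines. The function and variable names are invented for this sketch; all numbers are the ones specified above.

```python
def expected_crimes(population, share_aged_15_to_25, rate_per_person=1 / 100_000):
    """Expected yearly crimes when only the 15 to 25 age group commits them."""
    return population * share_aged_15_to_25 * rate_per_person

TOTAL = 100_000_000
natives = expected_crimes(0.9 * TOTAL, 0.15)     # 15 % of natives are aged 15 to 25
immigrants = expected_crimes(0.1 * TOTAL, 0.20)  # 20 % of immigrants are aged 15 to 25
share = immigrants / (natives + immigrants)

print(round(natives), round(immigrants), f"{share:.1%}")
```

Changing only the two age shares while leaving the crime rate untouched shows how the immigrant share of the crime moves with the age structure alone.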

So concluding from a larger than proportional share of crime that there’s an inclination towards crime in this part of the population is not mathematically sound. To be able to draw any conclusions, we would need to know the expected value, which can be calculated from the age distribution of the crime and that of the population and can differ quite strongly from the proportional value.
