Mathematics of Blog Traffic: Model and Tips for High Traffic

Over the last few days I finally did what I long had planned and worked out a mathematical model for blog traffic. Here are the results. First we’ll take a look at the most general form and then use it to derive a practical, easily applicable formula.

We need some quantities as inputs. The time (in days), starting from the first blog entry, is denoted by t. We number the blog posts with the variable k. So k = 1 refers to the first post published, k = 2 to the second, etc … We’ll refer to the day on which entry k is published by t(k).

The initial number of visits entry k draws from the feed is symbolized by i(k), the average number of views per day entry k draws from search engines by s(k). Assuming that the number of feed views declines exponentially for each article with a factor b (my observations put the value for this at around 0.4 – 0.6), this is the number of views V the blog receives on day t:

V(t) = Σ[k] ( s(k) + i(k) · b^(t – t(k)) )

Σ[k] means that we sum over all k. This is the most general form. For it to be of any practical use, we need to make simplifying assumptions. We assume that the entries are published at a constant frequency f (entries per day) and that each article has the same popularity, that is:

i(k) = i = const.
s(k) = s = const.

After a long calculation you can arrive at this formula. It provides the expected number of daily views given that the above assumptions hold true and that the blog consists of n entries in total:

V = s · n + i / ( 1 – b^(1/f) )

Note that according to this formula, blog traffic increases linearly with the number of entries published. Let’s apply the formula. Assume we publish articles at a frequency f = 1 per day and they draw i = 5 views on the first day from the feed and s = 0.1 views per day from search engines. With b = 0.5, this leads to:

V = 0.1 · n + 10

So once we have gathered n = 20 entries with this setup, we can expect V = 12 views per day; at n = 40 entries this grows to V = 14 views per day, and so on. The theoretical growth of this blog with the number of entries is shown below:

[Figure: daily views versus number of entries]
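The simplified formula is easy to try out in code. The following is only a minimal sketch in Python; the function names are illustrative and do not come from the original post. It also evaluates the general sum for a blog that publishes one entry per day (entry k on day k - 1, an assumption made here for the check) so the two forms can be compared.

```python
def expected_daily_views(n, i, s, b, f):
    """Simplified model: V = s*n + i / (1 - b**(1/f))."""
    return s * n + i / (1 - b ** (1 / f))


def daily_views_general(t, posts, b):
    """General model: sum s(k) + i(k) * b**(t - t(k)) over entries published by day t.

    `posts` is a list of (t_k, i_k, s_k) tuples, one per entry.
    """
    return sum(s_k + i_k * b ** (t - t_k) for t_k, i_k, s_k in posts if t_k <= t)


# Example blog from the text: one entry per day, i = 5, s = 0.1, b = 0.5.
for n in (20, 40):
    simplified = expected_daily_views(n, i=5, s=0.1, b=0.5, f=1)
    # Assumption for this check: entry k is published on day k - 1.
    posts = [(day, 5, 0.1) for day in range(n)]
    general = daily_views_general(n - 1, posts, b=0.5)
    print(f"n = {n}: simplified {simplified:.2f}, general {general:.2f} views per day")
```

For a blog with more than a handful of entries, the two values agree closely, because the feed views of old entries have decayed to almost nothing.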

How does the frequency at which entries are being published affect the number of views? You can see this dependency in the graph below (I set n = 40):

[Figure: daily views versus publishing frequency, n = 40]

The formula is very clear about what to do for higher traffic: get more attention in the feed (good titles, good tagging and a large number of followers all lead to high i and possibly reduced b), optimize the entries for search engines (high s), publish at high frequency (obviously high f) and do this for a long time (high n).

We’ll draw two more conclusions. As you can see the formula neatly separates the search engine traffic (left term) and feed traffic (right term). And while the feed traffic reaches a constant level after a while of constant publishing, it is the search engine traffic that keeps on growing. At a critical number of entries N, the search engine traffic will overtake the feed traffic:

N = i / ( s · ( 1 – b^(1/f) ) )

In the above blog setup, this happens at N = 100 entries. At this point both the search engines as well as the feed will provide 10 views per day.

Here’s one more conclusion: the daily increase in the average number of views is just the product of the daily search engine views per entry s and the publishing frequency f:

ΔV / Δt = s · f

Thus, our example blog will experience an increase of 0.1 · 1 = 0.1 views per day or 1 additional view per 10 days. If we publish entries at twice the frequency, the blog would grow by 0.1 · 2 = 0.2 views per day or 1 additional view every 5 days.
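The two conclusions can be checked numerically as well. This is a small Python sketch with illustrative function names (not from the original post), using the example values i = 5, s = 0.1, b = 0.5 and f = 1.

```python
def critical_entries(i, s, b, f):
    """Number of entries N at which search engine traffic equals feed traffic."""
    return i / (s * (1 - b ** (1 / f)))


def daily_growth(s, f):
    """Increase in average daily views per day of publishing: s * f."""
    return s * f


print(critical_entries(i=5, s=0.1, b=0.5, f=1))  # about 100 entries
print(daily_growth(s=0.1, f=1))                  # growth at one entry per day
print(daily_growth(s=0.1, f=2))                  # growth at two entries per day
```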

Statistics: The Multiplication Rule Gently Explained

Multiplication is a surprisingly powerful tool in statistics. It enables us to solve a vast amount of problems with relative ease. One thing to remember though is that the multiplication rule, to which I’ll get in a bit, only works for independent events. So let’s talk about those first.

When we roll a die, there’s a certain probability that the number six will show. This probability does not depend on what number we rolled before. The events “rolling a three” and “rolling a six” are independent in the sense that the occurrence of one event does not affect the probability of the other.

Let’s look at a card deck. We draw a card and note it. Afterward, we put it back in the deck and mix the cards. Then we draw another one. Does the event “draw an ace” in the first try affect the event “draw a king” in the second try? It does not, because we put the ace back in the deck and mixed the cards. We basically reset our experiment. In such a case, the events “draw an ace” and “draw a king” are independent.

But what if we don’t put the first card back in the deck? Well, when we take the ace out of the deck, the chance of drawing a king will increase from 4 / 52 (4 kings out of 52 cards) to 4 / 51 (4 kings out of 51 cards). If we don’t do the reset, the events “draw an ace” and “draw a king” are in fact dependent. The occurrence of one changes the probability for the other.

With this in mind, we can turn to our powerful tool, the multiplication rule. We start with two independent events, A and B. The probabilities for their occurrence are respectively p(A) and p(B). The multiplication rule states that the probability of both events occurring is simply the product of the probabilities p(A) and p(B). In mathematical terms:

p(A and B) = p(A) · p(B).

A quick look at the dice will make this clear. Let’s take both A and B to be the event “rolling a six”. Obviously they are independent: rolling a six on one try will not change the probability of rolling a six in the following try. So we are allowed to use the multiplication rule here. The probability of rolling a six is 1/6, so p(A) = p(B) = 1/6. Using the multiplication rule, we can calculate the chance of rolling two sixes in a row: p(A and B) = 1/6 · 1/6 = 1/36. Note that if we took A to be “rolling a six” and B to be “rolling a three”, we would arrive at the same result. The chance of rolling two sixes in a row is the same as rolling a six and then a three.

Can we also use this on the deck of cards, even if we don’t reset the experiment? Indeed we can. But we have to take into account that the probabilities change as we go along. In more abstract terms, instead of looking at the general events “draw an ace” and “draw a king”, we need to look at the events A = “draw an ace in the first try” and B = “draw a king with one ace missing”. With the order of the events clearly set, there’s no chance of them interfering. The occurrence of both events, first drawing an ace and then drawing a king with the ace missing, has the probability: p(A and B) = p(A) · p(B) = 4/52 · 4/51 = 16/2652 or 1 in about 166 or 0.6 %.
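The dice and card examples can be reproduced with exact fractions. This is a short Python sketch; the helper name p_both is made up for illustration.

```python
from fractions import Fraction


def p_both(p_a, p_b):
    """Multiplication rule: probability that both events occur."""
    return p_a * p_b


# Two sixes in a row with a fair die.
two_sixes = p_both(Fraction(1, 6), Fraction(1, 6))

# Ace first, then a king from the remaining 51 cards (no reset).
ace_then_king = p_both(Fraction(4, 52), Fraction(4, 51))

print(two_sixes)                              # 1/36
print(ace_then_king, float(ace_then_king))    # 4/663, roughly 0.006
```

Note that the second probability passed to p_both is already the adjusted one (4/51), which is exactly the point made above about events without a reset.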

For examples on how to apply the multiplication rule check out Multiple Choice Tests and Monkeys on Typewriters.

Analysis: Size and Loading Times of WordPress.com Blogs

In the fast-paced online world, people are not as patient as in real life. Accordingly, a large home page size and a long loading time can negatively affect your blog traffic. Studies have shown that the greater the loading time, the higher the bounce rate. To find out how well my blog performs in this respect (feel free to use the results for your own benefit as well), I did an analysis of 70 WordPress.com blogs. I used iWEBTOOLS’s Website Speed Test and OriginPro for that. With the tool you can analyze ten webpages at once, but note that after ten queries you have to wait a full day (not an hour as the website claims) to do more analysis.

The average size of a WordPress.com blog according to the analysis is 65.3 KB with a standard error SE = 3.0 KB. Here’s how the size is distributed:

[Figure: distribution of home page sizes]

The average loading time at my internet speed (circa 600 KB/s) is 0.66 s with the standard error SE = 0.10 s. Here’s the corresponding distribution:

[Figure: distribution of loading times]

Note that the graph obviously depends on your internet speed. If you have faster internet, the whole distribution will shift to the left. My blog has a home page size of 81.6 KB. From the first graph I can deduce that only about 24 % of home pages are larger in size. My loading time is 0.86 s; only about 22 % of the blogs top that. So it looks like I really have to throw off some weight.

Here’s the loading time plotted against the home page size:

[Figure: loading time versus home page size]

In a very rough approximation we have the relation:

loading time (in s) = 0.009 · size (in KB)

In other words: getting rid of 10 KB should lower the loading time by about 0.1 seconds. Now feel free to check your own blog and see where it fits in. If you have the time, post your results (if possible including URL, size, loading time, internet speed) in the comments. I’d greatly appreciate the additional data. For a reliable result regarding loading time it’s best to check the same page three times and take the average.

The Standard Error – What it is and how it’s used

I smoke electronic cigarettes and recently I wanted to find out how much nicotine liquid I consume per day. I noted the used amount on five consecutive days:

3 ml, 3.4 ml, 7.2 ml, 3.7 ml, 4.3 ml

So how much do I use per day? Well, our best guess is to take the average, that is, sum all the amounts and divide by the number of measurements:

(3 ml + 3.4 ml + 7.2 ml + 3.7 ml + 4.3 ml) / 5 = 4.3 ml

Most people would stop here. However, there’s one very important piece of information missing: how accurate is that result? Surely an average value of 4.3 ml computed from 100 measurements is much more reliable than the same average computed from 5 measurements. Here’s where the standard error comes in and thanks to the internet, calculating it couldn’t be easier. You can type in the measurements here to get the standard error:

http://www.miniwebtool.com/standard-error-calculator/

It tells us that the standard error (of the mean, to be pedantically precise) of my five measurements is SEM = 0.75 ml. This number is extremely useful because there’s a rule in statistics that states that with a 95 % probability, the true average lies within two standard errors of the computed average. For us this means that there’s a 95 % chance, which you could call beyond reasonable doubt, that the true average of my daily liquid consumption lies in this interval:

4.3 ml ± 1.5 ml

or between 2.8 and 5.8 ml. So the computed average is not very accurate. Note that as long as the standard deviation remains more or less constant as further measurements come in, the standard error is inversely proportional to the square root of the number of measurements. In simpler terms: If you quadruple the number of measurements, the size of the error interval halves. With 20 instead of only 5 measurements, we should be able to achieve an accuracy of plus/minus 0.75 ml.
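If you prefer not to rely on an online calculator, the same numbers can be reproduced in a few lines. This is a minimal Python sketch using only the standard library; the function name standard_error is illustrative. It uses the sample standard deviation divided by the square root of the number of measurements.

```python
import math
import statistics

measurements = [3.0, 3.4, 7.2, 3.7, 4.3]  # daily consumption in ml


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


mean = statistics.mean(measurements)
sem = standard_error(measurements)

# Two-standard-error interval around the mean, as described in the text.
low, high = mean - 2 * sem, mean + 2 * sem
print(f"mean = {mean:.1f} ml, SEM = {sem:.2f} ml")
print(f"interval: {low:.1f} ml to {high:.1f} ml")
```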

So when you have an average value to report, be sure to include the error interval. Your result is much more informative this way and with the help of the online calculator as well as the above rule, computing it is quick and painless. It took me less than a minute.

A more detailed explanation of the average value, standard deviation and standard error (yes, the latter two are not the same thing) can be found in chapter 7 of my Kindle ebook Statistical Snacks (this was not an excerpt).

Increase Views per Visit by Linking Within your Blog

One of the most basic and useful performance indicators for blogs is the average number of views per visit. If it is high, that means visitors stick around to explore the blog after reading a post. They value the blog for being well-written and informative. But in the fast-paced, content-saturated online world, achieving a lot of views per visit is not easy.

You can help out a little by making exploring your blog easier for readers. A good way to do this is to link within your blog, that is, to provide internal links. Keep in mind though that random links won’t help much. If you link one of your blog posts to another, they should be connected in a meaningful way, for example by covering the same topic or giving relevant additional information to what a visitor just read.

Being mathematically curious, I wanted to find a way to judge what impact such internal links have on the overall views per visit. Assume you start with no internal links and observe a current number of views per visit of x. Now you add n internal links in your blog, which has a total of m entries. Given that the probability for a visitor to make use of an internal link is p, what will the overall number of views per visit change to? Last night I derived a formula for that:

x’ = x + (n / m) · (1 / (1-p) – 1)

For example, my blog (which has as of now very few internal links) has an average of x = 2.3 views per visit and m = 42 entries. If I were to add n = 30 internal links and assuming a reader makes use of an internal link with the probability p = 20 % = 0.2, this should theoretically change into:

x’ = 2.3 + (30 / 42) · (1 / 0.8 – 1) = 2.5 views per visit

A solid increase of roughly 8 % in views per visit, and this just by providing visitors a simple way to explore. So make sure to go over your blog and connect articles that are relevant to each other. The higher the relevancy of the links, the higher the probability that readers will end up using them. For example, if I only added n = 10 internal links instead of thirty, but had them at such a level of relevancy that the probability of them being used increases to p = 40 % = 0.4, I would end up with about the same overall views per visit:

x’ = 2.3 + (10 / 42) · (1 / 0.6 – 1) = 2.5 views per visit

So it’s about relevancy as much as it is about amount. And in the spirit of not spamming, I’d prefer adding a few high-relevancy internal links rather than a lot of low-relevancy ones.
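To play with other values of n, m and p, the formula can be wrapped in a small function. This is a Python sketch; the function name is made up for illustration.

```python
def views_per_visit(x, n, m, p):
    """New views per visit after adding n internal links to a blog with m entries.

    x is the current views per visit, p the probability that a reader follows a link.
    """
    return x + (n / m) * (1 / (1 - p) - 1)


many_links = views_per_visit(x=2.3, n=30, m=42, p=0.2)
few_relevant_links = views_per_visit(x=2.3, n=10, m=42, p=0.4)

print(f"{many_links:.2f} vs. {few_relevant_links:.2f} views per visit")
```

Both scenarios round to 2.5 views per visit, which is the trade-off between amount and relevancy described above.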

If you’d like to know more on how to optimize your blog, check out: Setting the Order for your WordPress Blog Posts and Keywords: How To Use Them Properly On a Website or Blog.

Comets: Visitors From Beyond

The one thing we love the most in the world of astronomy is a good mystery. And if there was ever a mysterious and yet very powerful force of nature that we witness in the night skies, it is the coming of the mighty comet.

The arrival of a comet within view of Earth is an event of international importance. Witness the huge media attention that Halley’s comet or Hale-Bopp received when they came within view. The sight of these amazing space objects is simultaneously frightening and awe-inspiring.

Above all, it is during these comet viewings that the astronomer comes out in all of us. But what is a comet? Where did it come from? And how does it get that magnificent tail?

We should never confuse comets with asteroids. Asteroids are small space rocks that come from an asteroid belt between Mars and Jupiter. While still quite stunning to see, they pale in comparison to the arrival of a comet. Asteroids also have received considerable study by the scientific community.

Not as much is known about comets. As a rule, comets are considerably larger than asteroids. The composition of a comet is a mixture of nebulous gases, ice, dust and space debris. One scientist described the composition of a comet as similar to a “dirty snowball” because the composition is so diverse and changeable. The center or nucleus of a comet is usually quite solid but the “snowball” materials often create a “cloud” around that nucleus that can become quite large and that extends at great lengths behind the comet as it moves through space. That trailing plume is what makes up the comet’s magnificent tail that makes it so exciting to watch when a comet comes within view of Earth.

The origin of comets is similarly mysterious. There are a number of theories about where they come from but it is clear that they originate from outside our solar system, somewhere in deep space. Some have speculated they are fragments left over from the organization of planets that get loose from whatever gravitational pull and are sent flying across space to eventually get caught up in the gravity of our sun bringing them into our solar system.

Another theory is that they come from a gaseous cloud called the Oort cloud which is cooling out there after the organization of the sun. As this space debris cools, it gets organized into one body which then gathers sufficient mass to be attracted into the gravity of our solar system turning into a fast moving comet plummeting toward our sun. However, because of the strong gravitational orbits of the many planets in our solar system, the comet does not always immediately collide with the sun and often takes on an orbit of its own.

The life expectancy of comets varies widely. Scientists refer to a comet that is expected to burn out or impact the sun within two hundred years as a short period comet whereas a long period comet has a life expectancy of over two hundred years. That may seem long to us as earth dwellers but in terms of stars and planets, this is a very short life as a space object indeed.

Scientists across the globe have put together some pretty impressive probes to learn more about comets to aid our understanding of these visitors from beyond. In 1985, for example, the United States put a probe into the path of the comet Giacobini-Zinner which passed through the comet’s tail, gathering tremendous scientific knowledge about comets. Then in 1986, an international coalition of scientists was able to launch a probe that was able to fly close to Halley’s comet as it passed near Earth and continue the research.

While science fiction writers and tabloid newspapers like to alarm us with the possibility of a comet impacting the earth, scientists who understand the orbits of comets and what changes their paths tell us this is unlikely. That is good because some comets reach sizes that are as big as a planet so that impact would be devastating. For now, we can enjoy the fun of seeing comets make their rare visits to our night sky and marvel at the spectacular shows that these visitors from beyond put on when they are visible in the cosmos.

Quantitative Analysis of Top 60 Kindle Romance Novels

I did a quantitative analysis of the current Top 60 Kindle Romance ebooks. Here are the results. First I’ll take a look at all price related data and conclusions.

—————————————————————————–

  • Price over rank:

[Figure: price over rank]

There seems to be no relation between price and rank. A linear fit confirmed this. The average price was 3.70 $ with a standard deviation of 2.70 $.

—————————————————————————–

  • Price frequency count:

[Figure: price frequency count]

(Note that prices have been rounded up.) About one third of all romance novels in the top 60 are offered for 1 $. Roughly another third for 3 $ or 4 $.

—————————————————————————–

  • Price per 100 pages over rank:

[Figure: price per 100 pages over rank]

Again, no relation here. The average price per 100 pages was 1.24 $ with a standard deviation of 0.86 $.

—————————————————————————–

  • Price per 100 pages frequency count:

[Figure: price per 100 pages frequency count]

About half of all novels in the top 60 have a price per 100 pages lower than 1.20 $. Another third lies between 1.20 $ and 1.60 $.

—————————————————————————–

  • Price per 100 pages over number of pages:

[Figure: price per 100 pages over number of pages]

As I expected, the bigger the novel, the less you pay per page. Romance novels of about 200 pages cost 1.50 $ per 100 pages, while at 400 pages the price drops to about 1 $ per 100 pages. The decline is statistically significant, however there’s a lot of variation.

—————————————————————————–

  • Review count:

[Figure: review count]

A little less than one half of the top novels have fewer than 50 reviews. About 40 % have between 50 and 150 reviews. Note that some of the remaining 10 % have more than 600 reviews (not included in the graph).

—————————————————————————–

  • Rating over rank:

[Figure: rating over rank]

There’s practically no dependence of rank on rating among the top 60 novels. However, all have a rating of 3.5 stars or higher, most of them (95 %) 4 stars or higher.

—————————————————————————–

  • Pages over ranking:

[Figure: pages over rank]

There’s no relation between number of pages and rank. A linear fit confirmed this. The average number of pages was 316 with a standard deviation of 107.

—————————————————————————–

  • Pages count:

[Figure: pages count]

About 70 % of the analyzed novels have between 200 and 400 pages. 12 % are below and 18 % above this range.

Mathematics of Explosions

When a strong explosion takes place, a shock wave forms that propagates in a spherical manner away from the source of the explosion. The shock front separates the air mass that is heated and compressed due to the explosion from the undisturbed air. In the picture below you can see the shock sphere that resulted from the explosion of Trinity, the first atomic bomb ever detonated.

[Image: shock sphere of the Trinity explosion]

Using the concept of similarity solutions, the physicists Taylor and Sedov derived a simple formula that describes how the radius r (in m) of such a shock sphere grows with time t (in s). To apply it, we need to know two additional quantities: the energy of the explosion E (in J) and the density of the surrounding air D (in kg/m³). Here’s the formula:

r = 0.93 · (E / D)^0.2 · t^0.4

Let’s apply this formula for the Trinity blast.

———————-

In the Trinity explosion, the amount of energy released was about 20 kilotons of TNT or:

E = 84 TJ = 84,000,000,000,000 J

Just to put that into perspective: in 2007 all of the households in Canada combined used about 1.4 million TJ of energy. Even if you were able to convert the energy released in the Trinity explosion one-to-one into usable energy, it would cover those households for only about half an hour.

But back to the formula. The density of air at sea level and low altitudes is about D = 1.25 kg/m³. So the radius of the sphere approximately followed this law:

r = 542 · t^0.4

After one second (t = 1), the shock front traveled 542 m. So the initial velocity was 542 m/s ≈ 1950 km/h ≈ 1210 mph. After ten seconds (t = 10), the shock front already covered a distance of about 1360 m ≈ 0.85 miles.

How long did it take the shock front to reach people two miles from the detonation? Two miles are approximately 3200 m. So we can set up this equation:

3200 = 542 · t^0.4

We divide by 542:

5.90 ≈ t^0.4

Then take both sides to the power of 2.5:

t ≈ 85 s ≈ 1 and 1/2 minutes
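The Trinity numbers above can be reproduced with a short script. This is a minimal Python sketch with illustrative function names; it uses the formula as given in the text, with E = 84 TJ and D = 1.25 kg/m³.

```python
def shock_radius(t, energy, density):
    """Taylor-Sedov radius (m) after t seconds: 0.93 * (E/D)**0.2 * t**0.4."""
    return 0.93 * (energy / density) ** 0.2 * t ** 0.4


def time_to_reach(radius, energy, density):
    """Invert the formula: time (s) at which the shock front reaches `radius` (m)."""
    return (radius / (0.93 * (energy / density) ** 0.2)) ** 2.5


E = 84e12  # J, about 20 kilotons of TNT
D = 1.25   # kg/m^3, air near the ground

print(f"radius after 1 s:  {shock_radius(1, E, D):.0f} m")
print(f"radius after 10 s: {shock_radius(10, E, D):.0f} m")
print(f"time to 3200 m:    {time_to_reach(3200, E, D):.0f} s")
```

The inverse function simply repeats the algebra from the example: divide by the prefactor, then raise to the power of 2.5.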

———————-

Let’s look at how the different parameters in the formula impact the radius of the shock sphere:

  • If you increase the time sixfold, the radius of the sphere doubles. So if it reached 0.85 miles after ten seconds, it will have reached 1.7 miles after 60 seconds. Note that this means that the speed of the shock front continuously decreases.

For the other two parameters, it will be more informative to look at the initial speed v (in m/s) rather than the radius of the sphere at a certain time. As you noticed in the example, we get the initial speed by setting t = 1, leading to this formula:

v = 0.93 · (E / D)^0.2

  • If you increase the energy of the detonation 35-fold, the initial speed of the shock front doubles. So for an atomic blast of 20 kt · 35 = 700 kt, the initial speed would be approximately 542 m/s · 2 = 1084 m/s.

  • The density behaves in the exact opposite way. If you increase it 35-fold, the initial speed halves. So if the test were conducted at an altitude of about 20 miles (where the density is only one thirty-fifth of its value on the ground), the shock wave would propagate at 1084 m/s.

Another field in which the Taylor-Sedov formula is commonly applied is astrophysics, where it is used to model supernova explosions. Since the energy released in such explosions dwarfs all atomic blasts and the surrounding density in space is very low, the initial expansion rate is extremely high.

This was an excerpt from the ebook “Great Formulas Explained – Physics, Mathematics, Economics”, released yesterday and available here: http://www.amazon.com/dp/B00G807Y00. You can take another quick look at the physics of shock waves here: Mach Cone.

Probability and Multiple Choice Tests

Imagine taking a multiple choice test that has three possible answers to each question. This means that even if you don’t know any answer, your chance of getting a question right is still 1/3. How likely is it to get all questions right by guessing if the test contains ten questions?

Here we are looking at the event “correct answer” which occurs with a probability of p(correct answer) = 1/3. We want to know the odds of this event happening ten times in a row. For that we simply apply the multiplication rule:

  • p(all correct) = (1/3)^10 = 0.000017

Doing the inverse, we can see that this corresponds to about 1 in 60000. So if we gave this test to 60000 students who only guessed the answers, we could expect only one to be that lucky. What about the other extreme? How likely is it to get none of the ten questions right when guessing?

Now we must focus on the event “incorrect answer” which has the probability p(incorrect answer) = 2/3. The odds for this to occur ten times in a row are:

  • p(all incorrect) = (2/3)^10 = 0.017

In other words: 1 in 60. Among the 60000 guessing students, this outcome can be expected to appear 1000 times. How would these numbers change if we only had eight instead of ten questions? Or if we had four options per question instead of three? I leave this calculation up to you.
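For trying out other test lengths or numbers of options, here is a small Python sketch. The function names are illustrative; they just apply the multiplication rule to independent guesses.

```python
from fractions import Fraction


def p_all_correct(options, questions):
    """Probability of guessing every question right."""
    return Fraction(1, options) ** questions


def p_all_incorrect(options, questions):
    """Probability of guessing every question wrong."""
    return Fraction(options - 1, options) ** questions


# The case from the text: three options, ten questions.
print(float(p_all_correct(3, 10)))    # roughly 0.000017
print(float(p_all_incorrect(3, 10)))  # roughly 0.017
```

Calling the functions with 8 questions or 4 options answers the two follow-up questions.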

Physics (And The Formula That Got Me Hooked)

A long time ago, in my teen years, this was the formula that got me hooked on physics. Why? I can’t say for sure. I guess I was very surprised that you could calculate something like this so easily. So with some nostalgia, I present another great formula from the field of physics. It will be a continuation of and a last section on energy.

To heat something, you need a certain amount of energy E (in J). How much exactly? To compute this we require three inputs: the mass m (in kg) of the object we want to heat, the temperature difference T (in °C) between initial and final state and the so-called specific heat c (in J per kg °C) of the material that is heated. The relationship is quite simple:

E = c · m · T

If you double any of the input quantities, the energy required for heating will double as well. A very helpful addition to problems involving heating is this formula:

E = P · t

with P (in watt = W = J/s) being the power of the device that delivers heat and t (in s) the duration of the heat delivery.

———————

The specific heat of water is c = 4200 J per kg °C. How much energy do you need to heat m = 1 kg of water from room temperature (20 °C) to its boiling point (100 °C)? Note that the temperature difference between initial and final state is T = 80 °C. So we have all the quantities we need.

E = 4200 · 1 · 80 = 336,000 J

Additional question: How long will it take a water heater with an output of 2000 W to accomplish this? Let’s set up an equation for this using the second formula:

336,000 = 2000 · t

t ≈ 168 s ≈ 3 minutes
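The water heater example can be written as two one-line functions. This is a minimal Python sketch with illustrative names.

```python
def heating_energy(c, m, delta_t):
    """Energy in J needed to heat mass m (kg) by delta_t (°C): E = c * m * T."""
    return c * m * delta_t


def heating_time(energy, power):
    """Time in s a heater of the given power (W) needs to deliver `energy` (J)."""
    return energy / power


energy = heating_energy(c=4200, m=1, delta_t=100 - 20)
seconds = heating_time(energy, power=2000)
print(f"{energy} J, {seconds:.0f} s")
```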

———————-

We put m = 1 kg of water (c = 4200 J per kg °C) in one container and m = 1 kg of sand (c = 290 J per kg °C) in another next to it. This will serve as an artificial beach. Using a heater we add 10,000 J of heat to each container. By what temperature will the water and the sand be raised?

Let’s turn to the water. From the given data and the great formula we can set up this equation:

10,000 = 4200 · 1 · T

T ≈ 2.4 °C

So the water temperature will be raised by 2.4 °C. What about the sand? It also receives 10,000 J.

10,000 = 290 · 1 · T

T ≈ 34.5 °C
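Solving E = c · m · T for the temperature difference gives T = E / (c · m). This Python sketch (illustrative function name, specific heats as given in the text) compares the two containers.

```python
def temperature_rise(energy, c, m):
    """Temperature increase in °C when `energy` (J) is added to mass m (kg)."""
    return energy / (c * m)


water = temperature_rise(10_000, c=4200, m=1)
sand = temperature_rise(10_000, c=290, m=1)
print(f"water: {water:.1f} °C, sand: {sand:.1f} °C")
```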

So sand (or any ground in general) will heat up much more strongly than water. In other words: the temperature of ground reacts quite strongly to changes in energy input while water is rather sluggish. This explains why the climate near oceans is milder than inland, that is, why the summers are less hot and the winters less cold. The water efficiently dampens the changes in temperature.

It also explains the land-sea-breeze phenomenon (seen in the image below). During the day, the sun’s energy will cause the ground to be hotter than the water. The air above the ground rises, leading to cooler air flowing from the ocean to the land. At night, due to the lack of the sun’s power, the situation reverses. The ground cools off quickly and now it’s the air above the water that rises.

Image
———————-

I hope this formula got you hooked as well. It’s simple, useful and can explain quite a lot of physics at the same time. It doesn’t get any better than this. Now it’s time to leave the concept of energy and turn to other topics.

This was an excerpt from my Kindle ebook: Great Formulas Explained – Physics, Mathematics, Economics. For another interesting physics quickie, check out: Intensity (or: How Much Power Will Burst Your Eardrums?).

A tunnel through earth and a surprising result …

Recently I found an interesting problem: A straight tunnel is drilled through the earth (see picture; the tunnel is drawn with two lines) and rails are installed in it. A train travels along the rails without friction, driven only by gravity. How long does it take the train to travel through this earth tunnel of length l?

The calculation shows a surprising result. The travel time is independent of the length l; the time it takes the train to travel through a 1 km tunnel is the same as through a 5000 km tunnel, about 2500 seconds or 42 minutes! Why is that?

Imagine a model train on rails. If you put the rails on flat ground, the train won’t move. The gravitational force is pulling on the train, but not in the direction of travel. If you incline the rails slightly, the train starts to move slowly; if you incline the rails strongly, it rapidly picks up speed.

Now let’s imagine a tunnel through the earth! A 1 km tunnel will only have a slight inclination and the train would accelerate slowly. It would be a pleasant trip for the entire family. But a 5000 km tunnel would go steeply into the ground, and the train would accelerate at an amazing rate. It would be a hell of a ride! This explains how we always get the same travel time: the 1 km tunnel is short and the velocity would remain low; the 5000 km tunnel is long, but the velocity would become enormous.

Here is how the hell ride through the 5000 km tunnel looks in detail:

The red, monotonically increasing curve shows distance traveled (in km) versus time (in seconds), the blue curve shows velocity (in km/s) versus time. In the center of the tunnel the train reaches the maximum velocity of about 3 km/s, which corresponds to an incredible 6700 mi/h!
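The post does not show the calculation itself. One standard way to arrive at these numbers, assuming an idealized Earth of uniform density that does not rotate (an assumption made for this sketch, not stated in the original), is to note that the component of gravity along the tunnel is proportional to the distance from the tunnel's midpoint. The train then performs a simple harmonic oscillation with angular frequency √(g/R), so one trip takes π · √(R/g) regardless of l. A Python sketch with illustrative names and rounded constants:

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean radius of the earth in m (rounded)
G_SURFACE = 9.81          # gravitational acceleration at the surface in m/s^2


def tunnel_travel_time():
    """One-way travel time in s: half an oscillation period, independent of length."""
    return math.pi * math.sqrt(EARTH_RADIUS_M / G_SURFACE)


def max_speed(length_m):
    """Speed in m/s at the midpoint of a tunnel of the given length."""
    return (length_m / 2) * math.sqrt(G_SURFACE / EARTH_RADIUS_M)


print(f"travel time: {tunnel_travel_time():.0f} s")
for length_km in (1, 5000):
    print(f"{length_km} km tunnel: max speed {max_speed(length_km * 1000):.2f} m/s")
```

The travel time function takes no length argument at all, which is the surprising result in a nutshell; only the maximum speed depends on l.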

Missile Accuracy (CEP) – Excerpt from “Statistical Snacks”

An important quantity when comparing missiles is the CEP (Circular Error Probable). It is defined as the radius of the circle in which 50 % of the fired missiles land. The smaller it is, the better the accuracy of the missile. The German V2 rockets for example had a CEP of about 17 km. So there was a 50/50 chance of a V2 landing within 17 km of its target. Targeting smaller cities or even complexes was next to impossible with this accuracy, one could only aim for a general area in which it would land rather randomly.

Today’s missiles are significantly more accurate. The latest version of China’s DF-21 has a CEP of about 40 m, allowing the accurate targeting of small complexes or large buildings, while the CEP of the American-made Hellfire is as low as 4 m, enabling precision strikes on small buildings or even tanks.

Assuming the impacts are normally distributed, one can derive a formula for the probability of striking a circular target of radius R using a missile with a given CEP:

p = 1 – exp( -0.41 · R² / CEP² )

This quantity is also called the “single shot kill probability” (SSKP). Let’s include some numerical values. Assume a small complex with the dimensions 100 m by 100 m is targeted with a missile having a CEP of 150 m. Converting the rectangular area into a circle of equal area gives us a radius of about 56 m. Thus the SSKP is:

p = 1 – exp( -0.41 · 56² / 150² ) = 0.056 = 5.6 %
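The calculation above can be sketched in a few lines of Python. The function names `equivalent_radius` and `sskp` are made up for this illustration. The coefficient 0.41 is taken directly from the formula in the text and is exposed as a parameter so that other values can be tried.

```python
import math


def equivalent_radius(area_m2):
    """Radius (in m) of the circle that has the same area as the target."""
    return math.sqrt(area_m2 / math.pi)


def sskp(target_radius_m, cep_m, coeff=0.41):
    """Single shot kill probability; coeff defaults to the value used in the text."""
    return 1.0 - math.exp(-coeff * target_radius_m ** 2 / cep_m ** 2)


if __name__ == "__main__":
    r = equivalent_radius(100 * 100)  # 100 m by 100 m complex
    print(f"equivalent radius: {r:.0f} m")
    print(f"SSKP at CEP 150 m: {sskp(r, 150):.1%}")
```

Run as a script, this gives a radius of about 56 m and an SSKP of about 5.6 %, matching the numbers above.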

So the chances of hitting the target are relatively low. But the lack of accuracy can be compensated for by firing several missiles in succession. What is the chance of at least one missile hitting the target if ten missiles are fired? First we look at the odds of all missiles missing the target and answer the question from that. One missile misses with a probability of 0.944. Assuming the shots are independent, the chance of this event occurring ten times in a row is:

p(all miss) = 0.944¹⁰ = 0.562

Thus the chance of at least one hit is:

p(at least one hit) = 1 – 0.562 = 0.438 = 43.8 %
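The same two steps can be wrapped in a small helper. This is only a sketch: it assumes the shots are independent, and the function name `p_at_least_one_hit` is hypothetical.

```python
def p_at_least_one_hit(p_single, n_missiles):
    """Chance that at least one of n independent shots hits the target."""
    p_all_miss = (1.0 - p_single) ** n_missiles  # every single shot misses
    return 1.0 - p_all_miss


if __name__ == "__main__":
    # Single shot kill probability of 5.6 %, ten missiles fired
    print(f"{p_at_least_one_hit(0.056, 10):.1%}")
```

With the 5.6 % single shot probability and ten missiles this reproduces the 43.8 % found above.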

Still not great considering that a single missile easily costs $10,000 or more. How many missiles of this kind must be fired at the complex to have a 90 % chance at a hit? A 90 % chance at a hit means that the chance of all missiles missing is 10 %. So we can turn the above formula for p(all miss) into an equation by inserting p(all miss) = 0.1 and leaving the number of missiles n undetermined:

0.1 = 0.944ⁿ

All that’s left is doing the algebra. Applying the natural logarithm to both sides and solving for n results in:

n = ln(0.1) / ln(0.944) ≈ 40

So forty missiles with a CEP of 150 m are required to have a 90 % chance at hitting the complex. As you can verify by doing the appropriate calculations, three DF-21 missiles would have achieved the same result.
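For readers who want to check the DF-21 claim, here is a minimal sketch that solves for the number of missiles. It reuses the formula from the text with its coefficient 0.41 and rounds the result up to a whole missile. The function names `sskp` and `missiles_needed` are assumptions made for this illustration.

```python
import math


def sskp(target_radius_m, cep_m, coeff=0.41):
    """Single shot kill probability; coeff is the value used in the text."""
    return 1.0 - math.exp(-coeff * target_radius_m ** 2 / cep_m ** 2)


def missiles_needed(p_single, target=0.9):
    """Smallest number of independent shots giving at least `target` chance of a hit."""
    if not 0.0 < p_single < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("probabilities must lie strictly between 0 and 1")
    # Solve (1 - p_single)^n = 1 - target for n, then round up.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))


if __name__ == "__main__":
    print("CEP 150 m:", missiles_needed(0.056))
    print("CEP 40 m (DF-21):", missiles_needed(sskp(56, 40)))
```

For the 56 m target radius, a CEP of 40 m gives a single shot probability of roughly 55 %, which is why three missiles are enough.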

Liked the excerpt? Get the book “Statistical Snacks” by Metin Bektas here: http://www.amazon.com/Statistical-Snacks-ebook/dp/B00DWJZ9Z2. For more excerpts see The Probability of Becoming a Homicide Victim and How To Use the Expected Value.

The Fourth State of Matter – Plasmas

From our everyday lives we are used to three states of matter: solid, liquid and gas. When we heat a solid it melts and becomes liquid. Heating this liquid further will cause it to evaporate into a gas. Usually this is what we consider to be the end of the line. But heating a gas leads to many surprises: it eventually turns into a state that behaves completely differently from ordinary gases. We call matter in that state a plasma.

To understand why at some point a gas will exhibit such unusual behaviour, we need to look at the basic structure of matter. All matter consists of atoms. The Greeks believed these to be the indivisible building blocks of all objects. Scientists, however, have discovered that atoms do indeed have an inner structure and are divisible. It takes an enormous amount of energy to split atoms, but it can be done.

Further research showed that atoms consist of three particles: neutrons, protons and electrons. The neutrons and protons are crammed into the atomic nucleus, while the electrons surround this nucleus. Usually atoms are not charged, because they contain as many protons (positively charged) as electrons (negatively charged). The charges balance each other. Only when electrons are missing does the atom become electrically charged. Such charged atoms are called ions.

In a gas the atoms are neutral. Each atom has as many protons as electrons, so it is electrically balanced. When you apply a magnetic field to a gas, it does not respond. If you try to use the gas to conduct electricity, it does not work.

Remember that gas molecules move at high speeds and collide frequently with each other. As you increase the temperature, the collisions become more violent. At very high temperatures the collisions become so violent that the impact can knock some electrons off an atom (ionization). This is where the gas ends and the plasma begins.

In a plasma the collisions are so intense that the atoms are not able to hold onto their outer electrons. Instead of a large number of neutral atoms as in the gas, we are left with a mixture of free electrons and ions. This electric soup behaves very differently: it responds to magnetic fields and can conduct electricity very efficiently.

 (The phases of matter. Source: NASA)

Most matter in the universe is in plasma form. Scientists believe that only 1 % of all visible matter is solid, liquid or gaseous. On earth it is different: we rarely see plasmas because the temperatures are too low. But there are some exceptions.

High-temperature flames can cause a small volume of air to turn into a plasma. This can be seen for example in the so-called ionic wind experiment, which shows that a flame is able to transmit electric currents. Gases can’t do that. DARPA, the Pentagon’s research arm, is currently using this phenomenon to develop new methods of fire suppression. Other examples of plasmas on earth are lightning and the Aurora Borealis.

 (Examples of plasmas. Source: Contemporary Physics Education Project)

The boundary between gases and plasmas is somewhat foggy. An important quantity for characterizing the transition from gas to plasma is the ionization degree. It tells us what percentage of the atoms have lost one or more electrons. So an ionization degree of 10 % means that only one out of ten atoms is ionized. In this case the gas properties are still dominant.

 (Ionization degree of Helium over Temperature. Source: SciVerse)