error

Code Transmission and Probability

Not long ago did mankind first send rovers to Mars to analyze the planet and find out if it ever supported life. The nagging question “Are we alone?” drives us to penetrate deeper into space. A special challenge associated with such journeys is communication. There needs to be a constant flow of digital data, strings of ones and zeros, back and forth to ensure the success of the space mission.

During the process of transmission over the endless distances, errors can occur. There’s always a chance that zeros randomly turn into ones and vice versa. What can we do to make communication more reliable? One way is to send duplicates.

Instead of simply sending a 0, we send the string 00000. If not too many errors occur during the transmission, we can still decode it on arrival. For example, if it arrives as 00010, we can deduce that the originating string was with a high probability a 0 rather than a 1. The single transmission error that occurred did not cause us to incorrectly decode the string.

Assume that the probability of a transmission error is p and that we add to each 0 (or 1) four copies, as in the above paragraph. What is the chance of us being able to decode it correctly? To be able to decode 00000 on arrival correctly, we can’t have more than two transmission errors occurring. So during the n = 5 transmissions, k = 0, k = 1 and k = 2 errors are allowed. Using the binomial distribution we can compute the probability for each of these events:

p(0 errors) = C(5,0) · p^0 · (1-p)^5

p(1 error) = C(5,1) · p^1 · (1-p)^4

p(2 errors) = C(5,2) · p^2 · (1-p)^3

We can simplify these expressions somewhat. A binomial calculator provides us with these values: C(5,0) = 1, C(5,1) = 5 and C(5,2) = 10. This leads to:

p(0 errors) = (1-p)^5

p(1 error) = 5 · p · (1-p)^4

p(2 errors) = 10 · p^2 · (1-p)^3

Adding the probabilities for all these desired events tells us how likely it is that we can correctly decode the string.

p(success) = (1-p)^3 · ((1-p)^2 + 5·p·(1-p) + 10·p^2)

In the graph below you can see the plot of this function. The x-axis represents the transmission error probability p and the y-axis the chance of successfully decoding the string. For p = 10 % (1 in 10 bits arrive incorrectly) the odds of identifying the originating string are still a little more than 99 %. For p = 20 % (1 in 5 bits arrive incorrectly) this drops to about 94 %.

Code Transmission

The downside to this gain in accuracy is that the amount of data to be transmitted, and thus the time it takes for the transmission to complete, increases fivefold.

The Standard Error – What it is and how it’s used

I smoke electronic cigarettes and recently I wanted to find out how much nicotine liquid I consume per day. I noted the used amount on five consecutive days:

3 ml, 3.4 ml, 7.2 ml, 3.7 ml, 4.3 ml

So how much do I use per day? Well, our best guess is to do the average, that is, sum all the amounts and divide by the number of measurements:

(3 ml + 3.4 ml + 7.2 ml + 3.7 ml + 4.3 ml) / 5 = 4.3 ml

Most people would stop here. However, there’s one very important piece of information missing: how accurate is that result? Surely an average value of 4.3 ml computed from 100 measurements is much more reliable than the same average computed from 5 measurements. Here’s where the standard error comes in and thanks to the internet, calculating it couldn’t be easier. You can type in the measurements here to get the standard error:

http://www.miniwebtool.com/standard-error-calculator/

It tells us that the standard error (of the mean, to be pedantically precise) of my five measurements is SEM = 0.75. This number is extremely useful because there’s a rule in statistics that states that with a 95 % probability, the true average lies within two standard errors of the computed average. For us this means that there’s a 95 % chance, which you could call beyond reasonable doubt, that the true average of my daily liquid consumption lies in this intervall:

4.3 ml ± 1.5 ml

or between 2.8 and 5.8 ml. So the computed average is not very accurate. Note that as long as the standard deviation remains more or less constant as further measurements come in, the standard error is inversely proportional to the square root of the number of measurements. In simpler terms: If you quadruple the number of measurements, the size of the error interval halves. With 20 instead of only 5 measurements, we should be able to archieve plus/minus 0.75 accuracy.

So when you have an average value to report, be sure to include the error intervall. Your result is much more informative this way and with the help of the online calculator as well as the above rule, computing it is quick and painless. It took me less than a minute.

A more detailed explanation of the average value, standard deviation and standard error (yes, the latter two are not the same thing) can be found in chapter 7 of my Kindle ebook Statistical Snacks (this was not an excerpt).