model

Modeling Theme Park Queues

Who doesn’t love a day at the theme park? You can go on thrilling roller‒coaster rides, enjoy elaborate shows, have a tasty lunch in between or just relax and take in the scenery. Of course there’s one thing that does spoil the fun a bit: the waiting. For the most popular attractions waiting times of around one hour are not uncommon during peak times, while the ride itself may be over in no more than two or three minutes.

Let’s work towards a basic model for queues in theme parks and other situations in which queues commonly arise. We will assume that the passing rate R(t), that is, the number of people passing the entrance of the attraction per unit time, is given. How many of these will enter the line? This will depend on the popularity of the attraction as well as the current size of the line. The more people are already in the line, the less likely others are to join. We’ll denote the number of people in the line at time t with n(t) and use this ansatz for the rate r(t) at which people join the queue:

Mathematical Shenanigans_html_5618c8ce

The constant a expresses the popularity of the attraction (more specifically, it is the percentage of passers‒by that will use the attraction if no queue is present) and the constant b is a “line repulsion” factor. The stronger visitors are put off by the presence of a line, the higher its value will be. How does the size of the line develop over time given the above function? We assume that the maximum serving rate is c people per unit time. So the rate of change for the number of people in line is (for n(t) ≥ 0):

Mathematical Shenanigans_html_5c443fa7

In case the numerical evaluation returns a value n(t) < 0 (which is obviously nonsense, but a mathematical possibility given our ansatz), we will force n(t) = 0. An interesting variation, into which we will not dive much further though, is to include a time lag. Usually the expected waiting time is displayed to visitors on a screen. The visitors make their decision on joining the line based on this information. However, the displayed waiting time is not updated in real‒time. We have to expect that there’s a certain delay d between actual and displayed length of the line. With this effect included, our equation becomes:

Mathematical Shenanigans_html_m2ee86a4f

Simulation

For the numerical solution we will go back to the delay‒free version. We choose one minute as our unit of time. For the passing rate, that is, the people passing by per minute, we set:

R(t) = 0.00046 · t · (720 ‒ t)

We can interpret this function as such: at t = 0 the park opens and the passing rate is zero. It then grows to a maximum of 60 people per minute at t = 360 minutes (or 6 hours) and declines again. At t = 720 minutes (or 12 hours) the park closes and the passing rate is back to zero. We will assume the popularity of the attraction to be:

a = 0.2

So if there’s no line, 20 % of all passers‒by will make use of the attraction. We set the maximum service rate to:

c = 5 people per minute

What about the “line repulsion” factor? Let’s assume that if the line grows to 200 people (given the above service rate this would translate into a waiting time of 40 minutes), the willingness to join the line drops from the initial 20 % to 10 %.

Mathematical Shenanigans_html_3d42d26c

→ b = 0.005

Given all these inputs and the model equation, here’s how the queue develops over time:

Mathematical Shenanigans_html_79a0f216

It shows that no line will form until around t = 100 minutes into opening the park (at which point the passing rate reaches 29 people per minute). Then the queue size increases roughly linearly for the next several hours until it reaches its maximum value of n = 256 people (waiting time: 51 minutes) at t = 440 minutes. Note that the maximum value of the queue size occurs much later than the maximum value of the passing rate. After reaching a maximum, there’s a sharp decrease in line length until the line ceases to be at around t = 685 minutes. Further simulations show that if you include a delay, there’s no noticeable change as long as the delay is in the order of a few minutes.

(This was an excerpt from my ebook “Mathematical Shenanigans”)

A Brief Look At Car-Following Models

Recently I posted a short introduction to recurrence relations – what they are and how they can be used for mathematical modeling. This post expands on the topic as car-following models are a nice example of recurrence relations applied to the real-world.

Suppose a car is traveling on the road at the speed u(t) at time t. Another car approaches this car from behind and starts following it. Obviously the driver of the car that is following cannot choose his speed freely. Rather, his speed v(t) at time t will be a result of whatever the driver in the leading car is doing.

The most basic car-following model assumes that the acceleration a(t) at time t of the follower is determined by the difference in speeds. If the leader is faster than the follower, the follower accelerates. If the leader is slower than the follower, the follower decelerates. The follower assumes a constant speed if there’s no speed difference. In mathematical form, this statement looks like this:

a(t) = λ * (u(t) – v(t))

The factor λ (sensitivity) determines how strongly the follower accelerates in response to a speed difference. To be more specific: it is the acceleration that results from a speed difference of one unit.

——————————————

Before we go on: how is this a recurrence relation? In a recurrence relation we determine a quantity from its values at an earlier time. This seems to be missing here. But remember that the acceleration is given by:

a(t) = (v(t+h) – v(t)) / h

with h being a time span. Inserted into the above car-following equation, we can see that it indeed implies a recurrence relation.

——————————————

Our model is still very crude. Here’s the biggest problem: The response of the driver is instantaneous. He picks up the speed difference at time t and turns this information into an acceleration also at time t. But more realistically, there will be a time lag. His response at time t will be a result of the speed difference at an earlier time t – Λ, with Λ being the reaction time.

a(t) = λ * (u(t – Λ) – v(t – Λ))

The reaction time is usually in the order of one second and consist of the time needed to process the information as well as the time it takes to move the muscles and press the pedal. There are several things we can do to make the model even more realistic. First of all, studies show that the speed difference is not the only factor. The distance d(t) between the leader and follower also plays an important role. The smaller it is, the stronger the follower will react. We can take this into account by putting the distance in the denominator:

a(t) = (λ / d(t)) * (u(t – Λ) – v(t – Λ))

You can also interpret this as making the sensitivity distance-dependent. There’s still one adjustment we need to make. The above model allows any value of acceleration, but we know that we can only reach certain maximum values in a car. Let’s symbolize the maximum acceleration by a(acc) and the maximum deceleration by a(dec). The latter will be a number smaller than zero since deceleration is by definition negative acceleration. We can write:

a(t) = a(acc) if (λ / d(t)) * (u(t – Λ) – v(t – Λ)) > a(acc)
a(t) = a(dec) if (λ / d(t)) * (u(t – Λ) – v(t – Λ)) < a(dec)
a(t) = (λ / d(t)) * (u(t – Λ) – v(t – Λ)) else

It probably looks simpler using an if-statement:

a(t) = (λ / d(t)) * (u(t – Λ) – v(t – Λ))

IF a(t) > a(acc) THEN
a(t) = a(acc)
ELSEIF a(t) < a(dec) THEN
a(t) = a(dec)
END IF

This model already catches a lot of nuances of car traffic. I hope I was able to give you some  insight into what car-following models are and how you can fine-tune them to satisfy certain conditions.

Recurrence Relations – A Simple Explanation And Much More

Recurrence relations are a powerful tool for mathematical modeling and numerically solving differential equations (no matter how complicated). And as luck would have it, they are relatively easy to understand and apply. So let’s dive right into it using a purely mathematical example (for clarity) before looking at a real-world application.

This equation is a typical example of a recurrence relation:

x(t+1) = 5 * x(t) + 2 * x(t-1)

At the heart of the equation is a certain quantity x. It appears three times: x(t+1) stands for the value of this quantity at a time t+1 (next month), x(t) for the value at time t (current month) and x(t-1) the value at time t-1 (previous month). So what the relation allows us to do is to determine the value of said quantity for the next month, given that we know it for the current and previous month. Of course the choice of time span here is just arbitrary, it might as well be a decade or nanosecond. What’s important is that we can use the last two values in the sequence to determine the next value.

Suppose we start with x(0) = 0 and x(1) = 1. With the recurrence relation we can continue the sequence step by step:

x(2) = 5 * x(1) + 2 * x(0) = 5 * 1 + 2 * 0 = 5

x(3) = 5 * x(2) + 2 * x(1) = 5 * 5 + 2 * 1 = 27

x(4) = 5 * x(3) + 2 * x(2) = 5 * 27 + 2 * 5 = 145

And so on. Once we’re given the “seed”, determining the sequence is not that hard. It’s just a matter of plugging in the last two data points and doing the calculation. The downside to defining a sequence recursively is that if you want to know x(500), you have to go through hundreds of steps to get there. Luckily, this is not a problem for computers.

In the most general terms, a recurrence relation relates the value of quantity x at a time t + 1 to the values of this quantity x at earlier times. The time itself could also appear as a factor. So this here would also be a legitimate recurrence relation:

x(t+1) = 5 * t * x(t) – 2 * x(t-10)

Here we calculate the value of x at time t+1 (next month) by its value at a time t (current month) and t – 10 (ten months ago). Note that in this case you need eleven seed values to be able to continue the sequence. If we are only given x(0) = 0 and x(10) = 50, we can do the next step:

x(11) = 5 * 10 * x(10) – 2 * x(0) = 5 * 10 * 50 – 2 * 0 = 2500

But we run into problems after that:

x(12) = 5 * 11 * x(11) – 2 * x(1) = 5 * 11 * 2500 – 2 * x(1) = ?

We already calculated x(11), but there’s nothing we can do to deduce x(1).

Now let’s look at one interesting application of such recurrence relations, modeling the growth of animal populations. We’ll start with a simple model that relates the number of animals x in the next month t+1 to the number of animals x in the current month t as such:

x(t+1) = x(t) + f * x(t)

The factor f is a constant that determines the rate of growth (to be more specific: its value is the decimal percentage change from one month to the next). So if our population grows with 25 % each month, we get:

x(t+1) = x(t) + 0.25 * x(t)

If we start with x(0) = 100 rabbits at month t = 0 we get:

x(1) = x(0) + 0.1 * x(0) = 100 + 0.25 * 100 = 125 rabbits

x(2) = x(1) + 0.1 * x(1) = 125 + 0.25 * 125 = 156 rabbits

x(3) = x(2) + 0.1 * x(2) = 156 + 0.25 * 156 = 195 rabbits

x(4) = x(3) + 0.1 * x(3) = 195 + 0.25 * 195 = 244 rabbits

x(5) = x(4) + 0.1 * x(4) = 244 + 0.25 * 244 = 305 rabbits

And so on. Maybe you already see the main problem with this exponential model: it just keeps on growing. This is fine as long as the population is small and the environment rich in ressources, but every environment has its limits. Let’s fix this problem by including an additional term in the recurrence relation that will lead to this behavior:

– Exponential growth as long as the population is small compared to the capacity
– Slowing growth near the capacity
– No growth at capacity
– Population decline when over the capacity

How can we translate this into mathematics? It takes a lot of practice to be able to tweak a recurrence relation to get the behavior you want. You just learned your first chord and I’m asking you to play Mozart, that’s not fair. But take a look at this bad boy:

x(t+1) = x(t) + a * x(t) * (1 – x(t) / C)

This is called the logistic model and the constant C represents said capacity. If x is much smaller than the capacity C, the ratio x / C will be close to zero and we are left with exponential growth:

x(t+1) ≈ x(t) + a * x(t) * (1 – 0)

x(t+1) ≈ x(t) + a * x(t)

So this admittedly complicated looking recurrence relation fullfils our first demand: exponential growth for small populations. What happens if the population x reaches the capacity C? Then all growth should stop. Let’s see if this is the case. With x = C, the ratio x / C is obviously equal to one, and in this case we get:

x(t+1) = x(t) + a * x(t) * (1 – 1)

x(t+1) = x(t)

The number of animals remains constant, just as we wanted. Last but not least, what happens if (for some reason) the population gets past the capacity, meaning that x is greater than C? In this case the ratio x / C is greater than one (let’s just say x / C = 1.2 for the sake of argument):

x(t+1) = x(t) + a * x(t) * (1 – 1.2)

x(t+1) = x(t) + a * x(t) * (- 0.2)

The second term is now negative and thus x(t+1) will be smaller than x(t) – a decline back to capacity. What an enormous amount of beautiful behavior in such a compact line of mathematics! This is where the power of recurrence relations comes to light. Anyways, let’s go back to our rabbit population. We’ll let them grow with 25 % (a = 0.25), but this time on an island that can only sustain 300 rabbits at most (C = 300). Thus the model looks like this:

x(t+1) = x(t) + 0.25 * x(t) * (1 – x(t) / 300)

If we start with x(0) = 100 rabbits at month t = 0 we get:

x(1) = 100 + 0.25 * 100 * (1 – 100 / 300) = 117 rabbits

x(2) = 117 + 0.25 * 117 * (1 – 117 / 300) = 135 rabbits

x(3) = 135 + 0.25 * 135 * (1 – 135 / 300) = 153 rabbits

x(4) = 153 + 0.25 * 153 * (1 – 153 / 300) = 172 rabbits

x(5) = 172 + 0.25 * 172 * (1 – 172 / 300) = 190 rabbits

Note that now the growth is almost linear rather than exponential and will slow down further the closer we get to the capacity (continue the sequence if you like, it will gently approach 300, but never go past it).

We can even go further and include random events in a recurrence relation. Let’s stick to the rabbits and their logistic growth and say that there’s a p = 5 % chance that in a certain month a flood occurs. If this happens, the population will halve. If no flood occurs, it will grow logistically as usual. This is what our new model looks like in mathematical terms:

x(t+1) = x(t) + 0.25 * x(t) * (1 – x(t) / 300)    if no flood occurs

x(t+1) = 0.5 * x(t)    if a flood occurs

To determine if there’s a flood, we let a random number generator spit out a number between 1 and 100 at each step. If it displays the number 5 or smaller, we use the “flood” equation (in accordance with the 5 % chance for a flood). Again we turn to our initial population of 100 rabbits with the growth rate and capacity unchanged:

x(1) = 100 + 0.25 * 100 * (1 – 100 / 300) = 117 rabbits

x(2) = 117 + 0.25 * 117 * (1 – 117 / 300) = 135 rabbits

x(3) = 135 + 0.25 * 135 * (1 – 135 / 300) = 153 rabbits

x(4) = 0.5 * 153 = 77 rabbits

x(5) = 77 + 0.25 * 77 * (1 – 77 / 300) = 91 rabbits

As you can see, in this run the random number generator gave a number 5 or smaller during the fourth step. Accordingly, the number of rabbits halved. You can do a lot of shenanigans (and some useful stuff as well) with recurrence relations and random numbers, the sky’s the limit. I hope this quick overview was helpful.

A note for the advanced: here’s how you turn a differential equation into a recurrence relation. Let’s take this differential equation:

dx/dt = a * x * exp(- b*x)

First multiply by dt:

dx = a * x * exp(- b * x) * dt

We set dx (the change in x) equal to x(t+h) – x(t) and dt (change in time) equal to a small constant h. Of course for x we now use x(t):

x(t+h) – x(t) = a * x(t) * exp(- b * x(t)) * h

Solve for x(t+h):

x(t+h) = x(t) + a * x(t) * exp(- b * x(t)) * h

And done! The smaller your h, the more accurate your numerical results. How low you can go depends on your computer’s computing power.

Mathematical Model For (E-) Book Sales

It seems to be a no-brainer that with more books on the market, an author will see higher revenues. I wanted to know more about how the sales rate varies with the number of books. So I did what I always do when faced with an economic problem: construct a mathematical model. Even though it took me several tries to find the right approach, I’m fairly confident that the following model is able to explain why revenues grow overproportionally with the number of books an author has published. I also stumbled across a way to correct the marketing R/C for number of books.

The basic quantities used are:

  • n = number of books
  • i = impressions per day
  • q = conversion probability (which is the probability that an impression results in a sale)
  • s = sales per buyer
  • r = daily sales rate

Obviously the basic relationship is:

r = i(n) * q(n) * s(n)

with the brackets indicating a dependence of the quantities on the number of books.

1) Let’s start with s(n) = sales per buyer. Suppose there’s a probability p that a buyer, who has purchased an author’s book, will go on to buy yet another book of said author. To visualize this, think of the books as some kind of mirrors: each ray (sale) will either go through the book (no further sales from this buyer) or be reflected on another book of the author. In the latter case, the process repeats. Using this “reflective model”, the number of sales per buyer is:

s(n) = 1 + p + p² + … + pn = (1 – pn) / (1 – p)

For example, if the probability of a reader buying another book from the same author is p = 15 % = 0.15 and the author has n = 3 books available, we get:

s(3) = (1 – 0.153) / (1 – 0.15) = 1.17 sales per buyer

So the number of sales per buyer increases with the number of books. However, it quickly reaches a limiting value. Letting n go to infinity results in:

s(∞) = 1 / (1 – p)

Hence, this effect is a source for overproportional growth only for the first few books. After that it turns into a constant factor.

2) Let’s turn to q(n) = conversion probability. Why should there be a dependence on number of books at all for this quantity? Studies show that the probability of making a sale grows with the choice offered. That’s why ridiculously large malls work. When an author offers a large number of books, he is able to provide list impression (featuring all his / her books) additionally to the common single impressions (featuring only one book). With more choice, the conversion probability on list impressions will be higher than that on single impressions.

  • qs = single impression conversion probability
  • ps = percentage of impressions that are single impressions
  • ql = list impression conversion probability
  • pl = percentage of impressions that are list impressions

with ps + pl = 1. The overall conversion probability will be:

q(n) = qs(n) * ps(n) + ql(n)* pl(n)

With ql(n) and pl(n) obviously growing with the number of books and ps(n) decreasing accordingly, we get an increase in the overall conversion probability.

3) Finally let’s look at i(n) = impressions per day. Denoting with i1, i2, … the number of daily impressions by book number 1, book number 2, … , the average number of impressions per day and book are:

ib = 1/n * ∑[k] ik

with ∑[k] meaning the sum over all k. The overall impressions per day are:

i(n) = ib(n) * n

Assuming all books generate the same number of daily impressions, this is a linear growth. However, there might be an overproportional factor at work here. As an author keeps publishing, his experience in writing, editing and marketing will grow. Especially for initially inexperienced authors the quality of the books and the marketing approach will improve with each book. Translated in numbers, this means that later books will generate more impressions per day:

ik+1 > ik

which leads to an overproportional (instead of just linear) growth in overall impressions per day with the number of books. Note that more experience should also translate into a higher single impression conversion probability:

qs(n+1) > qs(n)

4) As a final treat, let’s look at how these effects impact the marketing R/C. The marketing R/C is the ratio of revenues that result from an ad divided by the costs of the ad:

R/C = Revenues / Costs

For an ad to be of worth to an author, this value should be greater than 1. Assume an ad generates the number of iad single impressions in total. For one book we get the revenues:

R = iad * qs(1)

If more than one book is available, this number changes to:

R = iad * qs(n) * (1 – pn) / (1 – p)

So if the R/C in the case of one book is (R/C)1, the corrected R/C for a larger number of books is:

R/C = (R/C)1 * qs(n) / qs(1) * (1 – pn) / (1 – p)

In short: ads, that aren’t profitable, can become profitable as the author offers more books.

For more mathematical modeling check out: Mathematics of Blog Traffic: Model and Tips for High Traffic.

Mathematics of Blog Traffic: Model and Tips for High Traffic

Over the last few days I finally did what I long had planned and worked out a mathematical model for blog traffic. Here are the results. First we’ll take a look at the most general form and then use it to derive a practical, easily applicable formula.

We need some quantities as inputs. The time (in days), starting from the first blog entry, is denoted by t. We number the blog posts with the variable k. So k = 1 refers to the first post published, k = 2 to the second, etc … We’ll refer to the day on which entry k is published by t(k).

The initial number of visits entry k draws from the feed is symbolized by i(k), the average number of views per day entry k draws from search engines by s(k). Assuming that the number of feed views declines exponentially for each article with a factor b (my observations put the value for this at around 0.4 – 0.6), this is the number of views V the blog receives on day t:

V(t) = Σ[k] ( s(k) + i(k) · bt – t(k))

Σ[k] means that we sum over all k. This is the most general form. For it to be of any practical use, we need to make simplifying assumptions. We assume that the entries are published at a constant frequency f (entries per day) and that each article has the same popularity, that is:

i(k) = i = const.
s(k) = s = const.

After a long calculation you can arrive at this formula. It provides the expected number of daily views given that the above assumptions hold true and that the blog consists of n entries in total:

V = s · n + i / ( 1 – b1/f )

Note that according to this formula, blog traffic increases linearly with the number of entries published. Let’s apply the formula. Assume we publish articles at a frequency f = 1 per day and they draw i = 5 views on the first day from the feed and s = 0.1 views per day from search engines. With b = 0.5, this leads to:

V = 0.1 · n + 10

So once we gathered n = 20 entries with this setup, we can expect V = 12 views per day, at n = 40 entries this grows to V = 14 views per day, etc … The theoretical growth of this blog with number of entries is shown below:

viewsentries

How does the frequency at which entries are being published affect the number of views? You can see this dependency in the graph below (I set n = 40):

viewsfrequency

The formula is very clear about what to do for higher traffic: get more attention in the feed (good titles, good tagging and a large number of followers all lead to high i and possibly reduced b), optimize the entries for search engines (high s), publish at high frequency (obviously high f) and do this for a long time (high n).

We’ll draw two more conclusions. As you can see the formula neatly separates the search engine traffic (left term) and feed traffic (right term). And while the feed traffic reaches a constant level after a while of constant publishing, it is the search engine traffic that keeps on growing. At a critical number of entries N, the search engine traffic will overtake the feed traffic:

N = i / ( s · ( 1 – b1/f ) )

In the above blog setup, this happens at N = 100 entries. At this point both the search engines as well as the feed will provide 10 views per day.

Here’s one more conclusion: the daily increase in the average number of views is just the product of the daily search engine views per entry s and the publishing frequency f:

V / t = s · f

Thus, our example blog will experience an increase of 0.1 · 1 = 0.1 views per day or 1 additional view per 10 days. If we publish entries at twice the frequency, the blog would grow with 0.1 · 2 = 0.2 views per day or 1 additional view every 5 days.