Here’s the setup to an interesting puzzle: assume the time it takes to serve customers at a bank is exponentially distributed with mean service time 15 minutes. When you come in to the bank, both bank employees are busy serving customers. There are no other customers in the bank, except you and the two customers already being serviced.
What is the probability you will be the last customer to leave the bank among the three of you?
The answer, possibly surprisingly, is 50 %.
Here’s the reasoning: You are guaranteed to not be the first to leave, because in order for your matter to be attended to, one of the other customers needs to leave first. Once that’s happened, it’s down to two of you, both receiving service that is exponentially distributed with mean service time 15 minutes.
With the exponential distribution, it doesn’t matter how long the other customer has been there. The probability that a customer is finished in the next five minutes is constant (approximately 28 % in this case, since \(1 - e^{-5/15} \approx 0.283\)). This means the probability that you are finished in the next five minutes is exactly the same as the probability that the other customer is finished in the next five minutes! So which one of you finishes first is a coin flip.
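The coin-flip claim is easy to check with a small simulation. The sketch below is not part of the original puzzle. It assumes, purely for illustration, that the two customers ahead of you have already been served for 3 and 25 minutes (the hypothetical `elapsed_a` and `elapsed_b`), and it draws their remaining time by rejection sampling rather than by relying on memorylessness directly.

```python
import random


def remaining_service(rng, mean, elapsed):
    """Remaining service time of a customer already served for `elapsed` minutes.

    Rejection sampling: draw total service times until one exceeds `elapsed`.
    """
    while True:
        total = rng.expovariate(1.0 / mean)
        if total > elapsed:
            return total - elapsed


def prob_you_leave_last(mean=15.0, elapsed_a=3.0, elapsed_b=25.0,
                        trials=100_000, seed=1):
    """Estimate the probability that the newly arrived customer leaves last."""
    rng = random.Random(seed)
    last = 0
    for _ in range(trials):
        a = remaining_service(rng, mean, elapsed_a)
        b = remaining_service(rng, mean, elapsed_b)
        # You start being served as soon as the first of the two leaves.
        your_finish = min(a, b) + rng.expovariate(1.0 / mean)
        other_finish = max(a, b)
        if your_finish > other_finish:
            last += 1
    return last / trials


estimate = prob_you_leave_last()
print(f"Estimated probability of leaving last: {estimate:.3f}")
```

Changing `elapsed_a` and `elapsed_b` should leave the estimate close to 0.5, up to sampling noise.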
This matters, at least for me, because it gave me a good sense of what the “memoryless” or “Markovian” property means. It’s a visceral example of what it means to say that \[P(X \le k) = P(X \le m + k \mid X > m).\]
The probability that you finish within \(k\) minutes is equal to the probability that the other customer finishes within \(m+k\) minutes in total, given that they have already been there for \(m\) minutes, regardless of what \(m\) is.
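The identity can also be checked numerically. The following is a minimal sketch using the closed-form exponential CDF with the 15-minute mean from the puzzle. The helper names `cdf` and `cdf_given_survived` and the particular values of \(m\) are only illustrative. The conditional probability is computed from the definition, so the cancellation of \(e^{-m/15}\) happens in the arithmetic rather than being assumed.

```python
import math

MEAN = 15.0  # mean service time in minutes


def cdf(t, mean=MEAN):
    """P(X <= t) for an exponential service time."""
    return 1.0 - math.exp(-t / mean)


def cdf_given_survived(m, k, mean=MEAN):
    """P(X <= m + k | X > m), computed directly from the definition."""
    return (cdf(m + k, mean) - cdf(m, mean)) / (1.0 - cdf(m, mean))


k = 5.0
print(f"P(X <= {k:g}) = {cdf(k):.4f}")
for m in (0.0, 10.0, 30.0, 60.0):
    print(f"m = {m:4.0f}: P(X <= m + k | X > m) = {cdf_given_survived(m, k):.4f}")
```

Every line should print the same value, about 0.2835.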
Visually, this is the probability that you finish in the first five minutes, the next five minutes after that, the ones after that, etc.
If we imagine that we cut this plot off at any point, say 20 minutes, then we are left with the bars to the right of the cut (the shaded bars in the plot).
If we took these bars and zoomed in on them (this is the visual interpretation of conditioning on \(X > m\)), they would look exactly the same as the white bars in the previous picture! The tail of the exponential distribution looks just like the whole thing, recursively.
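The zooming argument can be reproduced without the plots. This sketch computes the five-minute bar heights for the whole distribution and for the part beyond a 20-minute cut, then rescales the tail so that it sums to one. The bin width, the number of bins, and the function name `bin_probs` are choices made here for illustration.

```python
import math

MEAN = 15.0   # mean service time in minutes
WIDTH = 5.0   # width of each bar in minutes


def survival(t, mean=MEAN):
    """P(X > t) for an exponential service time."""
    return math.exp(-t / mean)


def bin_probs(start, n_bins, mean=MEAN, width=WIDTH):
    """Probability of finishing in each consecutive bin beginning at `start`."""
    return [
        survival(start + i * width, mean) - survival(start + (i + 1) * width, mean)
        for i in range(n_bins)
    ]


cut = 20.0
full = bin_probs(0.0, 6)    # bars for the whole distribution
tail = bin_probs(cut, 6)    # bars to the right of the cut
# "Zooming in" means dividing by the probability of reaching the cut at all.
zoomed = [p / survival(cut) for p in tail]

for whole, zoom in zip(full, zoomed):
    print(f"{whole:.4f}  {zoom:.4f}")
```

The two columns should agree in every row, which is the numerical version of the tail looking just like the whole distribution.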
Super- And Subexponential
Incidentally, this is also what we refer to when we call heavy-tailed distributions subexponential. When we zoom in on their tail, it falls off more slowly than the full distribution. If service times are subexponential, then the longer a customer has been at the bank, the longer we should expect it to take until they are done. One way to intuit this is to think that if a customer has stayed long in the bank, that is evidence that their matter is complicated and requires a lot of work, which then means they will stay there for a long time to come.
Conversely, a superexponential distribution is one where if we zoom in on the tail, it falls off faster than the full distribution. In the bank example, this would mean that the longer someone has been at the bank, the more likely it is that they will leave in the next minute. The intuition around this is probably obvious: if they have been there for a long time, there cannot be much work remaining on their matter.
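To make the contrast concrete, here is a sketch that uses the Weibull family as a stand-in for all three cases. That choice is an assumption made for this example, not something the text above prescribes. A shape parameter below 1 gives a heavy, subexponential tail, a shape of exactly 1 is the exponential distribution, and a shape above 1 gives a tail that falls off faster. All three are scaled to the same 15-minute mean.

```python
import math

MEAN = 15.0  # same mean service time as in the bank example


def weibull_scale(shape, mean=MEAN):
    """Scale parameter that gives a Weibull distribution the requested mean."""
    return mean / math.gamma(1.0 + 1.0 / shape)


def prob_leave_next_minute(shape, elapsed, mean=MEAN):
    """P(X <= elapsed + 1 | X > elapsed) for a Weibull service time."""
    scale = weibull_scale(shape, mean)

    def log_survival(t):
        return -((t / scale) ** shape)

    return 1.0 - math.exp(log_survival(elapsed + 1.0) - log_survival(elapsed))


elapsed_minutes = (0.0, 10.0, 30.0, 60.0)
models = (("subexponential", 0.5), ("exponential", 1.0), ("superexponential", 2.0))
table = {}
for label, shape in models:
    table[label] = [prob_leave_next_minute(shape, m) for m in elapsed_minutes]
    print(f"{label:>16}: " + "  ".join(f"{p:.3f}" for p in table[label]))
```

Reading each row from left to right, the chance of leaving in the next minute should fall with elapsed time for the first model, stay flat for the second, and rise for the third.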
Both are reasonable models for bank customers, so which do we pick? One way to deal with it is to split the difference and assume service times are exponentially distributed. The better way to choose a model is to try all three and see what consequences they have for your conclusions. Then pick not the most optimistic one, but the most pessimistic one!
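As a rough illustration of trying all three, the sketch below reruns the original puzzle under three Weibull service-time models with the same 15-minute mean. Both the Weibull family and the assumption that each of the two customers ahead of you has already been served for 10 minutes (the hypothetical `elapsed` parameter) are choices made for this example only. Which result counts as pessimistic depends on the question you are asking.

```python
import math
import random

MEAN = 15.0  # mean service time in minutes


def sample_remaining(rng, shape, scale, elapsed):
    """Remaining Weibull service time, given `elapsed` minutes already served.

    Inverse transform: conditional on X > elapsed, (X / scale) ** shape equals
    (elapsed / scale) ** shape plus a standard exponential variable.
    """
    e = rng.expovariate(1.0)
    total = scale * ((elapsed / scale) ** shape + e) ** (1.0 / shape)
    return total - elapsed


def prob_you_leave_last(shape, elapsed=10.0, trials=100_000, seed=1):
    """Estimate the probability that the newly arrived customer leaves last."""
    scale = MEAN / math.gamma(1.0 + 1.0 / shape)
    rng = random.Random(seed)
    last = 0
    for _ in range(trials):
        a = sample_remaining(rng, shape, scale, elapsed)
        b = sample_remaining(rng, shape, scale, elapsed)
        # Your own service starts fresh once the first customer leaves.
        your_service = sample_remaining(rng, shape, scale, 0.0)
        if min(a, b) + your_service > max(a, b):
            last += 1
    return last / trials


results = {shape: prob_you_leave_last(shape) for shape in (0.5, 1.0, 2.0)}
for shape, p in results.items():
    print(f"shape {shape}: P(you leave last) = {p:.3f}")
```

With a shape of 1 the estimate should sit near 0.5. The heavy-tailed model (shape 0.5) should push it below one half and the fast-decaying model (shape 2) above it, so the 50 % answer is specific to the exponential assumption.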