Two Wrongs

Forecasting ADIZ Violations

Note: This article was written before May 1, and so the recent spike to 35 violations in a day was in the future at the time of writing. Did I make a mistake, or was there something else that happened that my model didn’t account for? You tell me!

In the current Quarterly Cup there’s a question on what the highest number of violations of Taiwan’s air defense identification zone (adiz) will be.

This is great because it lets me show off some extreme value theory (evt), which is a part of statistics that is very useful, but not commonly known.

The reason evt matters is that we often have reason to worry about “what’s the worst that could happen?” or “what would be the consequence of a 99th percentile event?” Much of conventional statistics deals with “what’s the average outcome?” and “what’s the probability of encountering a 99th percentile event?” Those questions are important too, but we are usually far more affected by the size of extreme events than by the size of common events or the frequency of rare events. The former questions are the ones evt was designed to answer.

Question background

In the question, we are given a spreadsheet containing data on past violations. If we plot the most recent year (or so) of it, we get the following:


This makes one think the maximum ought to be somewhere around 40. But that ignores the fact that in most months, the maximum is far lower than that.

Bootstrapping an initial forecast

What we can do is bootstrap from this to draw random samples of 31 days and see what the maximum is in each. For example, here are 31 random days:


In this case, the maximum is just over 20. We can draw many such randomly simulated months from the original data, and fetch the maximum for each. When we do that 5000 times, we get this sort of density:
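In code, the bootstrap looks something like the following Python sketch. The daily counts here are made-up stand-ins for the question’s spreadsheet (the real series is not reproduced in this article), so only the procedure carries over, not the numbers.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up daily violation counts standing in for the question's
# spreadsheet; the real series is not reproduced here.
daily = rng.poisson(12, size=365)

# One simulated month: 31 random days drawn with replacement,
# keeping only the maximum. Repeat 5000 times.
monthly_maxima = np.array([
    rng.choice(daily, size=31, replace=True).max()
    for _ in range(5000)
])

print(monthly_maxima.min(), monthly_maxima.max())
```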


This is a funny-looking density, but there’s some truth to it: it’s rare to get a simulated month where the maximum is in the low 30s. You can verify this by studying the original timeseries carefully.

If we convert this density to a distribution, we can create a first, very naïve forecast for the question:


This seems to indicate that in the data, it’s impossible to find a simulated month that has a maximum lower than 8, but also impossible to find one with a maximum higher than 40. Between these, it’s a roughly linear slope. The middle of the distribution has a maximum of around 25.
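The density-to-distribution step is just the empirical cdf of the simulated maxima. A minimal sketch, again on placeholder maxima rather than the real data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder monthly maxima standing in for the 5000 bootstrap values.
maxima = rng.poisson(12, size=(5000, 31)).max(axis=1)

def ecdf(sample, x):
    """Fraction of simulated maxima at or below x."""
    return np.mean(sample <= x)

# E.g. the empirical probability that the monthly maximum is 25 or less:
print(ecdf(maxima, 25))
```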

We know this is incorrect because if we had looked only a little further back in history, we would have found a month where the maximum was higher than 40 – this is the danger of using naïve approaches to handle extreme values. If we used this for our forecast, we would want to enter a much less confident distribution to compensate.

Can we do something more sophisticated that is probably more correct? Yes, we can. Enter extreme value theory.

Stumbling our way into a limit law

What if we did the same bootstrap thing, but we took the maximum of just one day at a time, instead of a full month? Then we’d get the distribution of the underlying data:


Okay, now how about two days at a time? Now we’d get the distribution of the higher of two randomly chosen days:


What about ten days at a time?


If we had more data, we could go on with this process. The density would walk toward the right as we take the maximum over more days at a time, but! If we normalise the values, it would stop moving, and instead start to gel into a specific shape.

If you have ever seen sums of random numbers converge to the normal distribution thanks to the central limit theorem, you’ve seen this sort of convergence happen. This is the same thing, except instead of summing n random numbers and dividing by something, we take the maximum of n random numbers and normalise. We have stumbled into the Fisher–Tippett theorem, which says the maximum of n random values will – if it converges at all – converge to a specific extreme value distribution.
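To see the convergence concretely, here is a small experiment of my own (not from the question’s data): maxima of n standard exponentials, shifted down by log n, settle into the Gumbel shape exp(−exp(−x)), one member of the extreme value family.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000  # block size: how many values each maximum is taken over

# 5000 maxima of n standard exponentials, normalised by subtracting
# log n (the right shift for the exponential distribution).
maxima = rng.exponential(size=(5000, n)).max(axis=1) - np.log(n)

# The limiting Gumbel CDF at x = 0 is exp(-exp(-0)) = exp(-1) ≈ 0.368,
# so roughly 36.8 % of the normalised maxima should fall below zero.
print((maxima <= 0).mean())
```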

We can use this when forecasting. We ask the evir R package to fit such a distribution for us (with the gev function), and we draw the fitted distribution (using pgev) on top of the simulated one from before.
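For readers who prefer Python, scipy’s genextreme can play the same role as evir’s gev/pgev. This is a sketch on made-up stand-in data, not the article’s actual fit; note that scipy’s shape parameter c is the negative of the ξ convention most evt texts use.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(7)

# Made-up daily counts standing in for the real spreadsheet.
daily = rng.poisson(12, size=365)

# Bootstrap simulated monthly maxima, as before.
maxima = np.array([rng.choice(daily, size=31).max() for _ in range(2000)])

# Fit a generalised extreme value distribution by maximum likelihood.
c, loc, scale = genextreme.fit(maxima)

# Tail probability: chance the monthly maximum exceeds 40.
p_over_40 = genextreme.sf(40, c, loc=loc, scale=scale)
print(p_over_40)
```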


Now we’re talking! The two are similar, but critically for extreme value analysis, the fitted theoretical distribution has a much thicker tail – it allows for more extreme values. This is what we want when we are trying to forecast a maximum. As Nassim Nicholas Taleb quips, here paraphrased,

You cannot use the past maximum as a guide to what the worst case will be, because any maximum is, by definition, a worse event than the worst that came before it. The challenge is finding out how much worse.

In our case, the simulated distribution would have us conclude that a maximum of over 40 was impossible. The fitted theoretical distribution says it happens about 8 % of the time. If we go a little further back in history, we discover that the fitted distribution has it right.

The simulated distribution made it hard for us to estimate things like “what’s the probability we get more than 80 adiz violations on the worst day of a month?” Is it 0.01 %? 1 %? There’s a factor 100 between those two, and that could matter a lot! With the fitted extreme value distribution, we would estimate it as a 0.4 % type of event.

So, then, my forecast on this question is based on the following numbers, extracted from the fitted distribution:

Percentile    1 %   10 %   25 %   50 %   75 %   90 %   95 %
Violations      9     13     16     21     28     37     45

With the endpoints 5 and 65 having 0 % and 98 % of the distribution under them, respectively.
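A percentile table like the one above comes straight out of the fitted distribution’s quantile function (qgev in R, ppf in scipy). A sketch with illustrative parameters chosen by hand – not the ones fitted to the real data:

```python
from scipy.stats import genextreme

# Illustrative GEV parameters for this example only
# (scipy's c is the negative of the usual EVT shape xi).
c, loc, scale = -0.1, 20.0, 6.0

for p in (0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95):
    q = genextreme.ppf(p, c, loc=loc, scale=scale)
    print(f"{p:4.0%}  {q:5.1f}")
```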