More On Response Time And Scale

kqr


In a previous article, we learned that response time is the system talking. However, what we did in that article rested on one critical assumption: that we were able to measure the response time at complete idle. We measured a response time of 89 ms and took that to be the idle response time. What if it wasn't?

Maybe the system is lightly loaded even at night, and that 89 ms was actually a measurement under load. In that case, we will underestimate the utilisation caused by any load we apply later.

This article is currently WRONG

Reader Ryan R (who writes at The Absent-Minded Android) pointed out to me that assuming \(R_A = 2R_B\) is only valid if the load is applied to an idle system – the exact assumption we are trying to work around.

To solve this, one actually needs three load tests. As soon as I can take the time I will rewrite this article to be correct. Until then, please look elsewhere to satisfy your load testing theory needs!

We can extrapolate to the baseline response time

If we are unsure about our baseline measurement, we can run two brief load tests and extrapolate backward to complete idle.

In the previous article, we learned about the equation

\[R = 1 - \frac{W_0}{W_A}\]

where \(R\) is system utilisation, computed from \(W_0\), the response time at complete idle, and \(W_A\), the response time under that utilisation. If we don't know \(W_0\), it might seem like we're in trouble. Fortunately, maths is there to save us!
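As a minimal sketch, the formula fits in a few lines of Python. The function name `utilisation` is our own, and we assume both arguments are mean response times in the same unit. Plugging in the previous article's 89 ms (taken as idle) and 327 ms (under load) reproduces the roughly 73 % utilisation we originally assumed for that load test.

```python
def utilisation(w0: float, w: float) -> float:
    """Utilisation implied by response time w, given idle response time w0.

    Both times must be in the same unit (here: milliseconds).
    """
    # A loaded system cannot respond faster than an idle one.
    if not 0 < w0 <= w:
        raise ValueError("expected 0 < w0 <= w")
    return 1 - w0 / w

# Previous article: 89 ms assumed to be idle, 327 ms under load.
print(f"{utilisation(89, 327):.0%}")  # 73%
```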

When we run two load tests at slightly different loads, we will get two response time measurements \(W_A\) and \(W_B\). These will be associated with two utilisation levels \(R_A\) and \(R_B\). We won't know what \(R_A\) and \(R_B\) are (that's part of what we are trying to find out, after all), but we will know how much more load test A applies than test B, relatively speaking.

The load test we ran in the previous article showed a mean response time of 327 ms; this will be test A. Then we backed off and applied only half as much load for test B, which gave a mean response time of 122 ms.

Now we have two equations forming a system. The first load test gave us

\[R_A = 1 - \frac{W_0}{327}\]

and the second gave us

\[\frac{R_A}{2} = 1 - \frac{W_0}{122}\]
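Spelled out, writing each test in the form of the utilisation equation \(R = 1 - W_0/W\), and assuming (as this article does) that halving the load halves the utilisation, we can eliminate \(R_A\) by doubling the second equation and setting it equal to the first:

\[
\begin{aligned}
R_A &= 1 - \frac{W_0}{327}, \qquad \frac{R_A}{2} = 1 - \frac{W_0}{122} \\
2\left(1 - \frac{W_0}{122}\right) &= 1 - \frac{W_0}{327} \\
1 &= W_0\left(\frac{2}{122} - \frac{1}{327}\right) \\
W_0 &= \frac{1}{2/122 - 1/327} \approx 75 \text{ ms}
\end{aligned}
\]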

These are two equations in two unknowns, \(R_A\) and \(W_0\). When we solve them simultaneously, we learn that \(W_0\) ≈ 75 ms. In other words, the response time at complete idle is 14 ms lower than the 89 ms we measured.
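The same elimination works for any ratio between the two loads. Below is a sketch of a general solver; the name `baseline_response_time` and the `load_ratio` parameter are hypothetical, and it bakes in the assumption that the utilisation ratio \(R_A / R_B\) equals the load ratio, which is exactly the assumption the note at the top of this article calls into question.

```python
def baseline_response_time(w_a: float, w_b: float, load_ratio: float) -> float:
    """Extrapolate the idle response time W_0 from two load tests.

    w_a, w_b: mean response times of tests A and B (same unit).
    load_ratio: how many times more load test A applies than test B,
    ASSUMED here to equal the utilisation ratio R_A / R_B.
    """
    # Test A must be the heavier test, and response times must be positive.
    if load_ratio <= 1 or w_b <= 0 or w_a <= w_b:
        raise ValueError("expected load_ratio > 1 and w_a > w_b > 0")
    k = load_ratio
    # From R_A = 1 - W_0/W_A and R_A/k = 1 - W_0/W_B:
    #   k (1 - W_0/W_B) = 1 - W_0/W_A  =>  W_0 = (k - 1) / (k/W_B - 1/W_A)
    return (k - 1) / (k / w_b - 1 / w_a)

w0 = baseline_response_time(w_a=327, w_b=122, load_ratio=2)
print(f"W_0 ≈ {w0:.1f} ms")  # W_0 ≈ 75.0 ms
```

With `load_ratio=2` this is the same calculation as the pair of equations above; the exact result is about 74.99 ms.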

There are some fairly obvious consequences to this:

The 89 ms we measured was actually taken at a utilisation of about 16 % (since 1 − 75/89 ≈ 0.16), quite far from idle!
Our first load test drove the system to a utilisation of 77 % rather than the 73 % we had thought. This might sound like a small difference, but it means the spare resources (our safety margin) were about 15 % lower than we thought: 23 % of capacity instead of 27 %.
To keep this system at a utilisation of 40 %, we should probably not exceed a request rate of 0.4 × 1/(0.075 s) ≈ 5.3 requests/second! This is significantly lower than my intuition would suggest.
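The consequences above can be checked with a short script. It uses the rounded 75 ms baseline and works in seconds. For the last item we assume, as the 0.4 × 1/0.075 calculation implies, a single server whose mean service time equals the idle response time, so that utilisation = request rate × \(W_0\). That relationship is our labelled assumption, not something measured here.

```python
W0 = 0.075            # extrapolated idle response time, in seconds (rounded)
W_IDLE_GUESS = 0.089  # what we originally took to be the idle response time
W_A = 0.327           # mean response time in load test A

def utilisation(w0: float, w: float) -> float:
    """Utilisation implied by response time w, given idle response time w0."""
    return 1 - w0 / w

print(f"utilisation at 89 ms:   {utilisation(W0, W_IDLE_GUESS):.0%}")   # 16%
print(f"test A, actual:         {utilisation(W0, W_A):.0%}")            # 77%
print(f"test A, as thought:     {utilisation(W_IDLE_GUESS, W_A):.0%}")  # 73%

# Spare capacity (safety margin) during test A: actual vs. what we thought.
spare_actual = 1 - utilisation(W0, W_A)
spare_thought = 1 - utilisation(W_IDLE_GUESS, W_A)
print(f"spare capacity:         {spare_actual:.0%} vs {spare_thought:.0%}")  # 23% vs 27%

# ASSUMPTION: single server with utilisation = request rate * W0,
# so the highest rate that keeps utilisation at 40 % is 0.4 / W0.
max_rate = 0.4 / W0
print(f"max rate for 40 %:      {max_rate:.1f} requests/second")  # 5.3 requests/second
```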