P95 & P99 - new metrics in Uptime-check 🔎

We've added P99 and P95 to our Uptime-check report. They show the 99th and 95th percentile response times.

No idea what you just read? No worries - neither did I at first, but my colleague Michael sat me down and was like:

And now it makes total sense. So let me explain it.

We measure the response time for our Uptime-check by looking at the average value of all the tests we perform in a given period.

For example, if we have performed 100 tests, we add all the response times together and divide by 100 to get the mean.

Averages are fantastic for getting a quick overview of the performance - however, they can be misleading.

Say that we perform 100 tests. 95 of these have a "normal" response time of 1 second, whereas the last 5 time out and give us a response time of 15 seconds.

Now our average response time has suddenly increased from our "normal" 1 second to 1.70 seconds (a 70% increase) - all because of 5 failed tests.
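If you want to check the math yourself, here is a minimal Python sketch of that example. The variable names are made up for illustration, and this is not how Uptime-check is implemented internally - it just redoes the arithmetic above.

```python
from statistics import mean

# Hypothetical sample: 95 "normal" tests at 1 second, 5 timeouts at 15 seconds.
response_times = [1.0] * 95 + [15.0] * 5

# Add everything together and divide by the number of tests.
average = mean(response_times)

# Relative increase compared to the "normal" 1-second response time.
normal = 1.0
increase_percent = (average - normal) / normal * 100

print(f"average: {average:.2f} s")
print(f"increase: {increase_percent:.0f}%")
```

Five slow tests out of 100 are enough to move the average noticeably, even though 95 of the tests look identical.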

This gives us a wrong picture of how the average visitor experiences the response time - since the vast majority of the visitors (95%) have a response time of just 1 second.

It also doesn't tell us exactly how bad the experience is for the last 5% with a whopping response time of 15 seconds or above (we cut the connection after 15 seconds). 

Using percentiles helps us catch these outliers.

So how does it work?

It works in the same way as a median value. A median value (which we have also added to our metrics) gives us the P50 value.

Say we have performed 100 tests, as in the previous example. We then sort these values from low to high. If we're looking for the median, we take the value at the 50th percentile (in this case, it would be value #50).

For P95 and P99, we take the 95th and 99th percentile, which in this example is value #95 and value #99. If we had performed 1,000 tests instead of 100, this would be value #950 and value #990.
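The "sort, then pick value #N" approach can be sketched in a few lines of Python. This is only an illustration of the idea described here (often called the nearest-rank method); the helper names `percentile_rank` and `percentile` are hypothetical, and the exact method Uptime-check uses may differ.

```python
def percentile_rank(p: int, count: int) -> int:
    """Return the 1-based position of the p-th percentile among `count` sorted values."""
    # Integer ceiling of p * count / 100, kept in integers to avoid float rounding.
    return max((p * count + 99) // 100, 1)


def percentile(values: list[float], p: int) -> float:
    """Sort the values from low to high and pick the one at the percentile's position."""
    ordered = sorted(values)
    return ordered[percentile_rank(p, len(ordered)) - 1]


# Same hypothetical sample as before: 95 tests at 1 second, 5 timeouts at 15 seconds.
response_times = [1.0] * 95 + [15.0] * 5

p50 = percentile(response_times, 50)  # value #50
p95 = percentile(response_times, 95)  # value #95
p99 = percentile(response_times, 99)  # value #99

print(f"P50: {p50} s, P95: {p95} s, P99: {p99} s")
```

With this sample, value #50 and value #95 are both 1 second, while value #99 is one of the 15-second timeouts, so P99 is the metric that exposes the slow tail here. With 1,000 tests, `percentile_rank(95, 1000)` and `percentile_rank(99, 1000)` point at value #950 and value #990.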

So thanks to these new metrics, you can now more easily catch the outliers which screw up your averages and, more importantly, give your visitors a bad experience.

Pretty cool, right? Or to quote Owen Wilson: