If you follow me on Twitter, you'll know I'm constantly going on about how the number of COVID-19 cases is not a very useful indicator of anything - unless you also know something about how tests are being conducted.

If you're a regular reader of FiveThirtyEight, you're probably used to looking at data in sports - where basically everything that happens on a basketball court or a baseball diamond is recorded - or in electoral politics, where polls (in theory, anyway) survey a random sample of the population. COVID-19 statistics, especially the number of reported cases, are not at all like that. The data, at best, is highly incomplete, and often just the tip of the iceberg for much larger problems. And data on tests and the number of reported cases is highly nonrandom. In many parts of the world today, health authorities are still trying to triage the situation with a limited number of tests available. Their goal in testing is often to allocate scarce medical care to the patients who most need it - rather than to create a comprehensive dataset for epidemiologists and statisticians to study.

But if you're not accounting for testing patterns, it can throw your conclusions entirely out of whack. You don't just run the risk of being a little bit wrong: Your analysis could be off by an order of magnitude. Or even worse, you might be led in the opposite direction of what is actually happening. A country where the case count is increasing because it's doing more testing, for instance, might actually be getting its epidemic under control. Alternatively, in a country where the reported number of new cases is declining, the situation could actually be getting worse, either because its system is too overwhelmed to do adequate testing or because it's ramping down on testing for PR reasons.

Failure to account for testing strategies can also render comparisons between states and countries meaningless. According to two recent epidemiological studies, which tried to infer the true number of infected people from the reported number of deaths, there is roughly a 20-fold difference in case detection rates between the countries that are doing the best job of it, such as Norway, and those doing the worst job, such as the United Kingdom. (The United States is probably somewhere in the middle of the pack by this standard.) That means, for example, that in one country that reports 1,000 COVID-19 cases, there could actually be 5,000 infected people, and in another country that reports 1,000 cases, there might be 100,000!

There is also a lot of uncertainty about the true number of infections within a given country. According to an expert survey published by FiveThirtyEight, the number of detected cases in the United States could underestimate the true number of infected people by a factor of anywhere from two to 100. The same holds in other countries. A recent paper published by Imperial College London estimated that the true number of people who had been infected with the coronavirus in the U.K. as of March 30 was somewhere between 800,000 and 3.7 million - as compared to a reported case count through that date of just 22,141.
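The arithmetic behind these comparisons is simple division. As a minimal sketch (the function names are illustrative and are not taken from the article's spreadsheet), the snippet below turns a reported count and a detection rate into implied infections, and works out the undercount multiples implied by the Imperial College range. The 20 percent and 1 percent detection rates are back-solved from the 5,000 and 100,000 figures above, not quoted from the studies.

```python
def implied_infections(reported_cases: float, detection_rate: float) -> float:
    """Infections implied by a reported count if only `detection_rate` of them are detected."""
    return reported_cases / detection_rate


def undercount_multiple(true_infections: float, reported_cases: float) -> float:
    """How many actual infections exist per reported case."""
    return true_infections / reported_cases


# Two countries that each report 1,000 cases, with a 20-fold gap in detection.
# ASSUMPTION: 20% and 1% are back-solved from the 5,000 and 100,000 examples.
print(implied_infections(1_000, 0.20))
print(implied_infections(1_000, 0.01))

# U.K. as of March 30: 22,141 reported cases vs. an estimated 800,000 to 3.7 million.
print(round(undercount_multiple(800_000, 22_141)))    # about 36
print(round(undercount_multiple(3_700_000, 22_141)))  # about 167
```

On these numbers, the U.K. estimate implies somewhere between roughly 36 and 167 actual infections per reported case.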

So in this article, I'm going to work through four examples of how various testing strategies can skew case counts, in the hopes of giving you a more hands-on sense for how the mechanics behind the numbers work. These scenarios are definitely not meant as predictions of what will happen in any given country, state or region. They work with hypothetical data, because we don't know all the parameters we'd need to properly estimate a model anyway. The goal is just to illustrate, given relatively simple assumptions, how reported case counts for a disease can differ from the actual number of infections.

At the same time, the parameters in each scenario reflect what I hope are semi-realistic assumptions that at least loosely approximate the coronavirus situation in different groups of countries. Some countries have relatively robust testing. Some started out with strong testing but then stalled out. Some were way behind on testing but soon caught up. Each of these can have different effects on the pattern of reported cases.

You can even download an Excel spreadsheet and input your own assumptions - though I'm going to wait until the end of the story to give you the link, in the hopes that you'll continue reading about how this all works before trying to brew up your own scenario.

The not-so-simple math behind coronavirus testing

The core purpose of this exercise is to help you think through how many people might test positive for a disease based on how many people are actually infected with it, given various assumptions about testing. That does require us to make some simple assumptions about the underlying number of infected people in the population. So the scenarios are partly based on what should be a fairly simple, standard epidemiological model.

The most important number in any epidemiological model is R, or the reproduction ratio, which is how many people that a person in one generation passes the disease along to in the next generation. For example, if a disease has an R of 3, that means each infected person transmits it to three more people. So one initial case becomes three newly infected people in the next generation, which becomes nine people, which becomes 27 people, which becomes 81 people, and so forth - the very nature of exponential growth is that it gets out of hand quickly!
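To make the generation-by-generation arithmetic concrete, here is a minimal sketch that assumes a constant R and a single starting case. The function name and parameters are illustrative only.

```python
def infections_by_generation(r: float, generations: int, seed_cases: int = 1) -> list[float]:
    """New infections in each generation, assuming every case infects exactly `r` others."""
    counts = [float(seed_cases)]
    for _ in range(generations):
        counts.append(counts[-1] * r)
    return counts


print(infections_by_generation(3, 4))  # [1.0, 3.0, 9.0, 27.0, 81.0]
```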

Assumptions about the R of COVID-19 vary, and to some extent that's inevitable given that there isn't necessarily one intrinsic number for how the disease spreads from one infected person to the next. In fact, epidemiologists make a distinction between R0 (pronounced R-zero or R-naught), which is called the basic reproduction ratio, or how fast the disease spreads in the absence of any interventions or any immunity whatsoever, and the effective reproduction ratio, called R-effective or simply R. R-effective is likely to be much higher on a cruise ship or in a college dormitory than in the middle of a remote town in Alaska where people rarely encounter one another, for example. Moreover, interventions such as social distancing are being undertaken to bring down R, although actions can vary from location to location. The goal, though, is to get R below 1, which means that a disease begins to die out in a population. (It will die out gradually if R is close to 1 and quickly if it's close to zero, say, 0.2.) Finally, if a disease has spread very widely throughout the population, R may eventually fall because of herd immunity. In other words, if enough people are immune to a disease because they've already had it, it will not continue to spread as fast.

So in these scenarios, I assume that R goes through three different stages that reflect various efforts at containment:

  • First, there's an uncontrolled stage where the disease is spreading unchecked throughout the population. I assume this stage has an R of 2.6. The WHO initially estimated R to be between 2.0 and 2.5, but other researchers such as those from Imperial College London have since revised their numbers upward to around 3.0; thus, 2.6 reflects something of a middle ground.
  • Next, there's an intermediate stage where some measures are being undertaken - businesses are having their employees work from home, large events are cancelled, and people are avoiding some unnecessary contacts and generally being more careful. But there are no lockdown or quarantine measures in place. I assume that R falls to 1.4 during this stage.
  • Finally, there's a lockdown stage where R falls to 0.7 - that is, below 1, meaning that the disease begins to die out.

There's a lot of disagreement about these values - both how fast COVID-19 was spreading initially and how effective various interventions have been at lowering R. So you are welcome to download the spreadsheet at the end of this article and tweak those assumptions. (Note that the scenarios also account for R gradually reducing over time because of herd immunity, so the actual values of R in the scenarios may be slightly lower than the ones stated above.)

Next: How long does a generation last? By a generation, I don't mean the Baby Boomers or something like that - I mean one round of infections. The number that determines the length of a generation is the serial interval, which is how long it takes, on average, for a person to transmit the disease to the people they infect. For COVID-19, estimates of the serial interval hover between four and five days. So I assume that a generation lasts five days in the scenarios.
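The three stages of R and the five-day generation can be combined into a rough simulation. What follows is only a sketch under stated assumptions, not the spreadsheet's actual formulas. It assumes herd immunity scales R by the share of the population still susceptible, that a stage's R applies from the generation in which the stage starts, and that the stage switch points (generations 12 and 15) and the population of 10 million are illustrative inputs. It should not be expected to reproduce the figures quoted in the scenarios.

```python
from datetime import date, timedelta

GENERATION_DAYS = 5  # serial interval assumed in the article


def simulate_infections(population, r_by_stage, stage_start_gens, n_generations):
    """Return (date, new_infections, effective_r) for each generation.

    ASSUMPTION: herd immunity is modeled by multiplying the stage's R by the
    share of the population that has not yet been infected.
    """
    start = date(2020, 1, 1)  # first infection
    new_cases = 1.0
    cumulative = 1.0
    rows = [(start, new_cases, r_by_stage[0])]
    for gen in range(1, n_generations + 1):
        # The stage index is the number of switch points already reached.
        stage = sum(1 for s in stage_start_gens if gen >= s)
        susceptible_share = max(0.0, 1 - cumulative / population)
        r_eff = r_by_stage[stage] * susceptible_share
        new_cases = min(new_cases * r_eff, population - cumulative)
        cumulative += new_cases
        rows.append((start + timedelta(days=GENERATION_DAYS * gen), new_cases, r_eff))
    return rows


# Hypothetical inputs: uncontrolled, intermediate and lockdown stages.
for day, new, r_eff in simulate_infections(10_000_000, [2.6, 1.4, 0.7], [12, 15], 36):
    print(day.isoformat(), round(new), round(r_eff, 2))
```

Changing the switch points or the three R values shows how sensitive the total number of infections is to the timing of interventions.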

I also assume that the disease has varying levels of severity, and that this affects whether people are tested. In particular, I assume that:

  • 10 percent of cases are severe.
  • 60 percent of cases are mild.
  • And 30 percent of cases are asymptomatic.

Again, this seems to match the consensus of the medical literature on COVID-19 ... but there is a lot of disagreement about these parameters - and especially on the number of asymptomatic cases. So I'd welcome you to input different values and see how they affect the results.

However, in considering who gets tested, we also need to think about people who have symptoms similar to those of COVID-19 but who don't actually have the coronavirus. I haven't seen much research on this question, but thermometer data seems to find that around 3 percent of the U.S. population typically feels sick at this point in the year. So the scenarios assume that at any given time, 0.1 percent of the population has symptoms that resemble severe COVID-19 symptoms for reasons other than coronavirus (say, a bad flu or bronchitis or pneumonia), and that 2.5 percent have symptoms that resemble mild COVID-19 symptoms for other reasons than coronavirus (say, a mild flu or a bad cold). Furthermore, I assume that all people with severe symptoms seek testing (that is, they would get tested if they could), that half of people with mild symptoms do and that 2 percent of asymptomatic people do. All of these assumptions can also be changed in the spreadsheet.
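As a rough illustration of how these assumptions turn into demand for tests, here is a minimal sketch. The names are hypothetical, and two details are my own reading rather than something the article spells out: the look-alike symptom rates are applied to uninfected people only, and the 2 percent rate for asymptomatic people is applied to everyone without symptoms, infected or not.

```python
SEVERITY_SHARE = {"severe": 0.10, "mild": 0.60, "asymptomatic": 0.30}  # among infected
LOOKALIKE_SHARE = {"severe": 0.001, "mild": 0.025}                     # among uninfected (assumption)
SEEK_RATE = {"severe": 1.00, "mild": 0.50, "asymptomatic": 0.02}       # share who seek a test


def demand_for_tests(infected: float, population: float) -> dict:
    """People seeking a test, split by symptom level and by true infection status."""
    uninfected = population - infected
    infected_by_sev = {s: infected * share for s, share in SEVERITY_SHARE.items()}
    uninfected_by_sev = {s: uninfected * share for s, share in LOOKALIKE_SHARE.items()}
    # Everyone else who is uninfected has no symptoms at all.
    uninfected_by_sev["asymptomatic"] = uninfected - sum(uninfected_by_sev.values())
    return {
        s: {
            "infected": infected_by_sev[s] * SEEK_RATE[s],
            "uninfected": uninfected_by_sev[s] * SEEK_RATE[s],
        }
        for s in SEEK_RATE
    }


# Hypothetical snapshot: 10,000 current infections in a country of 10 million.
for severity, seekers in demand_for_tests(10_000, 10_000_000).items():
    print(severity, round(seekers["infected"]), round(seekers["uninfected"]))
```

Even in this small example, most of the people who want a test do not have the coronavirus, which is why the rules for allocating tests matter so much.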

But wait, there's more! I don't really consider the scenarios a "model" in the way FiveThirtyEight usually uses that term because we're not trying to predict anything; we're just trying to show how different testing strategies can impact the number of reported cases. But as in the case of real coronavirus models, there are an awful lot of messy, real-world problems we need to consider, too.

One of them is that there's a long lag between when someone is infected, when they develop symptoms, when they get tested and when those test results are reported. In Wuhan, China, the lag between the development of symptoms and test results being reported was around 10 to 12 days. And considering it usually takes at least a few days for symptoms to develop, the lag between infection and a case showing up in the test statistics is going to be longer still. In these scenarios, I therefore assume that there's a delay of 15 days (or three generations) between infection and the test results showing up in the data - though if anything I suspect this is too generous, given the huge testing bottlenecks in places such as California.

Another real-world problem is that the tests aren't perfect. In fact, according to reporting by The Wall Street Journal on Thursday, around 30 percent of people who actually have COVID-19 test negative for it - which is what we'd call a false negative. Other estimates of false negatives aren't quite so high, so I assume a 20 percent false negative rate in the scenarios.

Then, of course, there's also the question of false positives, i.e., when a test reports that someone has COVID-19 but they actually don't. This number is harder to pin down, but we can infer that tests rarely produce false positives. Why? In Iceland, where large numbers of asymptomatic people are being tested, the overall rate of positive tests among this group is slightly under 1 percent. Given that includes people who probably do have the coronavirus (since asymptomatic cases are fairly common), we can assume the rate of false positives is even lower - for the purpose of the scenarios, we'll guess that it's 0.2 percent.

However, there's a bit of a mathematical twist in calculating false positives. Even if false positives are rare, false positives may swamp true positives if the underlying incidence of a disease is low. Say, for instance, that in the early stages of an outbreak in a town of 100,000 people, 100 people or 0.1 percent of the population actually has the disease. If everyone gets tested, then there will be roughly 200 false positives (0.2 percent of the population) - larger than the number of people who are actually sick! This is why some of the discourse around false positive tests is confusing. It can both be true that the rate of false positives is fairly low and that a high share of positive tests are false. For better or worse, this becomes less of an issue as new infections multiply; there are lots of real positives, so the false positives no longer drown them out.
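The town example can be worked through in a few lines. This sketch uses the scenario values for the error rates (20 percent false negatives, 0.2 percent false positives) and assumes everyone in the town is tested exactly once. The function name is illustrative.

```python
def screening_outcome(population, infected, false_negative_rate=0.20, false_positive_rate=0.002):
    """Return (true_positives, false_positives, share_of_positives_that_are_false)."""
    true_pos = infected * (1 - false_negative_rate)
    false_pos = (population - infected) * false_positive_rate
    return true_pos, false_pos, false_pos / (true_pos + false_pos)


# Early outbreak: 100 infected people in a town of 100,000.
tp, fp, false_share = screening_outcome(100_000, 100)
print(round(tp), round(fp, 1), round(false_share, 2))

# Later outbreak (hypothetical): 10,000 infected in the same town.
tp, fp, false_share = screening_outcome(100_000, 10_000)
print(round(tp), round(fp, 1), round(false_share, 3))
```

In the early case, about 71 percent of positive results are false even though the false positive rate is only 0.2 percent. Once 10,000 people are infected, the false share drops to around 2 percent.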

Finally, there is a further set of assumptions we have to make about how many tests are conducted and who gets tested. But those vary from scenario to scenario. So let's work through the scenarios now:

Scenario 1: Robust growth in testing

To emphasize that these are hypothetical scenarios and to get you in an appropriately abstract mindset, I'm going to ask you to imagine that these scenarios occur in a country called Covidia, which has 10 million people and where the first infected person entered the country on Jan. 1 (although his case wasn't detected until later).

In this first scenario, Covidia - like most real-world countries - is a little slow to undertake social distancing measures. It takes some intermediate steps on March 1, by which time 183,000 people there have already been infected, though far fewer positive tests (just 439!) have been reported. On March 16, with the number of cases still rapidly increasing, Covidia implements a full stay-at-home order (what I'm informally calling a "lockdown"), which reduces R to less than one.

In better news, the testing situation is comparatively good in this version of Covidia. In this scenario, I assume that Covidia starts out with the capacity to do 1,000 tests per generation, and beginning in early February, it improves testing volume by 50 percent per generation until all testing demand is satisfied. I further assume that Covidia rations 75 percent of tests, meaning that tests go to people with severe symptoms before people with mild symptoms, and to people with mild symptoms before people with no symptoms. The remaining 25 percent of tests are available on an on-demand basis.
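One way to picture the rationing rule is the sketch below. It is only an interpretation: the article doesn't say how on-demand tests are distributed, so I assume they are shared out in proportion to whatever demand is still unmet, and that rationed tests nobody needs spill over into the on-demand pool. The function and variable names are hypothetical.

```python
PRIORITY = ["severe", "mild", "asymptomatic"]  # rationed tests go in this order


def allocate_tests(capacity: float, demand: dict, rationed_share: float = 0.75) -> dict:
    """Split `capacity` tests across symptom groups; `demand` maps group -> people seeking a test."""
    given = {group: 0.0 for group in PRIORITY}

    # Rationed pool: sickest people first.
    rationed = capacity * rationed_share
    for group in PRIORITY:
        take = min(rationed, demand[group])
        given[group] += take
        rationed -= take

    # On-demand pool, plus any rationed tests left over.
    # ASSUMPTION: shared in proportion to each group's unmet demand.
    pool = capacity * (1 - rationed_share) + rationed
    unmet = {group: demand[group] - given[group] for group in PRIORITY}
    total_unmet = sum(unmet.values())
    if total_unmet > 0:
        scale = min(1.0, pool / total_unmet)
        for group in PRIORITY:
            given[group] += unmet[group] * scale
    return given


# Hypothetical demand with only 1,000 tests available this generation.
print(allocate_tests(1_000, {"severe": 500, "mild": 2_000, "asymptomatic": 1_000}))
```

With 1,000 tests, everyone with severe symptoms is covered, but most people with mild or no symptoms who want a test go without one.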

So, what does this look like? Here is how the actual number of infections compares to the number of reported cases in Covidia, first in table form...

...and then in chart form:

There are quite a few things to look at here. The most obvious and probably the most important one is simply that a 15-day delay between when someone gets infected and when their case shows up in the data as a positive test makes a huge difference. Even if everything else were going perfectly - 100 percent of the population were being tested and the tests were 100 percent accurate - with an R of 2.6, a 15-day delay would result in there being about 18 times more newly infected people in the population than the number of newly reported positive tests at any given time.

The delay matters less as R declines because if the disease isn't growing as fast, there aren't as many new people who get infected in the 15-day period between infection and test results. But it still means we're always looking two weeks into the past whenever "new" data is reported. And remember, social distancing measures that are effective in flattening the curve may take two or three weeks to show up in the data. This is especially so when the demand for testing is near its peak and there are likely to be longer lags in processing test results.
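The size of the lag effect follows directly from R and the number of generations that fit into the delay. A minimal sketch, assuming R stays constant over the delay (the function name is illustrative):

```python
def lag_multiplier(r: float, delay_days: float = 15, generation_days: float = 5) -> float:
    """Ratio of new infections today to the new infections reflected in today's test results."""
    return r ** (delay_days / generation_days)


for r in (2.6, 1.4, 0.7):
    print(r, round(lag_multiplier(r), 2))
# 2.6 17.58
# 1.4 2.74
# 0.7 0.34
```

At an R of 2.6, new infections run about 18 times ahead of reported cases. At 1.4 the gap shrinks to under three times, and at 0.7 the reported numbers actually overstate what is happening now.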

Next, even with relatively good testing, you're still likely to miss many cases. By the end of the scenario on June 29, 1.2 million people have been infected at some point in Covidia, but there are only 186,000 detected cases, for a detection rate of about 16 percent (and some of those are false positives so the actual situation is a bit worse than that). What accounts for the missing cases?

  • First, we assume that many people with mild symptoms or no symptoms do not want to get tested (and nobody forces them to get tested) so they get infected at some point without ever realizing it.
  • Next, the 20 percent false negative rate means that some cases are missed.
  • Finally, even where testing ramps up quickly, it may not ramp up quite as fast as the disease itself. The actual peak in new infections in this scenario comes on March 16 - and at that point, testing is not fully scaled up and a lot of people who would like a test still cannot get one.

There's also a third issue: If testing is increasing, the rate of growth of a disease can be overestimated. Alternatively, if testing is stagnant or decreasing, the rate of growth can be underestimated. Note that in Scenario 1, the R you'd infer from the number of reported cases peaks at 3.5, when the actual R based on infections was not quite as high (2.6 before Covidia began implementing social distancing measures).

In other words, the rapid rates of growth in new cases you can see in a country (say, Germany) when it first gets serious about testing are both a function of the number of tests increasing and the number of infections increasing - and it's hard to tell what's what. You can also have problems if there's a sudden, one-off increase in tests, as we'll see in the next scenario.

A final issue - I'm not going to boldface it because it's less important than the others - is that in the late stages of the scenario, when there is little disease transmission following a prolonged lockdown, many of the newly detected "cases" are false positives. As I mentioned, false positives can be an issue to contend with when the overall incidence of disease in a population is low. They are not the greatest concern in the U.S. or Europe right now, while we're still at the peak of the pandemic.

Scenario 2: Sudden, one-time increase in testing

What would a more rapid increase in testing look like? In Scenario 2, I'm leaving all the settings from Scenario 1 unchanged - except for the number of tests. In this new scenario, I assume that Covidia starts out with the capacity to conduct only 100 tests per generation, but then goes on a crash program in February and rapidly increases that number at a rate of 200 percent per generation until it maxes out at 100,000 tests about a month later. This is similar to the situation in the United States, where testing started out slow, improved rapidly and has now stalled out again.

In this scenario, the distortions between the number of infections and the number of people who test positive are more profound. Even though the actual R is "only" 2.6 in the early stages - still a very scary, high number by the way - it will briefly appear to be as high as 7.8 if you're looking at the number of newly detected cases because test capacity is scaling up so rapidly.

And on a graph, the slope will look extremely steep for a few weeks. You might be tempted to look at a graph like this and say that Covidia is on a much worse trajectory than other countries:

But that isn't getting the story right. What really happened was: Covidia was way behind on testing and it's playing catch-up, which means that the number of reported cases will increase at very fast rates until it does catch up. But the actual number of infections at any given time is the same as in Scenario 1. That doesn't mean the news in Scenario 2 is good, exactly. It means Covidia had a very big COVID-19 problem all along that wasn't being detected until very recently, but it is now finally starting to get its arms around it.
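The apparent R of 7.8 in this scenario can be reproduced with a back-of-the-envelope calculation. The sketch below assumes that testing capacity is the binding constraint and that the share of tests coming back positive grows in step with infections, so reported cases grow by the product of the two growth factors. That is a simplification of whatever the spreadsheet does, and the function name is hypothetical.

```python
def apparent_r(actual_r: float, test_growth_pct: float) -> float:
    """R you would infer from reported cases while tests grow `test_growth_pct` percent per generation.

    ASSUMPTION: reported cases = tests x positive rate, and the positive rate
    rises in proportion to actual infections.
    """
    return actual_r * (1 + test_growth_pct / 100)


# Scenario 2: actual R of 2.6, tests increasing 200 percent (tripling) each generation.
print(round(apparent_r(2.6, 200), 1))  # 7.8
```

The same logic runs in reverse: if testing shrinks from one generation to the next, this simple product falls below the actual R.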

Next, let's look at the opposite case, where testing starts out reasonably strong but doesn't scale up very much.

Scenario 3: High test floor, low test ceiling

In this scenario, I assume that Covidia starts out with up to 10,000 tests available per generation. However, it scales up tests very slowly (by just 3 percent per generation) before eventually capping out at 20,000 tests per generation. Furthermore, 100 percent of tests are reserved for symptomatic individuals and there are no on-demand tests available. This situation is broadly analogous to some European countries with centralized, socialized health care systems. For instance, the U.K. has tested only about 160,000 people total as of Apr. 2, or an average of only about 7,000 tests per day since March 19.

In this case, the number of cases is substantially underestimated because there aren't enough tests at the peak of the epidemic. Only about 5 percent of infections are eventually detected.

But not only is the number of cases underestimated - the rate of increase will also be underestimated. For instance, in this scenario, R appears to peak in the low-to-mid 2's when it's actually 2.6. This is because the rate of new cases is increasing faster than the country's ability to detect them, even if the country rations as many of the tests as it can so it can test the sickest individuals, as we assume that it does. During the peak of the epidemic in Scenario 3, as many as 58 percent of newly reported tests will produce positive results, which resembles the extremely high rate of reported positives for periods of time in places such as Lombardy, Italy.

So while the slope of reported new cases in Scenario 3 might look more gentle than in Scenario 1 or Scenario 2...

...the country's situation is actually just as bad. And in some respects it might be worse. Having conducted so few tests, Covidia will miss some extremely ill people in Scenario 3 and they may die as a result of not being able to get medical care soon enough.

That's not the worst-case scenario, though. Imagine if a country, recognizing that the media tends to fixate on case counts, decides that it can make things look superficially better by decreasing the number of tests that it does.

Scenario 4: A testing decrease

Let's say that, as in Scenario 3, Covidia starts out with the capacity to conduct 10,000 tests per generation. However, as the case count begins to accelerate, the government panics that the outbreak will make it look bad, so it subtly starts scaling down testing capacity by 20 percent per generation in early March. Meanwhile, being in denial about the scale of the problem, it doesn't implement a full lockdown until April 10, or more than three weeks later than under the other scenarios.

I should note that this is more of a "thought experiment" than the other three scenarios. Countries from China to Russia to Iran have been accused of publishing unreliable official statistics - but I don't know whether any countries are deliberately limiting testing to keep case counts down.

Nonetheless, a situation like this obviously turns out quite badly. Almost 30 percent of the country eventually gets infected:

Would the country really wait so long to implement a lockdown? Well, if it wasn't doing enough testing, it might. Because Covidia is reducing the number of tests in this scenario, the number of reported new cases will appear to peak on March 16, even though the actual peak of new infections won't come until April 5. It will appear that the partial measures it undertook worked, when really they weren't working well enough:

In an instance like this, having information on the number of tests would be quite useful, as a high rate of positive tests could be a sign that you're only seeing the tip of the iceberg. During the peak of the outbreak in April, as many as 64 percent of tests would return positive results in Scenario 4; the number would be as high as 80 percent if not for false negatives.
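The step from 64 percent to 80 percent is a one-line adjustment. This sketch assumes false positives are negligible at the peak of the outbreak, so that every observed positive is a true case. The function name is illustrative.

```python
def positivity_without_false_negatives(observed_positive_rate: float,
                                       false_negative_rate: float = 0.20) -> float:
    """Share of tested people who are actually infected, given the share who test positive."""
    # ASSUMPTION: false positives are ignored, which is reasonable only when
    # a large share of the people being tested are infected.
    return observed_positive_rate / (1 - false_negative_rate)


print(round(positivity_without_false_negatives(0.64), 2))  # 0.8
```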

I already gave away the conclusion at the top of the story, so I'm just going to repeat it once more, hoping that this article has helped to convince you of it: The number of reported COVID-19 cases is not a very useful indicator of anything unless you also know something about how tests are being conducted.

In fact, in some cases, places with lower nominal case counts may actually be worse off. In general, a high number of tests is associated with a more robust medical infrastructure and a more adept government response to the coronavirus. The countries that are doing a lot of testing also tend to have low fatality rates - not just low case fatality rates (how many people die as a fraction of known cases) but also lower rates of death as a share of the overall population. Germany, for example, which is conducting about 50,000 tests per day - seven times more than the U.K. - has more than twice as many reported cases as the U.K., but it has also had only about one-third as many deaths.

Put another way: Doing more tests is good, and likely leads to better long-run outcomes, even if it also results in higher case counts that people will freak out about in the short run. I don't usually like to be so didactic, but I hope you'll be a more educated consumer of COVID-19 data instead of just looking at case counts ticking upward on cable news screens without context. That context includes not only reporting about the amount of testing, but also indications such as hospital strain, which are more robust since they aren't subject to as many vagaries about how tests are conducted. Even if you're not from New York, Gov. Andrew Cuomo's daily briefings are worth watching because they do the best job I've seen of providing this context.

And if you do want to play with your own scenarios to see how all of this works... here's the link to that Excel sheet. Have fun, but keep in mind that even though there are a lot of parameters you can tweak, the scenarios are still a fairly crude simplification of the complex situation on the ground in any given state or country.

CORRECTION (April 13, 2020, 1:47 p.m.): In describing what it would take for the new coronavirus to begin to die out, a previous version of this story incorrectly referred to the R value falling below zero, which is not possible. The correct threshold is an R below 1.
