*Image: geralt/pixabay.*

Estimating the case fatality rate (CFR) of an ongoing pandemic is essential but extremely difficult. The data are continuously updated and we keep learning new things about the disease, especially if it’s a new disease caused by a new virus.

On March 3, the WHO director-general Dr Tedros Adhanom Ghebreyesus said that about 3.4% of people reported to have COVID-19 around the world have died. However, the estimated fatality rates in different countries – or the world as a whole – could change dramatically once the pandemic ends and the dust settles.

We usually have data on the number of cases, called C(*t*); the number of deaths, D(*t*); and the number of people who have recovered, R(*t*) – where *t* is the date up to which these cumulative counts are recorded. The CFR is the proportion of people with the disease who eventually die of it. Once the pandemic ends, the CFR will simply be the final D divided by the final C. But while the pandemic is still underway, the formula CFR = D(*t*)/C(*t*) can be misleading: the final outcome of many patients is still unknown, and the formula implicitly assumes that none of them will die.
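To see why the naive ratio understates the CFR while an outbreak is still running, here is a minimal Python sketch on a made-up snapshot. The counts, and the assumption that 10% of the patients still under treatment eventually die, are hypothetical and chosen only for illustration; `naive_cfr` is an illustrative helper name, not part of any library.

```python
def naive_cfr(deaths: int, cases: int) -> float:
    """Naive CFR = D(t)/C(t): every unresolved case is counted as a survivor."""
    return deaths / cases


# Hypothetical outbreak snapshot (illustrative numbers, not real data):
# 1,000 confirmed cases, 40 deaths, 460 recoveries, 500 still under treatment.
cases, deaths, recovered = 1000, 40, 460
active = cases - deaths - recovered

# Hypothetical assumption: 10% of the still-active patients eventually die.
eventual_deaths = deaths + round(0.10 * active)

print(f"naive CFR at time t:          {naive_cfr(deaths, cases):.1%}")  # 4.0%
print(f"final CFR once all resolve:   {eventual_deaths / cases:.1%}")    # 9.0%
```

With these made-up numbers the naive ratio reports 4.0% while the eventual CFR is 9.0%, because the 500 unresolved cases sit in the denominator as if they had all survived.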

This in turn can be seriously misleading. A research article published in the *American Journal of Epidemiology* in September 2005 discussed this problem. For example, when the 2003 SARS epidemic was still ongoing, the WHO reported in April 2003 that “data … indicates that 96% of persons developing SARS recover spontaneously. The focus now is on the roughly 4% who are dying.”

The estimated CFR of SARS, obtained by dividing D(*t*) by C(*t*), was 3-5% in the first few weeks of the global outbreak. But when researchers used the appropriate statistical techniques and accounted for the discrepancy between the cohorts of D(*t*) and C(*t*), the estimated CFR jumped to 6-8%.

Today, long after the outbreak ended, we know that SARS’s final fatality rate was about 9.6% (774 of the 8,098 people who had the infection died). This shift in the estimates had nothing to do with any change in the severity of the disease; it came from how the CFR was being calculated.

The September 2005 article suggested estimating the CFR by dividing D(*t*) by D(*t*) + R(*t*), that is, the number of people who died by the number of people who either died *or* recovered. However, with the novel coronavirus, many patients who eventually die remain in the cohort of those still under treatment for some time before their outcome is recorded.
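A minimal Python sketch of this estimator, D(*t*)/[D(*t*) + R(*t*)], which leaves patients still under treatment out of both the numerator and the denominator. The function name `resolved_cfr` and the snapshot numbers are hypothetical, used only for illustration.

```python
def resolved_cfr(deaths: int, recovered: int) -> float:
    """Estimate the CFR from resolved cases only: D(t) / (D(t) + R(t)).

    Patients whose outcome is still unknown are excluded from both the
    numerator and the denominator.
    """
    resolved = deaths + recovered
    if resolved == 0:
        # Early in an outbreak nobody may have recovered or died yet.
        raise ValueError("no resolved cases yet; the estimator is undefined")
    return deaths / resolved


# Hypothetical snapshot (illustrative numbers, not real data).
deaths, recovered = 40, 460
print(f"resolved-case CFR: {resolved_cfr(deaths, recovered):.1%}")  # 8.0%
```

The estimator is only as fair as the assumption that patients who die and patients who recover leave the "under treatment" group at comparable speeds; the caveat about the novel coronavirus is a doubt about exactly that assumption.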

So an alternative way to estimate the CFR is to calculate the number of people who have died as a fraction of the number of confirmed cases *T* days earlier. Here, *T* is the average number of days from case confirmation to death.
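This lagged estimator, D(*t*)/C(*t* − *T*), can be sketched in Python as follows, assuming cumulative daily counts stored in lists indexed by day number. The helper `lagged_cfr` and the series are hypothetical, invented for illustration.

```python
from typing import Sequence


def lagged_cfr(cumulative_cases: Sequence[int],
               cumulative_deaths: Sequence[int],
               t: int, lag: int) -> float:
    """Estimate the CFR as D(t) / C(t - T).

    Both series hold cumulative daily counts indexed by day number;
    `lag` plays the role of T, the assumed average number of days from
    case confirmation to death.
    """
    if lag < 0 or t - lag < 0:
        raise ValueError("need at least `lag` days of history before day t")
    return cumulative_deaths[t] / cumulative_cases[t - lag]


# Hypothetical cumulative series for days 0..8 (illustrative, not real data).
cases = [1000, 1200, 1450, 1750, 2100, 2500, 2950, 3450, 4000]
deaths = [20, 25, 32, 40, 50, 62, 76, 92, 110]

t, T = 8, 7
print(f"naive  D(t)/C(t):   {deaths[t] / cases[t]:.2%}")              # 2.75%
print(f"lagged D(t)/C(t-T): {lagged_cfr(cases, deaths, t, T):.2%}")   # 9.17%
```

Because cumulative cases grow during an outbreak, dividing by the earlier, smaller count pushes the estimate up; how far it moves depends on the chosen *T*, which is itself an assumption.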

As of May 4, India had recorded 44,237 cases; 1,513 of these patients had died and 12,235 had recovered. If we assume that *T* = 7 days, we need the cumulative number of cases on April 27, which was 29,458. Thus, the three estimates of CFR for India at the moment are:

**i.** ‘First’ way to measure: D(May 4)/C(May 4) = 3.42%

**ii.** The research article’s way: D(May 4)/[D(May 4) + R(May 4)] = 11.01%

**iii.** The alternative way: D(May 4)/C(April 27) = 5.14%
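The arithmetic behind these three figures can be reproduced in a few lines of Python, using only the counts quoted above; the lag *T* = 7 days is the same assumption made in the text.

```python
# Counts for India quoted in the text.
cases_may4 = 44_237      # C(May 4)
deaths_may4 = 1_513      # D(May 4)
recovered_may4 = 12_235  # R(May 4)
cases_apr27 = 29_458     # C(April 27), i.e. C(t - T) with T = 7 days

estimates = {
    "i.   D(t)/C(t)": deaths_may4 / cases_may4,
    "ii.  D(t)/[D(t) + R(t)]": deaths_may4 / (deaths_may4 + recovered_may4),
    "iii. D(t)/C(t - T)": deaths_may4 / cases_apr27,
}

for label, cfr in estimates.items():
    print(f"{label:<25} {cfr:.2%}")
# Prints 3.42%, 11.01% and 5.14% respectively.
```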

These estimates vary considerably, and we can know which one is closer to reality only once the pandemic is over.

In my opinion, the actual CFR of COVID-19 could be lower than all of these estimates. To calculate the CFR, we need to know how many people were infected, and this figure hasn’t been easy to pin down with the novel coronavirus. We already know that there are many people with mild symptoms as well as no symptoms at all. These two ‘truths’ have become important considerations for governments designing effective distancing and quarantine strategies. We should also account for them when calculating the CFR.

Counting mild and asymptomatic infections could considerably increase C(*t*) while leaving D(*t*) unchanged, so the CFR of COVID-19 would decrease considerably.

That said, it’s not easy to determine how much greater the *actual* C(*t*) may be than the *observed* C(*t*). We will need to conduct a survey for this purpose.

Neil Ferguson, a public health expert at Imperial College London, told *The Guardian* on January 26 that his “best guess” was that 100,000 people could have been infected by the virus at the time, even though there were only 2,000 *confirmed* cases. If this sounds implausible, consider the recent example of Sweden.

It’s well-known that Sweden didn’t impose a lockdown like many other countries, and instead rolled out voluntary measures. The Swedish government advised older people and others particularly vulnerable to the virus to avoid social contact; recommended people work from home, wash their hands regularly and avoid nonessential travel; and kept the country’s borders, some schools and most businesses open. On April 22, Anders Tegnell, the chief epidemiologist of Sweden’s public health agency, said that about 20% of the people of Stockholm had developed immunity (through infection) to the virus by then and that the city would achieve herd immunity in a few weeks.

Five days later, Sweden’s ambassador to the US said that about 30% of people in Stockholm had achieved immunity, and that there would be herd immunity in May.

The population of Sweden is around 10 million, and about a fourth of them, or 2.5 million, live in Stockholm. So by April 22, about 500,000 Stockholmers (20% of 2.5 million) had been infected. However, according to the daily cumulative data, only 5,071 Stockholmers had tested positive by April 21. So an astonishing 100 times as many people had been infected as had tested positive. Most of them must have been asymptomatic.
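The "factor of amplification" here is just the ratio of estimated infections to confirmed cases. A minimal Python sketch using the approximate figures quoted above; the helper name `amplification_factor` is hypothetical.

```python
# Approximate figures quoted in the text.
stockholm_population = 2_500_000  # about a fourth of Sweden's ~10 million
immune_share = 0.20               # Tegnell's April 22 estimate
confirmed_by_apr21 = 5_071        # cumulative positive tests in Stockholm


def amplification_factor(population: int, infected_share: float,
                         confirmed: int) -> float:
    """Ratio of estimated infections to confirmed cases."""
    return population * infected_share / confirmed


estimated_infections = stockholm_population * immune_share
k = amplification_factor(stockholm_population, immune_share, confirmed_by_apr21)
print(f"estimated infections: {estimated_infections:,.0f}")  # 500,000
print(f"amplification factor: {k:.1f}")                      # 98.6
```

The text rounds 98.6 to 100. Plugging in the ambassador’s 30% figure instead gives a factor of roughly 148.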

Of course, this does not mean that the actual number of cases in other countries is also 100 times the number of people who have tested positive. Sweden deliberately permitted controlled social mixing to achieve herd immunity. This ‘factor of amplification’ also depends on a country’s rate of testing. Iceland, for example, tested about 6% of its population, so it can’t expect the total number of cases within its borders to be 100 times the number of those who tested positive.

However, in countries that have tested a smaller fraction of their populations, the number of people with COVID-19 (most of them asymptomatic) could be much higher than the number who tested positive. Other factors, like high population density, social discipline and adherence to healthcare protocols, could also affect the actual C(*t*).

Thus, although it’s nearly impossible to determine the exact number of people in India who have already been infected, it shouldn’t be surprising if that number turns out to be 10 to 100 times the number of cases detected so far, most of them asymptomatic and many of them already recovered. If that is so, the CFR’s denominator should use the *estimated* C(*t*) instead of the *observed* C(*t*). The same should hold for other countries. Ultimately, the CFR of COVID-19 may not be 3.4/100 but more like 3.4/1,000 or even 3.4/10,000.
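The last step is a simple rescaling: if the true number of infections is *k* times the confirmed count and, as assumed above, every death has already been counted, the CFR shrinks by the same factor *k*. A minimal Python sketch; the helper `adjusted_cfr` is hypothetical, and *k* = 10 and 100 are the illustrative factors from the text, not measured values.

```python
reported_cfr = 0.034  # the 3.4% global figure quoted by the WHO on March 3


def adjusted_cfr(observed_cfr: float, amplification: float) -> float:
    """Rescale a CFR when true infections are `amplification` times the
    confirmed cases, assuming every death has already been counted."""
    if amplification < 1:
        raise ValueError("true infections cannot be fewer than confirmed cases")
    return observed_cfr / amplification


for k in (1, 10, 100):
    print(f"k = {k:>3}: CFR about {adjusted_cfr(reported_cfr, k):.3%}")
# k =   1: CFR about 3.400%
# k =  10: CFR about 0.340%
# k = 100: CFR about 0.034%
```

A CFR of 0.34% is 3.4/1,000 and 0.034% is 3.4/10,000. If some COVID-19 deaths also go uncounted, D(*t*) would rise too and the true reduction would be smaller than *k*.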

*Atanu Biswas is a professor of statistics at the Indian Statistical Institute, Kolkata.*