Delhi and Tamil Nadu have superficially similar COVID-19 data sets. This is evident if you plot the recorded infections and fatalities up to May 27 on the same graph.
The trajectories of the total number of infections recorded are remarkably similar. In both areas, the number of infections grew rapidly before slowing in the first two weeks of April. The same pattern is visible with the fatalities. There are some slight differences, of course. The initial growth in the number of infections in Tamil Nadu seems more rapid than that in Delhi, although data from the early days of the pandemic should be treated with caution because of relatively large random fluctuations.
Both regions have relatively low case-fatality rates at present: 2% in Delhi and 0.7% in Tamil Nadu. This appears in the graphs as a vertical gap between the curves of recorded infections and fatalities in each case.
But before we conclude that Delhi and Tamil Nadu have similar COVID-19 stories, note that there is mounting evidence that Delhi could be undercounting deaths due to COVID-19. One hospital alone has, since February 1, apparently reported only four deaths as being due to COVID-19 when the real figure could have been 103. Such underreporting could explain, in part, why Delhi’s fatality curve appears flatter than it should.
Underreporting – which Mumbai has also done – leaves a pattern in the graph that random fluctuations alone are unlikely to account for. So a question arises: given their similar trajectories, could Tamil Nadu be underreporting its COVID-19 fatalities as well?
Before answering this, let’s examine the data from Delhi more carefully. Using a model, let’s attempt to predict what the fatalities should have been over time based on the number of recorded infections (more details here). The basic assumptions are: (1) a constant case detection rate, i.e. the number of infections recorded is a constant fraction of the total, and (2) a constant infection fatality rate, i.e. the true proportion of infections that cause death doesn’t change over time.
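As a minimal sketch of what such a model could look like – not the actual simulation used here – the snippet below encodes just those two assumptions, with an illustrative detection rate, IFR and infection-to-death delay:

```python
import numpy as np

def expected_deaths(daily_cases, detection_rate=0.05, ifr=0.005, delay=14):
    """Minimal sketch: recorded cases are a constant fraction (detection_rate)
    of true infections, and a constant fraction (ifr) of true infections end
    in death, roughly `delay` days later. All parameter values are illustrative."""
    true_infections = np.asarray(daily_cases, dtype=float) / detection_rate
    deaths = ifr * true_infections
    # deaths lag infections: shift the series forward by `delay` days
    return np.concatenate([np.zeros(delay), deaths])[: len(deaths)]
```

Comparing the cumulative total of such an expected-deaths series with the recorded fatality curve is, in spirit, what the plots below do; the actual simulation is presumably more detailed, but the two constancy assumptions are the essential ingredients.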
When this method is applied to Maharashtra’s data, a noticeable discrepancy shows up – one that is hard to explain with anything other than undercounting. Alternative hypotheses exist, but none seems entirely convincing.
As it happens, the method uncovers a similar pattern in Delhi’s data.
The number of infections reported can be matched reasonably well by the model, but there is a marked difference between the number of deaths recorded (red) and the number of deaths that should have been recorded (green), given the two assumptions. This difference begins around April 17 and grows until about May 11; then, as an increasing number of deaths are reported, the gap begins to close again. This latter change was likely prompted by the media turning its spotlight on the Delhi government.
In spite of the “correction” that began around May 11, the model indicates that by May 27, there should have been more than 700 COVID-19 fatalities in Delhi instead of the reported 300 or so. That is, fewer than one in two COVID-19 deaths in Delhi were counted.
When the method is applied to Tamil Nadu’s data, this is the result:
There is a small but noticeable – and ongoing – difference between the number of deaths expected and the number of deaths reported. Specifically, the model predicts about 230 deaths by May 27 instead of the 130 reported. So the question arises again: is Tamil Nadu underreporting COVID-19 fatalities?
One alternative explanation for the discrepancy is that increased testing is leading to more cases being detected. Could this be happening in one or both places?
Thus far, testing in Delhi and Tamil Nadu appears to have risen to keep pace with, or on occasion even outpace, the virus. Between April 15 and May 27, testing in Delhi increased about 11x while the number of infections reported increased about 10x. As a result, there is a slight drop in the test positivity rate.
In Tamil Nadu, in the same period, testing increased about 20x and the number of infections reported, about 15x, causing a more pronounced drop in the test positivity rate.
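The arithmetic behind these drops is simple: test positivity is cases divided by tests, so it scales as the ratio of the two growth factors (the 10x/11x and 15x/20x figures quoted above; computing the absolute positivity rates would need the raw test and case counts):

```python
# Positivity = cases / tests, so if cases grow by a factor C and tests by a
# factor T over the same period, positivity changes by the factor C / T.
def positivity_change(case_growth, test_growth):
    return case_growth / test_growth

print(positivity_change(10, 11))  # Delhi: ~0.91, a slight drop
print(positivity_change(15, 20))  # Tamil Nadu: 0.75, a more pronounced drop
```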
So increased testing could explain some of the discrepancy between expected and observed fatalities in both regions.
Let’s take a closer look by trusting the fatality figures and using them to predict the number of infections that should have been recorded (assuming a constant case detection rate). If we adjust the parameter values used to simulate the Tamil Nadu data so as to match the observed fatality data, this is the result:
This time, we match the fatalities reasonably well, but because the numbers are quite low, there is quite a bit of random fluctuation between different simulations. However, the number of infections observed (black) exceeds the number of infections that should have been observed (blue): as of May 27, there are about 70% more recorded infections than expected from the fatality data. Some of this discrepancy could be explained by increased testing, and therefore increased case detection.
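A minimal sketch of this reverse calculation, under the same illustrative assumptions as before (again, not the actual simulation used here), might look like:

```python
import numpy as np

def expected_recorded_cases(daily_deaths, detection_rate=0.05, ifr=0.005, delay=14):
    """Reverse of the earlier sketch: scale reported deaths up to true
    infections via the IFR, shift back by the infection-to-death delay,
    then apply the assumed constant case detection rate. Illustrative only."""
    true_infections = np.asarray(daily_deaths, dtype=float) / ifr
    # infections occurred roughly `delay` days before the corresponding deaths
    shifted = np.concatenate([true_infections[delay:], np.zeros(delay)])
    return detection_rate * shifted
```

The output of such a function plays, in spirit, the role of the blue curve above.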
If we do the same for Delhi by trusting the fatality data, we get:
The number of infections recorded as of May 27 is almost three times the number predicted by the model. Such a figure would be consistent only with a massive increase in case detection, which clearly has not happened: testing in Delhi only marginally outpaced infections, so it is quite unlikely that the case detection rate improved so dramatically.
So what, if anything, can we say about case detection in the two regions? To estimate an absolute value for the case detection rate, we need to assume an infection fatality rate (IFR). For example, if we assume an IFR of 0.5% for both Delhi and Tamil Nadu, then about 16% of infections in Tamil Nadu will have been detected by testing to date, and about 5% in Delhi.
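The arithmetic behind such an estimate is simple, though the 16% and 5% figures quoted above come from the full simulation rather than from this naive formula, which in particular ignores the lag between infection and death:

```python
# Illustrative only: if a fraction `ifr` of true infections end in death,
# then true infections ≈ deaths / ifr, and the case detection rate is the
# recorded case count divided by that estimate. A careful calculation also
# has to allow for the delay between infection and death.
def detection_rate(recorded_cases, deaths, ifr=0.005):
    true_infections = deaths / ifr
    return recorded_cases / true_infections
```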
It’s quite possible that the relatively high case detection rate, together with contact tracing and isolation, could have helped Tamil Nadu control its COVID-19 outbreak so far.
In sum, there appears to be no convincing evidence that Tamil Nadu is underreporting COVID-19 deaths. We should always assume that some COVID-19 deaths are likely passing under the radar, but there is no clear evidence of systematic undercounting. If it is indeed happening, it is happening to a smaller extent than in Delhi.
A final note. Some commentators seem to believe that setting up competitions between states or regions around their COVID-19 numbers could be a good way to encourage regions to improve their practices. That is not the intention here. If anything, naïve comparisons without clear accompanying stories can encourage readers to misread the numbers. Only carefully examining similarities and differences between COVID-19 numbers in different regions, together with all other information available, can throw up meaningful questions.
All the data is from COVID19 India.
Murad Banaji is a mathematician with an interest in disease modelling.