The Problem With Spinning Simple Stories Based on India’s COVID-19 Numbers

22/04/2020

Before COVID-19, was it possible to become a Twitter celebrity by tweeting lists of numbers, often sans context or commentary? It is now. There’s a lot of number crunching going on at the moment. Lists, spreadsheets, graphs and projections are floating about. Occasionally the numbers clarify but, more often than not, they distract, become propaganda or even turn into nonsense.

As an example of numbers whose primary purpose is communal propaganda, consider the frequent claim that 30% of COVID-19 cases in India are linked to the Tablighi Jamaat gathering. Of course, this is not the case by any honest scientific reckoning; my own estimates put the true figure at 1-2% of cases. The logical fallacy of this claim has been repeatedly pointed out: it is like taking a trip into a forest, seeing your first elephant and using this observation to claim that all the world’s elephants live in this particular forest. But nevertheless the claim has widespread currency, no doubt because it is based on a quantitative argument and these apparently carry greater weight.

As an example of numbers as nonsense, consider the many lists of “fatality rates” out there. The figures are calculated by dividing the total reported deaths by the total reported infections in different regions or countries, and then used to make statements sometimes accompanied by lashings of regionalist or nationalist pride. These lists are nonsensical because different places have different testing strategies and protocols and because the disease plays out over multiple days, so comparing today’s total deaths with today’s reported infections is more or less misleading depending on the stage of the epidemic in a given locality. Issues of transparency and censorship also mean deaths may or may not be recorded, among other limiting issues. Yet these lists also provide the COVID-19 commentariat with lots of material for discussion.

A current example of how numbers figure problematically in debates is the set of projections telling us that the lockdown is “working”. Sure, the lockdown appears to have reduced the spread of infection, and trying to quantify the degree to which this has happened is not meaningless. However, many projections involve arbitrary choices aimed at shoring up a simple narrative. They add a veneer of legitimacy to data that is otherwise incomplete and unclear, and in so doing distract from the process of trying to understand the underlying dynamics of the disease. They also distract from the realities of lockdown suffering, including impoverishment, hunger, domestic violence and people dying for lack of access to healthcare.

Should we then not be looking at the numbers? I think we should, but as consequences of an underlying story rather than the story itself. Numbers are merely shadows of an interesting, complex and tragic object we can’t fully perceive. They are not the objects themselves. Examining data should be one part of decoding the full story of the pandemic.

This may sound abstract, but trying to reconstruct an object from some prior knowledge and a few shadows is in some sense what most disease modellers do. They begin with what is known about the dynamics and transmission of the disease, try to work this into a crude, first-draft model and start analysing it. Only at this point might real-world infection, hospitalisation and fatality data come in. The modeller then compares the numbers predicted by the model with what we’re observing in the real world. Then comes the long process of tinkering with the model, returning to – and hopefully questioning – the data, and so on and so forth. Bringing the recorded data and the model’s outputs closer to each other is the most interesting part of the modelling process.

What makes modelling COVID-19 particularly challenging is the subtle interplay between the basic human-virus interaction and the social, cultural, health, political and environmental contexts in which it happens. This is true to various extents for all diseases, but variability and uncertainty seem particularly high for COVID-19 considering its causative virus is new.

The basic dynamics of the interaction between human and virus already holds several mysteries, even before it is overlaid with the realities of overcrowded housing, flows of migrant labour, physical distancing, medical interventions and lockdowns. So the first thing you do as a modeller is decide how much of this complexity you can deal with. Ignoring stuff is a valid choice too, provided it is done transparently.

Despite all these difficulties there are some useful and easily communicable insights from modelling, arising from my own modelling attempts and also from reading various clinical studies and other people’s models.

1. Fatality – Most of the COVID-19 data being collected around the world doesn’t allow us to directly infer the fatality rate. Fatality rate estimates come either from special datasets, such as antibody studies from Germany and the US or testing data from the Diamond Princess cruise ship, or from existing data coupled with indirect measures such as observations of people travelling from a region.

Fatality estimates obtained from these studies are generally much lower than 1%, but there is considerable variation. So fatality rates remain uncertain quantities in modelling work. Bearing this in mind, when we start building models to interpret the Indian data, we arrive at a dichotomy: to simultaneously explain the testing and fatality data, we must either have a high fatality rate and relatively low spread of infection or a high spread of infection and a relatively low fatality rate. There are really no two ways about it.

Whenever government sources say there is “no evidence of community spread”, people should ask, “In that case, we have a terrible fatality rate – why is that?” If, like me, you believe that the COVID-19 fatality rate in India is unlikely to be much higher than 0.5%, you are forced to accept that the disease is quite widespread (with most cases probably being mild or asymptomatic).
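
The trade-off can be put as back-of-the-envelope arithmetic. The sketch below is only an illustration: the death count and the candidate fatality rates are hypothetical values chosen for the example, not Indian data, and the function name is my own. It also ignores time lags, so it gives a rough sense of scale rather than an estimate.

```python
def implied_infections(deaths: float, fatality_rate: float) -> float:
    """Number of infections needed to produce `deaths` at a given fatality rate."""
    if not 0 < fatality_rate <= 1:
        raise ValueError("fatality_rate must be in (0, 1]")
    return deaths / fatality_rate


# Hypothetical death count, for illustration only.
deaths = 500

# Candidate fatality rates (assumed values): the lower the rate,
# the more infections are needed to explain the same number of deaths.
for rate in (0.05, 0.01, 0.005, 0.001):
    infections = implied_infections(deaths, rate)
    print(f"fatality rate {rate:.1%}: about {infections:,.0f} infections")
```

For a fixed number of deaths, halving the assumed fatality rate doubles the implied number of infections. That is the dichotomy in its simplest form.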

2. Time delays – Modelling highlights several phenomena related to time-lags that popular discourse on COVID-19 often misses. Test results and fatalities today are both snapshots of infections acquired in the past. For example, if disease transmission suddenly stopped today, we probably wouldn’t see the effect of this in the data until some weeks later. And as a result of these delays, any sensible model of the disease (without containment measures) will discover a lot of infections by the time the first person has died of the disease. So by the time the 100th person has died, the number of cases can be very large.

Whenever I read COVID-19-related data analyses, the first thing I ask is whether time lags were taken into account. The answer, particularly in analyses for popular consumption, is often an emphatic ‘no’. Getting delays wrong can lead to both under- and overestimates of fatality, and give an incorrect impression of whether one wave of the epidemic is coming to an end or not.
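
A toy calculation shows how much a delay can matter while an epidemic is still growing. This is a minimal sketch under loudly stated assumptions, not a model of any real outbreak: new infections grow by a fixed 20% a day, every infection is counted, the true fatality rate is 0.5%, and every death occurs exactly 18 days after infection. All of these values and function names are illustrative choices of mine.

```python
from itertools import accumulate


def simulate(days=60, growth=1.2, ifr=0.005, lag=18):
    """Return cumulative infections and cumulative expected deaths per day.

    New infections grow by a constant factor each day, and every death
    is assumed to occur exactly `lag` days after infection.
    """
    new_infections = [growth ** t for t in range(days)]
    new_deaths = [ifr * new_infections[t - lag] if t >= lag else 0.0
                  for t in range(days)]
    return list(accumulate(new_infections)), list(accumulate(new_deaths))


def naive_rate(cum_inf, cum_deaths, day):
    """Deaths to date divided by infections to date (ignores the lag)."""
    return cum_deaths[day] / cum_inf[day]


def lag_adjusted_rate(cum_inf, cum_deaths, day, lag=18):
    """Deaths to date divided by infections acquired at least `lag` days ago."""
    return cum_deaths[day] / cum_inf[day - lag]


cum_inf, cum_deaths = simulate()
day = 59
print(f"naive estimate:        {naive_rate(cum_inf, cum_deaths, day):.4%}")
print(f"lag-adjusted estimate: {lag_adjusted_rate(cum_inf, cum_deaths, day):.4%}")

# Infections already acquired on the day cumulative expected deaths first reach one.
first = next(d for d, total in enumerate(cum_deaths) if total >= 1.0)
print(f"expected first death on day {first}, "
      f"with about {cum_inf[first]:,.0f} infections so far")
```

In this growing-epidemic setting the naive ratio falls far below the assumed fatality rate, while the lag-adjusted ratio recovers it. The sketch only shows the underestimate; other ways of mishandling delays can push the estimate in the opposite direction.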

3. Mitigation and containment – The modelling process provides some insight into the effects of mitigation and containment measures. There are many ways to model these effects. For example, we might model quarantining as decreasing the total pool of infective people; physical distancing and hygiene measures as reducing the likelihood of infection events; restrictions on movement as reducing the effective susceptible population; etc.

One consequence of the lockdown is to trap disease in compartments – a household, an apartment complex or (in Mumbai) an urban slum. Because the disease can spread within a compartment quite rapidly, you quickly arrive at a situation with a lot of geographical diversity even within a single city or a single neighbourhood. The mathematical term for this is spatial inhomogeneity: the virus may be spreading fast through one locality without taking off in a neighbouring one. In compartments with disease, the virus may spread until the compartment’s population has achieved herd immunity, that is, until a significant portion of the population has survived infection and is now immune to the virus, leaving it with no way to spread further.

At the same time, because the disease progresses slowly, herd immunity takes time, so policymakers will be cautious about lifting restrictions on movement. In the short term at least, this decision can fundamentally alter the social geography of an area, enhance class segregation, and disrupt social networks and flows of labour. We don’t yet know whether spaces will fully recover from such changes.
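
The three ways of modelling mitigation mentioned above (quarantine shrinking the infective pool, distancing and hygiene lowering the chance of infection, movement restrictions shrinking the effective susceptible population) can be sketched in a crude susceptible-infected-recovered model. This is one possible toy formulation, not the model behind any published projection. The parameter names and values are hypothetical, and the model ignores compartments, time lags and everything else that makes the real problem hard.

```python
def run_sir(population=1_000_000, beta=0.4, gamma=0.1, days=200,
            quarantined=0.0, distancing=0.0, restricted=0.0):
    """Discrete-time SIR model; returns the total number ever infected.

    quarantined: fraction of infectious people removed from the infective pool
    distancing:  fractional reduction in the chance of an infection event
    restricted:  fraction of the population kept out of reach by movement limits
    All three are assumed knobs for illustration, each between 0 and 1.
    """
    susceptible = population * (1.0 - restricted) - 1.0
    infected, recovered = 1.0, 0.0
    for _ in range(days):
        infective_pool = infected * (1.0 - quarantined)
        new_infections = min(
            susceptible,
            beta * (1.0 - distancing) * susceptible * infective_pool / population,
        )
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        recovered += new_recoveries
    return infected + recovered


baseline = run_sir()
for label, kwargs in [
    ("no measures", {}),
    ("quarantine half of infectives", {"quarantined": 0.5}),
    ("halve chance of infection", {"distancing": 0.5}),
    ("restrict half the population", {"restricted": 0.5}),
]:
    total = run_sir(**kwargs)
    print(f"{label}: {total / baseline:.0%} of baseline infections")
```

In a model this simple, quarantine and distancing enter the equations in exactly the same way, which shows how much detail is being ignored: telling them apart requires modelling who is quarantined, when, and how reliably.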

* * *

As the pandemic unfolds in India, people handling data in whatever capacity – as modellers or as social-media commentators – should desist from manufacturing simple stories from dodgy data. Interpret the numbers with scepticism and in the context of what journalists and health professionals on the ground are discovering about the pandemic. Help frame the right questions to the government, especially about discrepancies in the data, plans for different scenarios, the purpose and proportionality of different responses, and how officials intend to measure the success of their interventions.

Above all, do not reduce human distress to a graph.

Murad Banaji is a mathematician with an interest in disease modelling. Some of his initial modelling work on the COVID-19 pandemic is available here.