ICMR director-general Balram Bhargava at an event in November 2019. Photo: ICMR/Facebook.
New Delhi: On September 10, fully 14 weeks after its first national seroprevalence survey to estimate the countrywide prevalence of COVID-19 was completed, the Indian Council of Medical Research (ICMR) had its paper on the exercise published on the website of the Indian Journal of Medical Research.
In these 14 weeks, experts, journalists and professional health bodies had raised a lot of questions about the extraordinary delay – especially given the rapid spread of COVID-19 in the country. Neither the ICMR nor the government had a convincing response.
This is despite the fact that on June 11 – barely a week after the survey had concluded – ICMR director-general Balram Bhargava had held a press briefing in which he shared a few details about the survey, including a table (reproduced below as table 1) that ostensibly contained the interim findings.
Bhargava had said then that ICMR researchers had surveyed individuals in 83 districts across the country, in accordance with a predetermined protocol. Of these districts, Bhargava said the researchers had completed their analyses for 65. And this analysis, according to him, had found that 0.73% of the population (in these districts) was seropositive for immunoglobulin G (IgG) antibodies to the virus. This result indicated the prevalence of infection in the surveyed population in the recent past.
(IgG antibodies form about 7-15 days after infection and can last for a few months.)
Reacting to media reports, ICMR subsequently issued a clarification that it wouldn’t be proper to extrapolate the 0.73% seroprevalence to India’s population of 1.3 billion because the analysis wasn’t yet complete.
However, the ’83 districts’ figure was somewhat surprising. An ICMR press release dated May 12 – a day after the survey had begun, according to the published paper – had said researchers would survey 69 districts in 21 states.
A count of the numbers in the table Bhargava shared during his press briefing on June 11 confuses the picture further. The district counts in the table add up to only 71 – 12 fewer than Bhargava’s claim and two more than the 69 announced.
Indeed, a closer look at the table reveals other discrepancies, with potent implications.
A footnote to the table clearly states that the numbers therein pertain only to 65 districts, whose analysis had been completed until then, and the classification of districts in three strata (of ‘zero’, ‘low’ and ‘high’ incidence) was based on official data as on April 25.
Question 1 – The fixed seroprevalence
But hang on! According to the survey design and protocol, 400 individuals from each district – one from each household – were to be chosen to check for IgG antibodies. According to the data provided by Bhargava at his June 11 press briefing, the number of individuals whose test data had already been analysed as on that day was 26,400. This corresponds to 66 districts (66*400), and not 65 districts as had been stated.
According to the published results of the final analysis (table 2, below), the total number of surveyed districts was 70 – not 69, 71 or 83 – and the total number of individuals surveyed was 28,000. These figures are in line with the protocol – to sample a maximum of 400 people per district (70*400 = 28,000).
The difference of 1,600 in the number of individuals whose data was included in the final analysis also ties up with only four additional districts, not five.
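The district arithmetic above can be checked with a minimal sketch. It assumes every district contributed exactly the protocol figure of 400 individuals (the protocol describes this as a maximum), and the function name is a hypothetical helper, not anything from the ICMR paper.

```python
# Protocol figure quoted in the article: 400 individuals per district.
# Assumption: every district contributed exactly this many people.
PER_DISTRICT = 400


def implied_districts(individuals: int, per_district: int = PER_DISTRICT) -> int:
    """Return the number of fully sampled districts implied by a sample size."""
    districts, remainder = divmod(individuals, per_district)
    if remainder:
        raise ValueError("sample size is not a whole multiple of the district quota")
    return districts


interim = implied_districts(26_400)          # June 11 press briefing
final = implied_districts(28_000)            # published paper
added = implied_districts(28_000 - 26_400)   # samples added in between

print(interim, final, added)  # prints: 66 70 4
```

Under that assumption, the interim figure implies 66 districts rather than the stated 65, and the 1,600 extra samples account for four more districts, which is consistent with the final count of 70.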
While questions about the different numbers of districts mentioned earlier – 69, 71 and 83 – still stand, there’s another puzzle. Even though the total number of individuals surveyed increased in the final analysis from 26,400 to 28,000 (i.e. an increase of about 6%), the seroprevalence remained fixed at 0.73% instead of changing according to the results of the 1,600 additional samples. The chance that this figure would stay fixed to two decimal places despite the dataset expanding is extremely low. What is going on? This is question no. 1 – and an important one.
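What an unchanged 0.73% would require of the extra 1,600 samples can be worked out with a rough sketch. It assumes, purely for illustration, that 0.73% is a simple unweighted ratio of positives to people tested, rounded half-up to two decimal places; the article does not say how the figure was computed, and a weighted or test-adjusted estimate would behave differently. The function name is hypothetical.

```python
def positives_rounding_to(rate_x100: int, n: int) -> range:
    """Positive counts p for which 100*p/n rounds (half-up) to rate_x100/100 percent.

    rate_x100 = 73 stands for 0.73%. Integer arithmetic avoids float rounding:
    (10*rate_x100 - 5) * n <= 100000 * p < (10*rate_x100 + 5) * n
    """
    low_numerator = (10 * rate_x100 - 5) * n
    high_numerator = (10 * rate_x100 + 5) * n
    lowest = -(-low_numerator // 100_000)          # ceiling division
    first_too_big = -(-high_numerator // 100_000)  # exclusive upper bound
    return range(lowest, first_too_big)


interim_ok = positives_rounding_to(73, 26_400)  # counts giving 0.73% on June 11
final_ok = positives_rounding_to(73, 28_000)    # counts giving 0.73% in the paper

# Positives the 1,600 additional samples would need to contribute.
min_extra = max(0, final_ok[0] - interim_ok[-1])
max_extra = final_ok[-1] - interim_ok[0]

print(list(interim_ok))      # prints: [192, 193, 194]
print(list(final_ok))        # prints: [203, 204, 205]
print(min_extra, max_extra)  # prints: 9 13
```

Under these assumptions, the four additional districts would together have had to yield between 9 and 13 positive samples for the headline figure to stay at 0.73%.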
Question 2 – The missing district
Comparing tables 1 and 2 shows that one district from the ‘high incidence’ stratum has been omitted, to bring the total to 70 (instead of 71). What is the reason for this?
Further, if you compare the 69 districts listed in the May 12 press release and the 70 districts listed in the published paper, even more discrepancies pop up.
First, the total number of districts listed in the paper is effectively only 69. This is because Alipurduar in West Bengal has been included twice, once in the ‘zero incidence’ group and then in the ‘low incidence’ group. This could be a typo, although we don’t know the actual group to which Alipurduar belongs. If you remove Alipurduar from either of these groups, the number of districts in that group will be one less than the number provided in the published table. So which district has been omitted in the published paper? This is question no. 2.
Question 3 – The district mix-up
Four districts from the original list of 69, from May 12 – Kalaburagi (Karnataka), Pauri Garhwal (Uttarakhand), Jalore (Rajasthan) and Kolkata (West Bengal) – were substituted in the survey with Gulbarga (Karnataka), Jalor Garhwal (Uttarakhand), Ludhiana and Patiala (both in Punjab).
Jalore being left out is curious because it had reported its first COVID-19 case on May 6. On the other hand, dropping Pauri Garhwal and including Jalor Garhwal is also curious because, according to news reports, Pauri Garhwal had a few cases by May but Jalor Garhwal had reported none by then.
So did the survey include Jalor Garhwal in Uttarakhand instead of Jalore in Rajasthan by mistake? This would seem to be the case because Jalor Garhwal has indeed been included in the ‘low incidence’ group in the published list in spite of having had zero cases in May. So what are the reasons for such changes in the districts finally surveyed? This is question no. 3.
Question 4 – The ‘containment zone’ analysis
At his June 11 press briefing, Bhargava had said the survey had two parts: (1) to estimate the fraction of the general population that had been infected by the virus, and (2) to estimate the fraction of the population that had been infected in ‘containment zones’ in hotspot cities. He further said, “infection in containment zones had been found to be high with significant variations,” and that this part of the survey was still underway on June 11. However, he didn’t specify the number of containment zones or the number of hotspot cities.
Now, the results of this part of the survey are completely missing from the published paper. Why? The analysis should surely have been completed by now. This is question no. 4.
Question 5 – The number of ‘hotspot cities’
The list of 70 districts published in the final paper includes Bengaluru and Chennai in the ‘high incidence’ group – but not any of the other urban centres that had been identified in May as ‘hotspots’. This may be because, at the time of selecting districts based on April 25 data, Chennai and Bengaluru probably hadn’t yet been identified as ‘hotspots’. However, by the time the survey kicked off on May 11, the two had in fact been designated red zones or ‘hotspot cities’.
This exclusion is rendered more curious by the fact that Kolkata had been removed from the original list but was reclassified as a ‘hotspot city’ in the final list. One somewhat tenuous explanation is that Kolkata had probably been classified a ‘red zone’ much sooner after April 25 than Chennai or Bengaluru.
A report on June 8 in The Telegraph – three days before Bhargava’s press briefing – had cited unnamed sources to suggest the survey had detected seropositivity rates of up to 30% in the ‘hotspot cities’. ICMR rejected this claim in a tweet on the next day, saying The Telegraph‘s report was speculative because the full analysis was yet to finish.
The findings appeared in media related to ICMR Sero Survey for COVID-19 are speculative and survey results yet to be finalised. #IndiaFightsCorona @PIB_India @CovidIndiaSeva
— ICMR (@ICMRDELHI) June 9, 2020
The Ahmedabad Municipal Corporation wrapped up its own seroprevalence survey on July 11. It said in a press release that contrary to the 49% seropositivity rate the ICMR seroprevalence survey had found in the city’s containment zones, its exercise had found that the average seropositivity was only 17.61%.
On July 21, however, the Economic Times cited unnamed sources to state that the ICMR survey in containment zones had found that Ahmedabad had the highest seropositivity – of 55% – followed by Mumbai at 36% and Kolkata at around 30%. According to the newspaper, Delhi had a seroprevalence of 10-15%.
Now, after the paper was published on September 10, The Telegraph reported ten days later – based on information shared by some unnamed authors of the paper – that they had been prevented from publishing data pertaining to the containment zones by the ICMR director-general Bhargava, who is also chairperson of the editorial board of the journal that published the paper.
Indeed, Bhargava is also one of the authors, and had reportedly told his co-authors that he didn’t have the requisite approvals to publicise data corresponding to the survey’s second component. So the paper would have to be published without the data on containment zones or not at all. The former appears to have been the eventual outcome.
The Telegraph also wrote that the 10 ‘hotspot cities’ included in the survey, apart from the 70 districts, were Ahmedabad, Bhopal, Kolkata, Delhi, Hyderabad, Indore, Jaipur, Mumbai, Pune and Surat. The source of this number – 10 – is unclear.
Also according to the report, the survey had found 48% seropositivity in Ahmedabad, 36% in Mumbai’s Dharavi and 30% in Kolkata. If these figures are true, although Bhargava has denied them, they cast the published paper in poor light. (Bhargava’s denial was in the form of a tweet, but as on September 22, all tweets on his account @ProfBhargava had been removed until January 31, 2020.)
The published paper, which doesn’t contain the ‘hotspot cities’ data, only says the following: “We may … underestimate prevalence if our selection missed clusters with higher prevalence including those among most of the metropolitan cities. Only Chennai and Bengaluru were included in the serosurvey on account of the random selection process.”
So which are the ‘hotspot cities’ whose containment zones were surveyed? The answer to this question may also resolve the ’83 districts’ mystery, in that this number could be 13 – i.e. 83 minus 70. However, will these cities’ data be shared or will it remain under wraps forever? These elements form question no. 5.
Question 6 – The missing age-group
Then there is an issue with the sampling. The seroprevalence survey didn’t include people younger than 18 years. We know that people of this age group account for about 10% of all confirmed COVID-19 cases in the country. A comprehensive survey by a group of researchers in Tamil Nadu and Andhra Pradesh led by Ramanan Laxminarayan corroborates this fraction.
An important detail here is that the survey found no differential risk across ages (including children) of acquiring or transmitting a SARS-CoV-2 infection in both states. This characteristic would perhaps hold true for people in all states.
Thus, ICMR’s September 10 paper might be significantly underestimating the prevalence of infection in the districts it surveyed. But what is the extent of underestimation? We don’t know. This is question no. 6.
Strangely, the paper doesn’t discuss this shortcoming at all – even if it does acknowledge that by selecting only one adult from each household, the prevalence could have been underestimated “as transmission would be expected to be higher within the household”.
By extrapolating the stratified seropositivity rate found in the survey in three age groups – 18-45, 46-60 and 60+ – to the national population (projected from the 2011 Census), the paper estimates the total population infected in May to have been about 6.5 million. But given the different sources of underestimation, this number is likely to have been higher.
R. Ramachandran is a science writer.