The news that 80% of people with COVID-19 in India are asymptomatic is simultaneously hopeful and frightening. Hopeful because it means a large fraction of people may not fall sick with the new coronavirus or require hospitalisation, as has happened in many European countries. But caution is needed here: there is a tendency to overextend the argument that Indians have some kind of resilience, and to claim that the poor, with their higher rates of exposure to other infectious diseases, will be even more resilient to the virus – an assertion that even the poor have echoed in a desperate bid to get back to earning a living.
However, wading through muck and dirt is not a salubrious act but a harsh reflection of hazardous working conditions, and toxic chemicals do nothing to bolster immunity. Combined with malnutrition and unhealthy living conditions, such exposure puts the poor at severe risk, as the data from New York City makes clear: morbidity among the homeless there is much higher than in the general population.
The frightening part of the news is that we are unwittingly in close contact with people with COVID-19. The Indian Council of Medical Research has belatedly admitted that we do have community transmission of the disease. This lends new urgency to the need for large-scale testing, which can only be done if we leverage the skills and infrastructure of the larger scientific community. But there is always resistance to new ideas, and this resistance is highest among experts who do not like to be pulled out of their comfort zones. Part of this resistance is actually justified: experts know the complexity of the process and are naturally dismissive of what they perceive as simplistic solutions advanced by non-experts. But part of it is also psychological – a resentment of unknown and possibly ignorant persons muscling in on their domain of expertise.
So let us discuss in some detail the whole process of large-scale testing, the difficulties of transforming it into a participatory programme involving the larger scientific community, and ways to overcome them. As a preamble, it is important to acknowledge the tremendous gains of this exercise. South Korea and Germany, to name just two examples, have successfully contained the spread of the new coronavirus with far fewer deaths, primarily through a high rate of testing and a rational bottom-up approach in which individuals were quarantined rather than entire communities. Large-scale testing can also identify asymptomatic individuals, who should be treated not as liabilities but as resources. If their immune systems are producing antibodies to fight this virus, they could donate blood, and the blood plasma could be a source of antibodies for critically ill patients whose own immune systems are too compromised to make antibodies. This in turn could help reduce the load on hospitals, which might otherwise be overwhelmed if patient numbers increase further.
During the Spanish flu pandemic of 1918, which ended up killing around 50 million people worldwide, we did not know what caused the disease; the influenza virus was only isolated in the 1930s. What we did have was a vague idea of how the disease was transmitted, and those cities that shut down did indeed see a lower spread. Our lockdown models itself on this strategy, and declines to use the accumulated wisdom of a century of scientific discovery.
The major steps of large-scale testing
The first step is sample collection, which has to be scaled up to cover at least one person from every household. Mobile collection centres that visit every locality will be required, staffed with personnel equipped with PPE. Mass testing may also remove the stigma associated with such tests, so that people come forward more readily.
One hurdle often pointed out is that we need labs of biosafety level (BSL) 2 or higher to test samples that potentially contain live virus, as well as personal protective equipment for all staff involved in testing. My personal opinion is that the number of volunteers would drop very sharply if they were expected to handle live virus samples. However, if the collection staff put the samples into a solution of guanidine thiocyanate – a powerful protein denaturant – along with phenol and beta-mercaptoethanol, this would inactivate the virus while preserving its RNA. Such safe samples can then be handled much more easily in a university or research institute setting.
The only drawback is that the time gap between sample collection and testing must be small, because RNA degrades quickly. This initially limits the procedure to the big cities, which have universities and research institutes with modern biology labs close to the collection points.
In the real world, no testing procedure is perfect. All tests have errors, often called noise. They generate false positives, where the test returns a positive result for a sample that is actually negative, and false negatives, where a positive sample avoids detection and the test result is negative.
Let us deal with the false positives first; a test's ability to avoid them is called its specificity. We use a technique called the polymerase chain reaction (PCR), which involves first isolating the RNA from the test samples, converting the viral RNA into DNA using an enzyme, and then putting it in a reaction mix inside the PCR machine. This mix contains two short DNA fragments called primers, which bind to specific sections of the viral DNA. The reaction runs through multiple cycles, and each cycle doubles the amount of target DNA (the region of viral DNA between the two primers).
Of course, once sufficient DNA has been made, the amplification slows down because the chemicals needed to make more DNA start getting exhausted. Generally, 25 cycles are more than enough: even if the starting amount of viral DNA is exceptionally small, 25 doublings would produce a huge amount. Typically, enough DNA has been made after 15-20 cycles and further doublings do not take place, but we usually play safe and run the reaction for a few more cycles.
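The arithmetic of doubling can be sketched with a toy model (the plateau value below is an illustrative assumption, not a measured number):

```python
# Idealised PCR: each cycle doubles the target DNA until reagents are
# exhausted, after which the copy number plateaus.
def pcr_copies(start_copies, cycles, plateau=10**8):  # plateau: assumed cap
    copies = start_copies
    for _ in range(cycles):
        copies = min(copies * 2, plateau)
    return copies

# A single starting copy becomes over 33 million copies in 25 cycles:
print(pcr_copies(1, 25))      # 33554432, i.e. 2**25
# A larger starting amount hits the assumed plateau well before cycle 25:
print(pcr_copies(1000, 25))   # 100000000
```

This is why a handful of extra cycles costs little: once the plateau is reached, further cycles change nothing.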
Typically, PCR tests are fairly specific, though conventional PCR (which I am suggesting we use for testing in order to cut costs) is an end-point reaction and thus has slightly lower specificity than real-time PCR, the gold standard. Digital PCR has been shown to be even better, but that is not our concern here.
With large-scale testing, we run into an interesting difficulty called the base-rate problem. Let us say my machine has a 1% false-positive rate, which at first sight looks like a pretty low level of error. Now, for the purpose of illustration, assume a base rate of 1 COVID-19-positive patient per 1,000 persons (which would mean 1.3 million people infected in India, so let us hope it is not true!). When we conduct 1,000 tests, the number of false positives will be about 10 (1% of 1,000), whereas the real number of positives will be 1. So the machine over-reports the number of positive cases by 10x – not a pleasant situation, since it would create panic among the people.
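The arithmetic is easy to check (a minimal sketch; the function name is mine, and it assumes the test catches every true positive):

```python
# Expected true and false positives when screening a low-prevalence population.
def expected_counts(n_tests, base_rate, fp_rate):
    true_pos = n_tests * base_rate                    # assume all infected are detected
    false_pos = n_tests * (1 - base_rate) * fp_rate   # healthy people flagged in error
    return true_pos, false_pos

tp, fp = expected_counts(1000, 1 / 1000, 0.01)
print(tp, fp)  # 1 true positive vs roughly 10 false positives
```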
This problem arises only with mass testing, not when you test a small target group of patients who are likely to be infected or show signs of infection, because then the base rate is not so low. To resolve it, we need much higher specificity, which is achieved by running the samples with two or more different sets of primers. The website of the US Centers for Disease Control and Prevention (CDC) provides three such sets. Because each primer set binds to a unique section of the viral sequence, you get three separate amplifications of different segments of the viral RNA in three different reactions, making the test far more specific.
But this of course triples the number of experiments. One way to avoid this is to do the large-scale testing with one primer set, identify all the positive samples, and then test those again with all three primer sets to rule out false positives. The primer sequences are available on the CDC website and can be synthesised in labs. But the CDC adds a caveat: the primer sequences may change as the virus evolves, so India should continue its viral sequencing work so that we can design primers based on the specific strain present, and evolving, in our country.
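To see the saving from two-stage screening, one can count reactions under each strategy (a sketch; the 1.1% flagged fraction is an assumed figure combining true and false positives at the screening stage):

```python
# Reactions needed to test n samples with three primer sets:
# one-stage runs every sample against all three sets; two-stage screens with
# one set and confirms only the flagged samples with all three.
def one_stage(n_samples, primer_sets=3):
    return n_samples * primer_sets

def two_stage(n_samples, flagged_fraction, primer_sets=3):
    screening = n_samples                                    # one reaction per sample
    confirmation = n_samples * flagged_fraction * primer_sets
    return screening + confirmation

n = 10_000
print(one_stage(n))           # 30000 reactions
print(two_stage(n, 0.011))    # 10330.0 reactions, roughly a third
```

The saving grows as the flagged fraction shrinks, which is exactly the mass-testing regime where the base rate is low.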
Now for the false negatives. This is a two-part problem. First, a false negative can arise from poor sample collection. This is checked in the lab by detecting human RNase P (a gene present in all human cells, again by PCR) in the sputum sample as proof of proper collection. But if samples are pooled and some contain RNase P while others don't, the pooled sample will test positive even if some samples were poorly collected. So before pooling, samples would have to be tested for RNase P individually, which rather defeats the advantages of pooling.
This is a typical problem of statistical quality control, where the standard solution is to take a small number of products randomly chosen from a large manufactured batch and test them individually. If none of these products is found to be defective, we can declare with high confidence that the batch as a whole meets our quality standard. So if we have 5,000 samples and use a mild criterion – that we want to be 95% confident that 95% of the samples are of good quality – then we need to test at least 59 random samples individually, and each and every one of them should contain RNase P. Only then can we declare that the sample collection was proper. With a more stringent criterion, the number of samples to be tested individually will of course be much higher.
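The required number follows from the standard zero-failure acceptance-sampling formula: all n draws pass with probability q^n if a fraction q of the batch is good, so we need q^n ≤ 1 − C for confidence C. A quick sketch (the function name is my own):

```python
import math

# Smallest n such that, if all n randomly drawn samples pass, we are at least
# 'confidence' sure that a fraction 'good_fraction' of the batch is good.
# Solves good_fraction**n <= 1 - confidence for n.
def min_sample_size(confidence, good_fraction):
    return math.ceil(math.log(1 - confidence) / math.log(good_fraction))

print(min_sample_size(0.95, 0.95))  # 59 samples for the 95%/95% criterion
print(min_sample_size(0.99, 0.95))  # 90 - a stricter confidence needs more
```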
Now for the PCR itself. If the procedure is not sensitive enough, it will generate false negatives – especially when we pool, and hence dilute, a positive sample with a large number of negative samples. Fortunately, such tests have been done before, and it has been shown that even one positive in a pool of 64 samples yields a positive result. Research labs increase sensitivity with a range of protocols, and there is no significant difference in sensitivity between real-time PCR and conventional PCR. So we do not really have a problem here, though any research lab will validate its sensitivity by running various dilutions of a positive sample and showing that PCR can still detect it.
Finally, we come to the pooling strategy. Pooling was first used during the Second World War to test army recruits for syphilis, and the techniques have evolved considerably since then. I had previously suggested the simplest of these methods, which involves arranging samples in a square grid and pooling the rows and also the columns. Thus each sample is pooled twice, belonging to one row pool and one column pool.
Now, if one row and one column show up as positive, the intersection point of this row and column identifies the positive sample. To illustrate, consider a 48 × 48 grid. (Biologists tend to favour eight and its multiples not because it is a magic number but for experimental convenience.) The commonly used multipipettor has eight nozzles and so picks eight samples simultaneously. The standard 96-well plate used to process samples consists of 12 wells in length and eight wells in width. Arranging four such plates lengthwise and six side-to-side provides a nice 48 × 48 grid of samples. Then each row can be pooled to give 48 pooled row samples and similarly each column can be pooled to give 48 pooled column samples, making a total of 96 pooled samples.
Since the PCR machine typically runs a 96-well plate, we can run all these samples at one go. Note that we have managed to test 2,304 samples in a single run which would typically require a few hours. Of course, we need to run these samples in a gel to look for DNA amplification but still the gains are enormous.
Unfortunately, things are never that simple. Consider a situation where we have five positive samples. Then we get five positive rows and five positive columns, which intersect at 25 points. The test thus flags 25 potential positives, of which only five are true positives. In general, if the number of positives is N, the pooling procedure flags up to N² potential positives, which becomes a problem when there is a large number of positive cases. To resolve this, much more sophisticated pooling techniques have been designed, but they are beyond the scope of this article.
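The row-and-column logic is easy to simulate (a sketch with hypothetical grid coordinates):

```python
# Grid pooling: each sample sits at (row, column); a row or column pool tests
# positive if it contains at least one positive sample. The candidates are
# all intersections of positive rows with positive columns.
def candidate_positives(positives):
    rows = {r for r, c in positives}
    cols = {c for r, c in positives}
    return {(r, c) for r in rows for c in cols}

# One positive in a 48 x 48 grid (2,304 samples, 96 pooled tests) is
# pinpointed exactly:
print(candidate_positives({(3, 17)}))        # {(3, 17)}

# Five positives in distinct rows and columns give 25 candidates:
five = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)}
print(len(candidate_positives(five)))        # 25
```

When positives happen to share rows or columns, the candidate set is smaller than N², but in the worst case it grows quadratically, which is the problem described above.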
(Pooling techniques are categorised into adaptive and non-adaptive methods, each with its own advantages and disadvantages. A very versatile non-adaptive method called the shifted transversal design is used routinely for high-throughput screening. The original description is heavy going; if you still want to read it, my advice would be to skim judiciously.)
The take home message is that pooling strategies can be tailored to suit the specific situation at hand and can be made both efficient and robust. For this, biologists need to talk to mathematicians, something which they have started doing recently. Some barriers have started to go down between disciplines and it is at such interfaces that the best discoveries are often made.
The purpose of this article is not to delegitimise the role of experts. There are a hundred small but critical factors that make a testing procedure truly robust, and it requires a specialist to know them all. But we also need to engage with experts if only to request them to get out of their ivory towers and respond meaningfully to pressing social needs. Knowledge empowers a person to ask the right questions. We seem to be reasonably comfortable with questioning politicians because we have a vibrant tradition of political democracy, though it is getting a bit shaky nowadays. But expert knowledge seems to escape this democratisation process.
It's reminiscent of our older Brahmanical traditions, where knowledge was the preserve of a few. We tend to treat technology, whether a product or a process, as a black box: inflexible, inviolate and beyond comprehension. But we need to understand that the process of technology construction is not only opaque, but that the optimality principles used in its development often flow from considerations other than social needs. To question this, to intervene and indeed to modify, we first need to comprehend. This demystification is a necessary prelude to a proper dialogue. People have to understand and question so that they can ensure their social concerns are met, and not be told that technology is an infrangible law of nature.
Only this will lead to better acceptance, because it will be based on a rational understanding of what can and cannot be done. There have been wonderful efforts in the past, such as the Bharat Jan Vigyan Jatha, to popularise and simplify science so that more people can access it. These efforts are now paying dividends: today, one of the states where such people's science movements were strongest – Kerala – has also handled the COVID-19 epidemic very efficiently. The fight against COVID-19 has to be fought on multiple fronts. Empowering people, inculcating rationality and democratising knowledge are just as critical as hospitals and medicines.
K.J. Mukherjee is a retired professor and former dean of the school of biotechnology, Jawaharlal Nehru University, New Delhi.