India Isn’t Sequencing Enough SARS-CoV-2 Genomes, and That Puts Us in Danger

08/01/2021

Colorised scanning electron micrograph of a VERO E6 cell (grey) exhibiting signs of death after infection with SARS-CoV-2 virus particles (yellow), which were isolated from a patient sample. Image: NIAID/Flickr, CC BY 2.0.

There has been a flurry of activity across labs in India involved in sequencing genomes of the novel coronavirus. Scientists have been checking if the samples that test positive for the virus contain the new variant, dubbed B.1.1.7.

These samples are in the form of swabs that contain both virus particles and cells of the person from whom each sample was taken. Scientists break open the virus particles, extract their genetic material – in the form of ribonucleic acid (RNA) – and amplify it. Then they analyse its composition.

The RNA is made of four molecules: adenine (A), uracil (U), guanine (G) and cytosine (C). The RNA of the novel coronavirus contains almost 30,000 of these molecules. Each molecule has distinct physical and chemical properties, so by looking for them, scientists can determine the sequence of these molecules in the RNA. They then compare the final sequence with that of the Wuhan strain from December 2019, considered to be the ‘original’.
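The comparison step described above can be sketched in a few lines of Python. This is only an illustration: the two 12-letter sequences are invented for the example, the function name `find_substitutions` is hypothetical, and the sketch assumes both sequences are already aligned and of equal length, which real pipelines have to establish with dedicated alignment tools.

```python
from __future__ import annotations


def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference letter, sample letter) for every mismatch."""
    if len(reference) != len(sample):
        # Assumption of this sketch: no insertions or deletions, so lengths match.
        raise ValueError("this sketch only handles sequences of equal length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Toy sequences invented for illustration; a real genome has almost 30,000 letters.
reference = "AUGGCUACGUUA"
sample = "AUGGCUACCUUA"

print(find_substitutions(reference, sample))  # [(9, 'G', 'C')]
```

Here the sample differs from the reference at a single position, where a G has been replaced by a C. A list of such differences is what lets scientists assign a sample to one variant or another.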

Over the last year, laboratories across the world have found various strains of the novel coronavirus that differ from the original strain in one or more molecules of their RNA.

The RNA holds information about the structure and nature of proteins that the virus will use to perform its life-functions. So as the number of differences between viral strains increases, it’s likely that the resulting viruses will also behave differently – in terms of how well they spread and the kind of disease they cause. The differences can be beneficial, harmful or inconsequential to us, and scientists can monitor them and try to predict their effects.

Altogether, this is the genome sequencing that many laboratories around the world have been conducting since the pandemic began.

The UK has been leading the world’s sequencing efforts. According to GISAID, the open genome repository for viruses that cause influenza and COVID-19, the UK has submitted more than 50% of all genome sequences of SARS-CoV-2; in absolute numbers, that’s 147,776 (as on January 7, 2021). This suggests UK scientists have sequenced roughly 5% of the positive samples in their country.

These fervent sequencing efforts uncovered the virus’s new B.1.1.7 variant, since colloquially called “the UK variant”. Scientists have reason to think B.1.1.7 is 70% more contagious than other variants, and that it is found in 60% of the COVID-19 cases in the UK.

Though the strain doesn’t cause more severe COVID-19, it can infect more people than the original strain (in the same circumstances) – which means more people become at risk of falling severely ill. This is worrying; unless the variant is contained quickly, its spread could once again threaten to overwhelm the country’s hospitals.

A question automatically arises: Could India have moved a step ahead of this variant?

India started sequencing genomes of the novel coronavirus at the outset of the COVID-19 crisis in the country, around April 2020. In June, scientists reported that 41% of the samples sequenced in India showed the presence of viruses belonging to clade A3i. This was especially true for Telangana and Tamil Nadu. A clade is a group of viruses of common ancestry.

A bubble plot depicting the change in predominant clades with time in various states. X-axis indicates the date on which the sample was collected, and color indicates the clade. Only those states with collection data across at least two months are plotted. Plot: https://doi.org/10.1093/ofid/ofaa434

When scientists compared this strain with those from around the world, using GISAID, they found that the A3i viruses could have originated in the east, in countries like Singapore and the Philippines. The strain was also predicted to be less contagious than the ‘original’.

But over time, the A3i clade of the novel coronavirus faded away from the population and was replaced by the viral clade A2a. This was the globally dominant clade at the time and it took over India as well. This was a welcome finding because it meant we could follow the same disease management strategies, including detection and transmission models, that had been deployed in other places – since the viruses we were fighting were mostly the same everywhere. The only thing we needed to keep an eye on was differences in the human population.

Gradually, India’s enthusiasm for sequencing coronavirus isolates dropped. The viruses continued to mutate, however, and new variants continued to emerge.

By September 2020, scientists in India had sequenced around 5,000 Indian isolates of the novel coronavirus. This was barely 0.1% of the number of official positive cases in the country. Finances presented a significant roadblock: each sequencing reaction requires reagents worth about Rs 6,000, in addition to sophisticated infrastructure and skilled personnel. The import restrictions that the Indian government subsequently imposed precipitated a reagent shortage in the country, which in turn affected our pace of genome sequencing.
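To get a feel for the scale of this gap, the figures quoted in this article can be combined in a rough back-of-the-envelope calculation. The sketch below is illustrative only: the case count is back-calculated from the two India figures rather than taken from official statistics, the target of matching the UK’s roughly 5% rate is a hypothetical one, and the cost covers reagents alone, leaving out infrastructure and personnel.

```python
from fractions import Fraction

# Figures quoted in the article.
INDIA_SEQUENCED = 5_000              # genomes sequenced by September 2020
INDIA_FRACTION = Fraction(1, 1000)   # "barely 0.1%" of official positive cases
UK_FRACTION = Fraction(5, 100)       # "roughly 5%" of positive samples in the UK
REAGENT_COST_RS = 6_000              # reagents per sequencing reaction, in rupees

# Back-calculated, not an official statistic: the case count implied by
# 5,000 genomes being 0.1% of positive cases.
implied_cases = INDIA_SEQUENCED / INDIA_FRACTION

# Hypothetical target: sequencing the same cases at the UK's rate.
target_genomes = implied_cases * UK_FRACTION
extra_genomes = target_genomes - INDIA_SEQUENCED
reagent_bill_rs = extra_genomes * REAGENT_COST_RS

print(f"Implied positive cases: {int(implied_cases):,}")
print(f"Genomes needed for 5% coverage: {int(target_genomes):,}")
print(f"Reagent cost of the shortfall: Rs {int(reagent_bill_rs):,}")
```

Under these assumptions, matching the UK’s rate would have meant sequencing about 250,000 genomes instead of 5,000, with a reagent bill for the shortfall of roughly Rs 147 crore.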

Combined with already waning interest in the enterprise, this caused the rate of sequence production in India to plummet. Around this time, the UK formally reported the rise of the new variant, B.1.1.7. Indian scientists had few mechanisms to determine when and how this variant could have entered India and spread.

We have a similar issue with the so-called “South African variant”, officially called 501.V2, which was first identified in South Africa and carries the E484K mutation, among others. Scientists have reported that this variant is more common among younger people – and also more contagious than the prevalent A2a strain.

These are just two of the many variants currently circulating among humans across the planet. We could also encourage the birth of more such variants because the more positive cases in a region, the more viruses will be produced there every minute. And during each of these viral replications, there can be a change in the newly formed RNA sequence, leading to a new variant.

Even as the number of new cases is dropping in India, the country has still had the second-largest number of positive cases – and therefore the second-largest number of new viruses will have been produced in India. So we can’t afford to ignore the evolution of the virus among the Indian population. And as newer variants take root, we need the mechanisms and resources to monitor them regularly.

We have already overlooked some variants that surfaced in India. One example is N440K, first reported in July 2020 from samples in Andhra Pradesh. Scientists know it to be an immune-escape variant, which means the neutralising antibodies in a person previously infected with the other variants of SARS-CoV-2 are unable to stop an N440K infection. That is, the N440K variant of the novel coronavirus can infect people previously infected with another variant of SARS-CoV-2.

Until recently, scientists also hadn’t studied if and how N440K variants spread into other states. But tests last month revealed the presence of these variants in Karnataka, Maharashtra and Telangana.

We need to understand these changes in the local viral populations to understand the biology of host-viral interactions as well as to alter our testing methods, if necessary. All our methods – RT-PCR, antigen tests and antibody tests – depend on the virus’s RNA sequence. Changing the tests also means changing the reagents required, which in turn demands both more money and more time. Different variants can spread with different propensities, so they may need new containment strategies as well.

The Government of India’s newly formed Indian SARS-CoV-2 Genomics Consortium (INSACOG), for centralised genome surveillance of the novel coronavirus in India, is definitely a positive step – if also a bit late.

The consortium brings together 10 labs from around India with a cumulative sequencing capacity of more than 25,000 samples a month, which is four times the number of samples we have sequenced thus far. The geographic spread of these labs could also improve proportionate representation of samples from different parts of the country. Finally, having such a consortium – coordinated by the Department of Biotechnology – ensures sample collection, sequence deposition and sharing protocols will be the same across these labs. As a result, scientists will be able to compare data from these labs more smoothly.

The ultimate hope is that India’s genome surveillance project will be able to see beyond the novelty of new variants. It’s only a matter of time before we find even newer strains, and some of them may already be present in the population, and demand our attention.

While the absolute number of viral genomes we’ve sequenced might seem competitive with the rest of the world, the fraction of positive cases that have been sequenced is dismal. Unfortunately, it’s the latter that matters.

Somdatta Karak is a science communicator and works with the CSIR-Centre for Cellular and Molecular Biology, Hyderabad.