A sequence of nucleotides from an unknown genome depicted on a wall. Photo: mujitra/Flickr, CC BY 2.0.
Late last week, the National Institute of Virology (NIV), Pune, uploaded the genetic data of coronaviruses obtained from three people in Kerala who had tested positive for SARS-CoV-2 in January, onto an international database. The institute is joined by dozens of similar centres around the world sharing and comparing viral genome data.
Why is this happening?
The answer begins with a simple truism: change is incremental. With this in mind, scientists have meticulously pieced together the genetic evolution of the novel coronavirus, SARS-CoV-2, and used it to understand how the virus could have moved from the wild to human populations, and how it is moving around among humans.
We still don’t know when exactly the coronavirus jumped from an animal to a human, or humans, or from which animal. However, efforts to unravel these mysteries are possible at all thanks to what we know about how viruses mutate and the availability of tools that allow us to infer a virus’s evolution through its mutations.
For example, we know the Middle East respiratory syndrome (MERS) coronavirus likely jumped from camels to humans because the difference between the genetic compositions of human MERS and camel MERS is extremely small.
For the camel MERS to have become infectious among humans, one or more of its genes needed to have undergone one or more mutations in the presence of a selection pressure – an environmental condition that forces the virus to change and adapt.
Put differently, the relatively small difference between the human MERS and camel MERS genes is consistent with the few mutations the camel virus would have needed to undergo, which is why the MERS virus is likely to have jumped from camels to humans.
The SARS-CoV-2 outbreak has lasted approximately three months now, in which time the virus has spread from one province in China to 93 countries and infected 101,927 people. In all this time the virus has been constantly mutating.
Contrary to popular culture’s view of mutations, only a few are really dangerous, and even then only in the presence of significant selection pressures. Most mutations simply allow the virus to become a more efficient life-form, not a deadlier one.
So when scientists sample the blood of people infected by the same virus in various parts of the world at different times, they first uncover the different mutations the virus has undergone. Next, they match each version of the virus with the travel and contact history of each patient to determine when each mutation might have emerged as well as, more precisely, how each person could have been infected.
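The first of these steps, uncovering the mutations in each sample, can be pictured with a minimal Python sketch. It assumes the genomes have already been aligned to a common reference and are of equal length; the function name `list_mutations` and the short toy sequences are hypothetical illustrations, not real SARS-CoV-2 data or any particular laboratory's pipeline.

```python
from typing import List, Tuple


def list_mutations(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref_base, base)
        for i, (ref_base, base) in enumerate(zip(reference, sample), start=1)
        if ref_base != base
    ]


# Made-up toy sequences standing in for a reference genome and three samples.
reference = "ATGGCTAGCTTACG"
samples = {
    "patient_A": "ATGGCTAGCTTACG",
    "patient_B": "ATGGTTAGCTTACG",
    "patient_C": "ATGGTTAGCTTGCG",
}

for name, seq in samples.items():
    print(name, list_mutations(reference, seq))
```

In this toy case, patient C carries the same change as patient B plus one more, which is the kind of pattern that, combined with travel and contact histories, hints at the order in which infections occurred.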
Such comparative analysis can yield powerful insights. For example, Trevor Bedford, a bioinformatician at the University of Washington, explained on Twitter that genetic data from a virus sampled in late February resembled that from a sample obtained on January 19, both in Washington. This similarity suggested the virus had been circulating undetected in Washington state for over a month, foreshadowing a potential crisis that the American government had missed because it was checking only people who had recently travelled to China.
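The reasoning behind such an inference can be sketched roughly as follows: given a new sample, find the earlier sample whose genome differs from it at the fewest positions, then note how many days separate the two. This is only an illustration under stated assumptions. The sequences, sample labels and dates are invented, the helper names are hypothetical, and real analyses build full phylogenetic trees rather than relying on pairwise differences alone.

```python
from datetime import date
from typing import Dict, Tuple


def hamming(a: str, b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_earlier_sample(
    new_seq: str,
    new_date: date,
    earlier: Dict[str, Tuple[str, date]],
) -> Tuple[int, str, date]:
    """Return (distance, name, date) of the most similar earlier sample."""
    candidates = [
        (hamming(new_seq, seq), name, sampled_on)
        for name, (seq, sampled_on) in earlier.items()
        if sampled_on < new_date
    ]
    if not candidates:
        raise ValueError("no earlier samples to compare against")
    return min(candidates)


# Invented toy data: two earlier samples and one newly sequenced sample.
earlier_samples = {
    "early_local": ("ATGGTTAGCTTACG", date(2020, 1, 19)),
    "early_elsewhere": ("ATGGCTAGCTAACG", date(2020, 1, 25)),
}
new_sequence = "ATGGTTAGCTTGCG"
new_sample_date = date(2020, 2, 27)

distance, name, sampled_on = closest_earlier_sample(
    new_sequence, new_sample_date, earlier_samples
)
gap_days = (new_sample_date - sampled_on).days
print(f"closest earlier sample: {name}, {distance} difference(s), {gap_days} days apart")
```

A new sample that differs only slightly from an earlier local one, and shares its distinctive changes, is more plausibly a descendant of that local virus than a fresh arrival from elsewhere.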
Finally, scientists use what they know about how often different types of viruses mutate to tease out SARS-CoV-2’s own evolutionary pathway. An intermediary product of these efforts that provides a quick picture of our virus is this tree diagram from nextstrain.org:
So what have scientists learnt?
Bedford, who also cofounded nextstrain, told Science, “One of the biggest takeaway messages [from the viral sequences] is that there was a single introduction into humans and then human-to-human spread.” A disease ecologist named Peter Daszak said in the same report, “If we don’t find the origin, it could still be a raging infection at a farm somewhere, and once this outbreak dies, there could be a continued spillover that’s really hard to stop.”
There are other mysteries as well, each one a potential window into SARS-CoV-2’s origins and fate. Nextstrain’s visualisation, which includes the new NIV data, shows that the two samples from Kerala sit far apart from each other on the tree diagram and do not align closely with any of the other versions circulating in Wuhan in January, when the three people were also in the city. (The two data-points are denoted by pale blue dots next to the red stars.)
What this means is not clear yet. India reported eight new cases on March 8, all from Kerala, and the country’s total confirmed case count now stands at 39. The NIV will continue to study the viral samples from different patients. As it and other nodal centres around the world upload more data to the GISAID[footnote]Short for ‘Global Initiative on Sharing All Influenza Data’[/footnote] database (which nextstrain visualises as the epidemiology tree), we might get some answers.
Until then, we keep testing.