China: SARS-2 Gene Sequences That Had Vanished from Database Reappear

A colourised scanning electron micrograph of a cell (purple) infected with SARS-CoV-2 particles (green), isolated from a patient sample. Photo: NIAID/Flickr, CC BY 2.0

New Delhi: More than 200 samples of genetic sequences isolated from early COVID-19 patients in China, which had been removed from an online database, have resurfaced after a year, according to the New York Times.

Jesse Bloom, a virologist at the Fred Hutchinson Cancer Center in Seattle, had first noticed that the samples were missing and managed to track down 13 of the sequences. In a report posted online in June this year, he said it “seems likely that the sequences were deleted to obscure their existence”.

However, the NYT reported that “an odd explanation has emerged” for their removal, stemming from an editorial oversight by a scientific journal. The sequences have now been uploaded to a different database that is overseen by the Chinese government.

The original findings, about an ‘accurate and comprehensive’ way to detect SARS-CoV-2, were posted online in March 2020, after researchers at Wuhan University sequenced a “short stretch of genetic material” of the virus taken from 34 patients at a hospital in the city. The first known cases of COVID-19 had been identified in Wuhan in late 2019.

Later that month, the sequences were also uploaded to an online database called the Sequence Read Archive, which is maintained by the US National Institutes of Health (NIH). The researchers “submitted a paper describing their results” to a scientific journal called Small, according to the New York Times. The journal published the paper in June 2020.

The newspaper said that Bloom became aware of the Wuhan sequences this year, while researching the origin of COVID-19. He learned through a May 2020 review of early coronavirus genetic sequences that they had been deposited in the Sequence Read Archive.

However, he could not find them in the archive and emailed the Chinese scientists on June 6 to ask where the data had gone. He did not get a response. A couple of weeks later, he posted his report.

When the report was picked up by newspapers, the NIH said that the authors had requested the sequences be withdrawn in June 2020 because they “were being updated and would be added to a different database”.

The sequences were “quietly uploaded” to a database maintained by the China National Center for Bioinformation by one of the co-authors of the Small paper on July 5, NYT said.

When the disappearance of the sequences was brought up during a press conference in Beijing on July 21, Chinese officials “rejected claims that the pandemic started as a lab leak”, the newspaper said. A translation of the press conference provided by the state-controlled Xinhua News Agency said that the vice minister of China’s National Health Commission, Zeng Yixin, gave an explanation for why the sequences disappeared.

According to Zeng, after editors at Small “deleted a paragraph” in which the scientists described the sequences in the Sequence Read Archive, the researchers thought “it was no longer necessary to store the data” in the NIH database.

While an editor at Small confirmed to NYT that the data availability statement was “mistakenly deleted”, this still does not explain why the researchers told the NIH that the sequences were being updated and would be uploaded to another database.

Small published a formal correction on July 29, admitting that the statement was erroneously deleted and providing a link to the database where the sequences have now been uploaded.

NYT said it is not clear why the researchers waited a year to upload the sequences to another database.

Bloom speculated that it could be worthwhile to look for other sequences of coronaviruses that might be “lurking online”.

NYT also noted that in the initial reports, the Wuhan researchers had written that the genetic material had been extracted from “samples from outpatients with suspected COVID-19 early in the epidemic.” However, the entries in the Chinese database say they were taken from Renmin Hospital of Wuhan University on January 30 – two months after the earliest reports of the viral disease.

