Data, Not a Press Meet, Is a More Reliable Informant of South Asian Ancestry

18/09/2019

Navneet A. Vasistha and Anirban Mukhopadhyay

In the expanse of scientific enquiry, two questions have stood the test of time. One relates to our existence in this seemingly limitless universe, and the other to the origins and spread of the human race on Earth. In the last 10 days, the Chandrayaan 2 mission and efforts to understand ancient human migrations in South Asia both witnessed significant developments on these counts. The two events were also similar in the manner in which they were reported in the Indian print and digital media.

While Chandrayaan 2 received generous coverage, with minute-by-minute reporting, the study on ancient genomes received less, though not negligible, attention. But in both cases, most journalists focused on the ultimate results with no regard for the underlying scientific processes and historical context. When the Chandrayaan 2 mission ran into trouble, the media that had until then been belligerent abruptly found itself at sea.

In the case of ancient migrations, while the Indian media had more room and time to prepare informed reports, it chose to represent only one of the two academic manuscripts published on the same day. A quick look at the Economic Times, Times of India, The Telegraph, News18, Hindustan Times and The Week shows all of them covered only the paper whose lead author was Vasant Shinde, published in the journal Cell. The other paper, with Vagheesh Narasimhan as the lead author and published in the journal Science, argued the opposite of Shinde’s hypothesis but found no mention in the publications’ columns.

Human population migrations are difficult to study. Until recently, scientists used linguistic, ethnographic and archaeological evidence to make sense of how human groups had moved throughout history. While the conclusions of these studies have contributed greatly to theories of human movement in ancient times, there have been many cases where their application was limited by the quality and availability of reliable data.

This is where exciting developments in the fields of genetics and archaeology come to the fore. The extraction of DNA from skeletons found in ancient burial sites is not exactly new. However, the DNA recovered from skeletal remains is usually degraded and doesn’t immediately lend itself to meaningful analysis. In 2015, scientists found that the petrous bone[footnote]A dense part of the temporal bone that houses the inner ear[/footnote] is an excellent source of DNA, yielding up to 100 times as much as other bones. In addition, the quality of the DNA recovered is often better, even when it is extracted from skeletons that are thousands of years old and located in tropical regions, where the prevailing climate doesn’t allow DNA to be preserved very well.

In parallel, advancements in DNA sequencing have made it faster and cheaper for scientists to sequence entire genomes instead of relying on information from selected gene regions, as was done in the past. In effect, scientists have been able to better sample DNA collected from skeletal remains.

Despite these developments, it still isn’t easy to deduce the relationship between two individuals. We inherit our DNA in equal measure from each of our parents, who in turn inherited it equally from their parents, and so on. Thus, the more distantly related two individuals are, the more dissimilar their DNA will be. That said, there are particular regions in our genetic material that change only very slightly over long periods of time. For example, a 25 base-pair deletion[footnote]A base is the smallest unit of the DNA alphabet that makes up our genetic code[/footnote] found in roughly 4% of Indians heightens the risk of heart failure. This mutation is believed to have occurred around 33,000 years after the subcontinent was first settled and is present to this day in all major ethnic populations of South Asia, barring the Northeast Indians, Siddis and Onges. It is also absent in 63 samples representing people from different geographic regions outside the subcontinent.
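The halving described above can be made concrete with a minimal Python sketch. It assumes an idealised model in which each parent contributes exactly half of a child's DNA and ignores the randomness of recombination; the function name is illustrative and does not come from either paper.

```python
from fractions import Fraction


def expected_ancestor_share(generations_back: int) -> Fraction:
    """Expected fraction of DNA inherited from one ancestor who lived
    `generations_back` generations ago (1 = parent, 2 = grandparent).

    Idealised assumption: the share halves with every generation.
    """
    if generations_back < 0:
        raise ValueError("generations_back must be non-negative")
    return Fraction(1, 2) ** generations_back


# Print the expected share for a parent, a grandparent and two
# more distant ancestors.
for n in (1, 2, 5, 10):
    print(n, expected_ancestor_share(n))
```

Under this simplified model, the share contributed by any single ancestor shrinks quickly, which is one reason slowly changing regions are more useful for tracing deep ancestry.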

Stretches of DNA that are inherited together and tend to remain the same over thousands of years are called haplotypes. Individuals whose haplotypes carry specific mutations inherited from a common ancestor are grouped into a haplogroup. These are invaluable when tracing the genealogical ancestry of a population.

Scientists mostly use two kinds of haplogroups for tracing such ancestries: those of the Y-chromosome and those of mitochondrial DNA. The Y-chromosome is passed from father to son. Mitochondrial DNA is inherited from the mother by both sons and daughters. Using these together provides scientists with a fairly good idea of genetic ancestry. There is a catch, however: on their own, they don’t provide information on the directionality of the inheritance, so how do we determine whether two populations are related?

The archaeological site of Harappa, of the Indus Valley civilisation. Photo: Wikimedia Commons

For this, scientists compare the haplotypes between groups and deduce the degree of relatedness. An individual with higher similarity to a particular group should lie closer on the genealogical tree to this group than an unrelated one.
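As a rough illustration of this idea, the Python sketch below scores a sample against two reference groups by the fraction of marker positions they share. Everything in it is an assumption made for illustration: the marker strings, the group names and the simple matching score are invented, and real studies rely on far more elaborate statistics over very large numbers of genetic markers.

```python
def shared_fraction(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length marker strings agree."""
    if not a or len(a) != len(b):
        raise ValueError("marker strings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_group(sample: str, reference_groups: dict) -> tuple:
    """Return the name of the reference group with the highest mean
    similarity to the sample, along with every group's score."""
    scores = {
        name: sum(shared_fraction(sample, h) for h in haplotypes) / len(haplotypes)
        for name, haplotypes in reference_groups.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data, not taken from either study.
sample = "AGTCAGTC"
reference_groups = {
    "group_a": ["AGTCAGTT", "AGTCAGAC"],
    "group_b": ["TGACAGTC", "TCACGGTC"],
}

best, scores = closest_group(sample, reference_groups)
print(best, scores)
```

In this toy example, the sample would sit closer to `group_a` on a genealogical tree because it shares more marker positions with that group's members.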

The paper that Shinde co-authored used 61 skeletal samples found in the Rakhigarhi excavations in Hissar, Haryana[footnote]Rakhigarhi is the largest and oldest Indus Valley civilisation site known, dating back to 2,800-2,300 BC[/footnote]. However, only one of the 61 samples, from a woman, supplied DNA that could be used. When the researchers sequenced this DNA, they found her to be related to Andamanese hunter-gatherers and ancient Iranian farmers. She was also related to 11 other ancient individuals whose remains had been found in present-day Turkmenistan and far-eastern Iran. Like these 11 individuals, she carried a mixture of ancient Iranian and south-Indian ancestry and was said to belong to the ‘Indus Periphery Cline outliers’, a term used to describe individuals who differed significantly from others in the region in which they were found.

In a separate but related analysis, the Rakhigarhi sample was found to lack the Steppe pastoralist ancestry seen in most present-day South Asians. Despite this, Shinde held a press conference arguing that his study “… has debunked the earlier conclusions about Harappans” – when in fact the study upheld these conclusions – and that Harappans “were Indo-Vedic people” when in fact they were not. In a separate interview, he further stated that “the movement from the Steppe is not large”.

Narasimhan’s work states that the 11 individuals identified as outliers might be the ancestors of present-day South Asians and have variable amounts of Iranian farmer-related ancestry. This study, which Shinde also coauthored, sequenced the genomes of 837 ancient individuals from the broad geographical regions of present-day Iran, Turkmenistan, Tajikistan, Afghanistan, Kyrgyzstan, Russia and northern Pakistan. These samples were dated to 12,000-1,200 BC.

This study argues that the South Asian population is derived from a mix of ancient-ancestral South Indians, the Indus Periphery Cline group with Iranian farmer ancestry and a group with Central Steppe ancestry. The combination of these three, likely around 2,000 BC at the twilight of the Indus Valley civilisation, gave rise to the ancestral North and South Indian populations (ANI and ASI, respectively), with the ANI being closer to the Indus Periphery Cline population.

Further, the mixing of ANI and ASI populations at different points from 1,700 BC onwards probably gave rise to modern South Asians, who thus have considerable Steppe ancestry. However, in the absence of DNA from individuals belonging to the ANI and ASI groups, Narasimhan et al. attempted to reconstruct these populations statistically. This is arguably a major limitation of the study.

As a workaround, Narasimhan and his collaborators compared the ancient DNA with modern DNA from 1,789 individuals belonging to 246 ethnographically distinct groups in present-day South Asia, and found that even when the ASI population was artificially considered as having no Steppe ancestry, the data was still similar to that obtained from modern tribal groups in South India. This suggests that samples without Steppe ancestry, such as the one from Rakhigarhi, but with substantial Iranian farmer-related ancestry might be similar to southern Indian tribal groups.

Perhaps the thorniest part of this study is its assertion that the Steppe ancestry in diverse present-day South Asian groups is derived primarily from males and is enriched in present-day individuals belonging to Brahmin or Bhumihar groups. This sex-asymmetric population mixture is mirrored by African Americans and Colombian Latinos, whose roughly 20% and 80% European ancestry, respectively, comes from male and female European ancestors in ratios of about four-to-one and fifty-to-one.

This Steppe ancestry is derived from a narrow time-window between 2,000 BC and 1,500 BC. In stating this, the study reiterates the existence of a certain form of the Aryan migration theory. This last point also helps explain why Shinde and others rushed to discredit this study. As David Reich pointed out in his 2018 book, Who We Are and How We Got Here, there is no escaping the political ramifications of theories of India’s ancestral populations. This in turn renders the press’s response all the more bizarre. For example, one attempt to break the findings down ended up giving equal credence to both studies and fell prey to false balance.

It was heartening to see one report quoting Narasimhan and Reich saying they didn’t agree with Shinde’s assertions, but for the most part the Indian media was supine. Indeed, it was telling how many outlets were happy to go along with the more politically favourable parts of the studies, with little to no regard for other scientists’ interjections. If nothing else, this episode demonstrates the need to quote independent scientists in a science story, especially when it is on a delicate and divisive issue. The verbatim reproduction of large parts of the papers or press releases does not make a good science news report.

Navneet A. Vasistha is a postdoctoral fellow at the University of Copenhagen, Denmark. Anirban Mukhopadhyay is a doctoral student at the Department of Genetics, University of Delhi. The views expressed in this article are personal and do not reflect those of the authors’ employers or universities.