Y chromosome profiling, important in sexual assault cases, can often be presented incorrectly in court. New math could help by taking the ambiguity out of the equation.
In 1984, a geneticist named Alec Jeffreys developed a technique for profiling DNA. The find was serendipitous and radically changed the field of forensics. In 1986, DNA profiling was used to absolve an innocent suspect of two murders and, only a year later, confirm the identity of the real killer. In the three decades since, DNA evidence has been used in and out of court to help resolve immigration disputes, paternity questions and cases of sexual assault. While newer techniques may have replaced Jeffreys’s, the fundamental principles remain unchanged.
A DNA profile is a person’s DNA fingerprint, considered to be unique to each individual. While similarities can surface between two relatives’ DNA profiles, two unrelated strangers are unlikely to have much in the way of common DNA.
A complete set of human DNA, present in nearly every cell of the body, comprises 23 pairs of chromosomes, one of which is the pair of sex chromosomes. The sex chromosomes are denoted as XX for females and XY for males. Taken together, the chromosomes are long strings of genetic material made up of about three billion pairs of compounds called nucleotides: adenine (A), thymine (T), cytosine (C) and guanine (G).
Within this sequence of base pairs lie certain combinations of A, T, G and C that tend to repeat in a pattern that is thought to be unique to each individual (except in the case of identical twins). DNA profiling is based on identifying these patterns.
While India’s history of studying and implementing DNA profiling is almost as long as the international one, it is significantly less advanced. With the recent Human DNA Profiling Bill, India became one of the last major nations to propose the setting up of a countrywide DNA database.
When a DNA sample obtained from the crime scene is presented in court, it is not enough to establish a match; the court also needs to know how likely it is that other matches exist, a figure called the match probability. This is why having a database containing random samples is important: it helps provide a picture of how often certain sub-patterns in individual DNA occur in the population, and whether they are truly rare. Such databases are seldom well curated, and so fail to represent matches that may exist among the population. Such under-representation of DNA profiles is unfair and can mean the difference between life and death for a suspect.
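To see why a thin database can mislead, consider a minimal sketch in Python of the simplest possible approach: estimating how common a profile is by counting how often it appears in a reference database. The function name, the toy profiles (tuples of repeat counts) and the four-entry database are all hypothetical, chosen only for illustration; real forensic software uses more careful estimators.

```python
from collections import Counter

def database_frequency(database, profile):
    """Naive estimate: the share of database entries equal to `profile`."""
    counts = Counter(database)
    return counts[profile] / len(database)

# Hypothetical toy database of four Y profiles (tuples of repeat counts).
toy_database = [(14, 13, 29), (14, 13, 29), (14, 13, 29), (15, 13, 30)]

common = database_frequency(toy_database, (14, 13, 29))
unseen = database_frequency(toy_database, (16, 12, 28))
print(common, unseen)
```

A profile that happens to be missing from the database gets an estimated frequency of zero, even though men carrying it may well exist in the population. That is the under-representation problem in miniature.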
DNA evidence mostly depends on profiling the autosomal – i.e. the non-sex – chromosomes. During sexual reproduction, these chromosomes are shuffled around, creating the diversity we see in the different faces around us. Such variations in autosomal chromosome profiles increase the confidence we can place in them as evidence.
However, when it comes to the Y chromosome, it’s a whole other story. “Fundamentally, the Y chromosome is like a bacterium,” Charles H. Brenner, the writer of DNAView, a DNA profiling software used globally, told The Wire. “It reproduces by cloning, the same method that was popular three billion years ago.”
As it stands, Y-chromosome profiling is only a small part of DNA evidence. But it is more important when applied to cases of sexual assault, where it is likely that male and female DNA at a crime scene will mix. However, the method is not foolproof, for several reasons, and new research from Aalborg University, Denmark, and the University of Melbourne, Australia, seeks to tackle these shortcomings.
The researchers have developed a workaround that eliminates uncertainties due to using statistics based on incomplete databases and provides a guide to presenting the math in court in a less biased fashion.
“With this new method, we can help the court by narrowing down the number of possible perpetrators from perhaps several hundred thousand to under a hundred – and often even far fewer,” explained Mikkel Meyer Andersen, an associate professor at Aalborg University, in a press release.
Y-chromosome profiling is handy because women don’t possess the Y chromosome. This means that in cases of mixed DNA, identifying genetic material from a man becomes easier. But there’s a tradeoff: Y-chromosome profiling is more likely to be incorrectly presented due to database errors than autosomal profiling.
“The Y chromosome is handed down intact from father to son,” David Balding, a professor at the University of Melbourne, Australia, and a member of the research team, told The Wire.
The essential problem with this is that a suspect will share his Y profile with his father, grandfather, great-grandfather and so on, going back many generations. This shared Y profile may even extend to more distant relatives, who may also share aspects of their physical appearance or geographical location.
With more advanced and discriminating profiling systems, the number of men expected to share a given Y profile has become smaller over the years. But as the match probabilities these systems calculate still rely on faulty databases, a matching profile may not be rare so much as simply underrepresented.
These problems with the database are unlikely to go away anytime soon. “You can’t ask people to give your DNA profile to the police to be put on a public database. Nobody wants to do that,” said Balding. In light of some of the reservations surrounding the DNA Profiling Bill in India, this is easy to comprehend. “With our method you don’t need to rely on a database,” he continued.
Once investigators get a hold of genetic material from a crime scene, it is processed to generate a DNA profile. The next step is either matching the sample profile to that of DNA taken from a suspect or screening a database to find a match. To say with confidence that the match is rare and thus unlikely to belong to anyone besides the suspect, researchers need to answer the following question, according to Balding: “What’s the probability that a random individual in the population would match?”
The answer lies in determining how many men would have the same Y profile. There is no blanket method of evaluating how much confidence can be placed in this match probability. Andersen’s and Balding’s simulation relies on information extracted from the mutation rates of the genetic markers of the Y profile and the population history.
They used malan (MAle Lineage ANalysis), a package for R, which is a statistical computing and programming language. This computer model estimates the number of matching men that are likely under various scenarios. To arrive at this number, the researchers make assumptions about the population size, its growth rate, variations in reproductive success rates for different men and the frequency of mutations for specific markers on the Y chromosome.
Variations in the reproductive success rates of men depend on several factors, including competition for mates, especially in cases of polygyny. Parameters are set to investigate the degree to which all these factors influence the number of matching Y profiles to be found. Using a probability distribution called the symmetric Dirichlet distribution, the researchers assigned each man a probability of fathering a son; each male in the new generation was then given a father chosen independently and at random from the previous generation according to those probabilities.
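The kind of simulation described above can be illustrated with a short Python sketch. This is not malan and not the researchers' code; it is a toy version built on loudly stated assumptions: the population size is held constant (no growth), every marker mutates at the same rate by one repeat up or down, the whole founding generation starts with an identical profile, and all function names and parameter values are made up for illustration.

```python
import random

def next_generation(profiles, alpha, mutation_rate, rng):
    """Give every son a randomly chosen father and copy the father's profile."""
    n = len(profiles)
    # Symmetric Dirichlet weights, drawn as Gamma(alpha, 1) variables.
    # random.choices normalises the weights itself. A small alpha means
    # a few men father most of the sons (high variance in success).
    weights = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    fathers = rng.choices(range(n), weights=weights, k=n)
    sons = []
    for f in fathers:
        profile = list(profiles[f])
        for i in range(len(profile)):
            # Assumed mutation model: one repeat up or down per mutation.
            if rng.random() < mutation_rate:
                profile[i] += rng.choice((-1, 1))
        sons.append(tuple(profile))
    return sons

def count_matches(pop_size=500, generations=50, n_markers=10,
                  mutation_rate=0.01, alpha=1.0, seed=1):
    """Count the other living men who share a random man's Y profile."""
    rng = random.Random(seed)
    # Assumption: all founders carry the same profile.
    profiles = [(0,) * n_markers] * pop_size
    for _ in range(generations):
        profiles = next_generation(profiles, alpha, mutation_rate, rng)
    suspect = rng.randrange(pop_size)
    return sum(1 for i, p in enumerate(profiles)
               if i != suspect and p == profiles[suspect])

print(count_matches())
```

Running `count_matches` with different values of `pop_size`, `alpha` and `mutation_rate` gives a feel for how each assumption moves the number of matching men. With `mutation_rate=0` every man keeps the founders' profile, so the count is simply `pop_size - 1`.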
“What I think of as the most fundamental discovery is the idea that pretty much independent of the population, the number of matching men will be the same,” says Brenner, who was not involved in this study. Surprisingly, with this new simulation, the effect of database information was also negligible – a promising result when most models being used rely on poorly assembled databases.
Their simulations used a population size of 100,000 and found that moving to a population of 1,000,000 made no difference to the match numbers. As all the matching Y profiles come from relatives no further removed than 10 or 20 father-son steps, the match number is likely to be the same regardless of the population size. They also discovered that the number of sons a man has affects this number minimally, except in cases of rapid population growth or polygyny.
Another surprising aspect of Andersen’s and Balding’s work is that they steer clear of calculating match probabilities, unlike the traditional mathematics used to assess DNA evidence. “What’s a little bit strange about their approach is that they don’t talk about probabilities and likelihood ratios,” says Brenner. In the paper, they talk about numbers as opposed to percentages or ratios. “They sort of leave it to the jury to have an intuition for what that means in terms of the chance of a random person, an innocent person matching by chance.”
The way things work right now, lawyers usually say something like this: There’s a one in a million chance of finding someone else to match with the suspect. With this simulation, the court could grasp the math more easily. One could simply say: There are 50 men out there who share a Y profile with this man in the dock. In this instance, it would be prudent to emphasise that the men who share this DNA are close and distant relatives who may even share physical characteristics. Given all of this, does the court have adequate evidence to pen the suspect in?
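The gap between the two presentations can be made concrete with a little arithmetic, sketched here in Python. The male population of 50 million is a hypothetical figure picked so that the numbers line up with the example above, and the second function assumes that, before any other evidence is considered, every man with the matching profile is an equally plausible source of the sample. Neither assumption comes from the researchers.

```python
from fractions import Fraction

def expected_matching_men(match_probability, male_population):
    """Turn a match probability into an expected head count."""
    return match_probability * male_population

def suspect_share(other_matching_men):
    """Chance the suspect is the source if all matching men are equally likely."""
    return Fraction(1, other_matching_men + 1)

# "One in a million" applied to a hypothetical 50 million men.
print(expected_matching_men(Fraction(1, 1_000_000), 50_000_000))  # 50

# With 50 other matching men, the Y profile alone singles out 1 in 51.
print(suspect_share(50))  # 1/51
```

Put this way, a figure that sounds damning as a probability becomes a shortlist of several dozen men, which is why the rest of the evidence in a case still matters.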
Evidently, the two ideas – probabilities and numbers – paint very different pictures. Andersen’s and Balding’s method can help the court take a more holistic look at the numbers by keeping problems with databases and the misleading presentation of probabilities out of the equation.
Despite his reservations about this deviation from convention, Brenner acknowledges that Andersen and Balding are trying to “bridge the gap” between the math and its communication to a judge or jury. “There’s plenty of evidence that you can’t communicate numbers to a jury – not probabilities,” according to Brenner. He admits the presentation of evidence in court is often twisted out of context. “The understanding of Y chromosome evidence is shamefully bad. The prosecution does a horrible and confused job of trying to explain it,” said Brenner, an expert in court cases.
While language can be ambiguous, math and DNA evidence are straightforward. “You want the evidence to be fairly evaluated in court and I think up until now, the evidence hasn’t been fairly evaluated,” Balding said. He believes Y profile evidence has been overstated in the past, but in the future a more faithful standard may be applied.