Photo: fernando zhiminaicela/pixabay.
Imagine a country of 100 people where, according to some preliminary data, researchers have estimated that 20% of the population has COVID-19. Also suppose the country uses test kits that are 90% accurate to detect infected persons.
A random citizen of this country is tested using the kit and is found to be ‘positive’. Now, what are the chances that the result is a true positive, i.e. that the person is actually infected?
If your guess is 90%, which is quite intuitive, you are wrong. The actual answer is 69.23%. This may go against common sense, but that’s how reliable these test kits are in this setting.
The key to understanding this calculation is the 90% accuracy figure. It means that the kit will correctly identify an infected person 90% of the time. It also means the kit will be wrong 10% of the time: it will report an infected person as negative, a result called a false-negative, or a healthy person as positive, a result called a false-positive.
In our hypothetical country, there are estimated to be 80 healthy persons and 20 infected persons. The kit has a 90% accuracy, which means only 90% of the infected people (18) will be detected positive, i.e. the results will be true-positives. The kit also has a 10% inaccuracy, which means 10% of the healthy persons (8) will test positive as well, i.e. the results will be false-positives.
Ergo, the total number of cases that are detected positive by the test kits is the sum of the true-positives and the false-positives: 18 + 8 = 26. Now, out of this group of people, what are the chances that you can pick someone who is actually infected – i.e. the fraction of true-positives? It’s 18/26 = 0.6923 = 69.23%.
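The head count above can be reproduced in a few lines of Python. This is only an illustrative sketch: the function name `share_of_true_positives` and its parameters are our own labels, and it assumes, as the example does, that a single 90% accuracy figure applies to both infected and healthy people.

```python
def share_of_true_positives(population, prevalence, accuracy):
    """Fraction of positive results that belong to infected people.

    Assumption (from the example): one 'accuracy' figure covers both
    infected and healthy people.
    """
    infected = population * prevalence          # 20 people in the example
    healthy = population - infected             # 80 people
    true_positives = infected * accuracy        # infected and flagged: 18
    false_positives = healthy * (1 - accuracy)  # healthy but flagged: 8
    return true_positives / (true_positives + false_positives)


share = share_of_true_positives(population=100, prevalence=0.20, accuracy=0.90)
print(f"{share:.2%}")  # 69.23%
```

Changing `population` does not change the answer; only the infection rate and the kit's accuracy matter.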
This way, the use of the test provides some information about how much of the population could really be infected, but it is not conclusive.
Say we were to test, a second time, only the people who tested positive. Will the certainty of the results improve?
The total number of people to be tested in this round is 26: 18 persons are true-positives from the infected group and 8 persons are false-positives from the healthy group. The test kit is still 90% accurate, which means the number of people from the infected group who test positive again is 16.2 (90% of the 18 true-positives) and the number of people from the healthy group who test positive again is 0.8 (10% of the 8 false-positives). So the total number of people who test positive in the second round is 16.2 + 0.8 = 17.
Now, the chance that a person randomly picked from this set is actually infected (i.e. a true-positive) is 16.2/17 = 0.9529 = 95.29%.
Let’s progress to the third round. If the tests are repeated again for the 17 people who tested positive, the chance that a person randomly picked from this group is actually infected jumps to 99.45%.
So yes, the certainty of the results improves after repeat tests of the same group of people.
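The round-by-round arithmetic can be sketched as a short loop. The function name `retest_positives` is hypothetical, and the sketch makes one assumption worth stating loudly: each repeat test errs independently of the previous one, which is how the example treats it but may not hold for real kits.

```python
def retest_positives(infected, healthy, accuracy, rounds):
    """Test everyone, then retest only those who tested positive.

    Returns, for each round, the share of positives who are truly infected.
    Assumption: every test is wrong independently of earlier tests.
    """
    shares = []
    for _ in range(rounds):
        infected = infected * accuracy      # infected people who test positive
        healthy = healthy * (1 - accuracy)  # healthy people wrongly flagged
        shares.append(infected / (infected + healthy))
    return shares


shares = retest_positives(infected=20, healthy=80, accuracy=0.90, rounds=3)
for round_number, share in enumerate(shares, start=1):
    print(f"Round {round_number}: {share:.2%}")
```

With the figures from the example, the three rounds come out at 69.23%, 95.29% and 99.45%, each rounded to two decimals.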
What of the people who the tests said do not have the infection, i.e. who tested negative? Are they truly not infected? The test kits are inaccurate 10% of the time, which means out of the infected population of 20 people (20% of the population), two have tested negative. The kit’s 90% accuracy implies it will test 72 persons as negative out of the group of 80 healthy persons. The number of people who test negative will be the sum of false-negatives and true-negatives: 2 + 72 = 74.
The chance that a person randomly selected from this group of negative persons is actually healthy is 72/74 = 0.9730 = 97.30%. That is, the chance of a person actually being healthy jumps from 80% to 97%!
In this case, we won’t repeat the tests for one or two more rounds because such repetition will improve the chance that a person who tested negative is actually healthy by only a small margin. For example, the probability rises to 99.69% in the second round and to 99.97% in the third. However, it will never reach 100%.
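The same bookkeeping works for the negative results. In the sketch below, the function name `retest_negatives` is our own, and it again assumes that repeat tests err independently of one another.

```python
def retest_negatives(infected, healthy, accuracy, rounds):
    """Test everyone, then retest only those who tested negative.

    Returns, for each round, the share of negatives who are truly healthy.
    Assumption: every test is wrong independently of earlier tests.
    """
    shares = []
    for _ in range(rounds):
        infected = infected * (1 - accuracy)  # infected people the kit missed
        healthy = healthy * accuracy          # healthy people correctly cleared
        shares.append(healthy / (infected + healthy))
    return shares


shares = retest_negatives(infected=20, healthy=80, accuracy=0.90, rounds=3)
for round_number, share in enumerate(shares, start=1):
    print(f"Round {round_number}: {share:.2%}")
```

Each round shrinks the missed infections tenfold, so the share of truly healthy people creeps towards 100% without ever reaching it.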
The data presented in this exercise was hypothetical but the methods used to zero in on the truth were used to crack encryption codes during the Second World War and to track submarines on the high seas. And they are used widely today to predict whether an individual will develop a certain disease or pass one on to their children.
Further, in the current situation, the antibody (or serological) test is being used to detect antibodies against the novel coronavirus. Such a test should not cross-react with antibodies against other coronaviruses, because cross-reactivity could generate false-positives. Indeed, false-positives are particularly important when we don’t know how many people in a given population have had COVID-19.
Specificity, also called the true-negative rate, measures the fraction of actual ‘negatives’ that are correctly identified as such. Sensitivity, also called true-positive rate, measures the fraction of actual positives that are correctly identified as such.
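To see why false-positives matter so much when the true infection rate is unknown, the calculation can be written with sensitivity and specificity as separate inputs. This is a hedged sketch: the function name `chance_positive_is_real` is our own, and the infection rates in the loop are made-up values chosen only for illustration, with both rates set to the 90% used in the example.

```python
def chance_positive_is_real(prevalence, sensitivity, specificity):
    """Probability that a person who tests positive is actually infected."""
    true_positive_share = sensitivity * prevalence
    false_positive_share = (1 - specificity) * (1 - prevalence)
    return true_positive_share / (true_positive_share + false_positive_share)


# Hypothetical infection rates, same kit each time (90% / 90%).
for prevalence in (0.01, 0.05, 0.20, 0.50):
    chance = chance_positive_is_real(prevalence, sensitivity=0.90, specificity=0.90)
    print(f"{prevalence:.0%} infected: {chance:.1%} of positives are real")
```

With the same kit, a positive result is right about 8% of the time when 1% of people are infected, and 90% of the time when half of them are.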
Aamir Khanday is a research scholar at NIT Srinagar. Suhail Ahmad has an undergraduate degree in electrical engineering.