# How Do Scientists Model the Spread of an Infectious Disease?

The future is a strange thing. We don’t just step up and meet the unpredictable. We try to predict what’s coming first so we can meet it without putting ourselves at too much risk. Mathematical models help us make our mental models more quantitative, especially when a quantitative understanding is useful. For example, mathematical models can help us describe how an infectious disease spreads through a population, indicating what the case load might be in six days or that the ICUs in a state could be overwhelmed by day 20.

Exhibit A: the coronavirus pandemic.

The very act of modelling requires that we ruthlessly eliminate extraneous detail. What emerges might seem laughably simple. Indeed, we must remember that models are only caricatures of reality and shouldn’t be confused with reality itself. Nevertheless, the lesson of several hundred years of studying physics, mathematics and epidemics is that even simple models can provide powerful insights on which we can base important decisions.

The value of models is that they allow us to explore what could happen. They show how different potential futures might be moulded by what we do now. In different models for infectious disease epidemics, we can examine the effects of specific interventions, such as a quarantine or a lockdown. Each such intervention changes the number of people infected over time. We can ask which of two interventions works better, and whether combined interventions might work better than separate ones.

The simplest model for the spread of an infectious disease is the following. Imagine you’re dealing with the population of Chennai [1]. For each person, suppose we know the answer to the following questions:

• Have they been infected by the disease? If not, they are ‘susceptible’ to falling ill with it. (We’ll assume that they have no natural immunity.)
• Do they have the disease currently and are they capable of infecting others? If so, we’ll call them ‘infectious’.
• Have they ‘recovered’ from the disease, having already got it? If so, and if getting the disease confers lasting immunity, they will not get it again.

Clearly, everyone in the population can be placed into one of three compartments: the ‘susceptible’ (S), the ‘infectious’ (I) and the ‘recovered’ (R). No person can belong in two compartments at the same time. Instead, people can only change compartments, e.g. move from the S compartment to the I compartment if they get infected, or from the I compartment to the R compartment when they recover.

How do we understand this movement between compartments? Let’s think about how susceptible people get infected, and so move from the S compartment to the I compartment.

Someone who is susceptible to the disease can only be infected if they come into contact with an infected person. The more infected people there are, the greater the chance of a susceptible person being infected. Similarly, what is true for one susceptible person should be true for all of them, since any susceptible person on their own, interacting with any of the infected persons, has the same chance of becoming infected. This suggests that the number of susceptible persons moving to the I compartment across a short period of time should depend on both the number of susceptible persons and the number of infected persons at the time.

(It is in fact related to the product of the numbers of infected and susceptible people.)

Next, how do we understand how infected people recover, thus moving from the I compartment to the R compartment? We recover on our own, independent of whether we are surrounded by infected people, susceptible people or other recovered people. So the recovery rate of any infected patient is independent of S, I and R. And if there are a large number of infected people, a given fraction of them will recover in the same period of time.
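The two arguments above translate almost directly into a short simulation. What follows is a minimal sketch, not the formulation used by any particular research group: the names `beta` (a transmission rate) and `gamma` (a recovery rate), the step size `dt` and all the numbers in the example are illustrative assumptions.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Step the S, I and R compartments forward in small time steps.

    beta  : assumed transmission rate per day (hypothetical value)
    gamma : assumed recovery rate per day, roughly 1 / (days spent infectious)
    Returns a list of (time, S, I, R) tuples.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        # New infections depend on the product of S and I.
        new_infections = beta * s * i / population * dt
        # A fixed fraction of the currently infected recover in each step.
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative numbers only: a city of 7 million with 10 initial cases.
    run = simulate_sir(7_000_000, 10, beta=0.46, gamma=0.2, days=200)
    peak_time, _, peak_infected, _ = max(run, key=lambda row: row[2])
    print(f"Peak of about {peak_infected:,.0f} infected around day {peak_time:.0f}")
```

Because people only ever move between compartments, S + I + R stays equal to the total population at every step, which is a useful check on any implementation.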

## The Kermack-McKendrick theory

Two Scottish mathematicians named William Ogilvy Kermack and Anderson Gray McKendrick put this argument into equations, describing what is now called the SIR model for infectious diseases. They tested their model by comparing what it predicted to the number of cases each week in the Bombay plague of 1905-1906. The figure they plotted – illustrating a close agreement between the model and data – is perhaps the most reproduced figure across textbooks in mathematical epidemiology (see below). It speaks to the remarkable success of models in capturing the behaviour of a real-life epidemic.

What Kermack and McKendrick discovered was the central importance of a single quantity called the basic reproductive ratio – R0 – in determining the scale of an epidemic. R0 (pronounced ‘R-zero’) can be thought of as the number of people infected on average by a single infected person who encounters a susceptible population. Someone with measles will infect about 18 other people on average. Someone with the flu will infect between 1.5 and 2.5 people on average. R0 for COVID-19 seems to be about 2.3.

The larger the reproductive ratio, the greater the number of people who will be infected, and the harder it is to control an epidemic, although other factors also enter into this. Once R0 becomes smaller than 1, the epidemic dies out. Thus, ways of tackling an epidemic (vaccinations, social distancing, quarantining etc.) can all be interpreted in terms of how they help to reduce the value of R0. We can reduce R0 by vaccinating people since this simply reduces the number of susceptible people who could potentially be infected.
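To see how strongly the scale of an epidemic depends on R0, one can use a standard result for the simple SIR model that the text above does not spell out: if nothing is done, the fraction z of the population that is eventually infected satisfies z = 1 - exp(-R0 z). The sketch below solves this relation numerically. The function name and the iteration count are illustrative choices, and the result holds only under the idealised assumptions of the simple model.

```python
import math


def final_epidemic_fraction(r0, iterations=1000):
    """Fraction of the population ever infected in the simple SIR model.

    Solves z = 1 - exp(-r0 * z) by repeated substitution, starting from
    z = 1 so that the iteration does not sit at the trivial solution z = 0.
    """
    z = 1.0
    for _ in range(iterations):
        z = 1.0 - math.exp(-r0 * z)
    return z


# R0 values quoted in the text, plus one value below 1 for comparison.
for r0 in (0.8, 1.5, 2.3, 18.0):
    fraction = final_epidemic_fraction(r0)
    print(f"R0 = {r0}: about {fraction:.0%} eventually infected")
```

For R0 below 1 the iteration shrinks towards zero, which is the numerical counterpart of the statement that the epidemic dies out.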

Kermack and McKendrick’s work showed us that you don’t need to vaccinate the whole population to ensure a disease dies out. It is sufficient to vaccinate a reasonable proportion. For COVID-19, assuming a vaccine were available, that number would have been about 60% of people, a fraction that can be calculated using the value of R0. This is called “herd immunity” since even partial vaccination programs protect the “herd”, including the fraction of people who might escape being vaccinated. This is indeed a surprising, even amazing, result that could have come only from modelling.
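The calculation behind that fraction is short. If a fraction p of the population is immune, each infected person infects on average R0 × (1 - p) others instead of R0, and the disease dies out once this falls below 1, that is, once p exceeds 1 - 1/R0. The sketch below applies this formula to the R0 values quoted earlier. It assumes a perfectly effective vaccine and a population that mixes uniformly, and the function name is illustrative.

```python
def herd_immunity_threshold(r0):
    """Fraction that must be immune so each case infects fewer than one other."""
    if r0 <= 1:
        return 0.0  # the epidemic already dies out without vaccination
    return 1.0 - 1.0 / r0


# R0 values taken from the text above.
for name, r0 in [("flu (low estimate)", 1.5), ("COVID-19", 2.3), ("measles", 18.0)]:
    print(f"{name}: immunise at least {herd_immunity_threshold(r0):.0%}")
```

For R0 = 2.3 the formula gives a little under 57%, consistent with the rough figure of 60% above.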

The advantage of simple models is that they illustrate the path towards more complex but more realistic models. These models can capture the nature of the disease better or simply describe the process of infection better. One improvement on the SIR model, which is also applicable to the COVID-19 pandemic, is to have an ‘exposed’ or E compartment between the S and I compartments. In the E compartment, we can put those people who might have the virus and are in a position to infect others but who have not begun to show any symptoms themselves. In medical parlance, these people are said to be asymptomatic.
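Adding the E compartment changes the bookkeeping only slightly. The sketch below follows the description above and lets people in E transmit the infection. In the textbook SEIR model, people in E are not yet infectious, and setting the assumed parameter `exposed_infectivity` to 0 recovers that version. The names `beta`, `sigma` and `gamma` and all the values are illustrative assumptions.

```python
def simulate_seir(population, initial_exposed, beta, sigma, gamma, days,
                  exposed_infectivity=1.0, dt=0.1):
    """Step S, E, I and R forward in small time steps.

    beta  : assumed transmission rate per day
    sigma : assumed rate per day of moving from E to I
    gamma : assumed recovery rate per day
    exposed_infectivity : how infectious E is relative to I (0 = textbook SEIR)
    Returns a list of (time, S, E, I, R) tuples.
    """
    s = float(population - initial_exposed)
    e = float(initial_exposed)
    i = 0.0
    r = 0.0
    history = [(0.0, s, e, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        # Both E and I can transmit here; E is weighted by exposed_infectivity.
        infectious_share = (exposed_infectivity * e + i) / population
        new_exposed = beta * s * infectious_share * dt
        leaving_exposed = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - leaving_exposed
        i += leaving_exposed - new_recovered
        r += new_recovered
        history.append((step * dt, s, e, i, r))
    return history


if __name__ == "__main__":
    run = simulate_seir(1_000_000, 10, beta=0.4, sigma=0.2, gamma=0.1, days=150)
    print(f"Still susceptible after 150 days: {run[-1][1]:,.0f}")
```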

We could have other sub-compartments, one for those who are seriously ill and require hospitalisation and another for those who can easily recover on their own. We could also have sub-divisions within each compartment, depending on age, so that we can account for the fact that elderly people are more likely than younger patients to have a severe form of the disease.

For now, let us focus more on the SIR model and what it assumes. The simplest SIR model assumes that any infected person can infect any susceptible person. But in most cases, this assumption is plainly false. It is far more likely that I am infected by someone who is in close proximity to me, perhaps shares my living or working space or who travels with me on public transport.

We can improve the SIR model by accounting for this network of relationships defined by physical proximity. Often social proximity – the same community, the same family, the same mosque, church or temple – implies physical proximity, so one set of networks can be used to infer another. Such network models can be used to infer properties of the disease that are not otherwise obvious, such as the disproportionate importance of well-connected people (in a network sense) in transmitting disease. For example, a doctor might simply be more likely to encounter more infected individuals in the day than you or me.
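A toy example makes the point about well-connected people concrete. In the sketch below, infection can pass only between people who are linked in a contact network, and the network is a deliberately extreme one: a single ‘hub’ (think of the doctor) linked to twenty people who otherwise meet no one. The network, the daily probabilities and the function names are all assumptions made for illustration.

```python
import random


def spread_on_network(contacts, initially_infected, p_transmit, p_recover,
                      days, seed=0):
    """Day-by-day S/I/R spread in which infection passes only along contacts.

    contacts : dict mapping each person to the set of people they meet daily
    Returns a dict giving each person's final state: 'S', 'I' or 'R'.
    """
    rng = random.Random(seed)
    state = {person: "S" for person in contacts}
    for person in initially_infected:
        state[person] = "I"
    for _ in range(days):
        infected_today = [p for p, st in state.items() if st == "I"]
        newly_infected = set()
        for person in infected_today:
            for neighbour in contacts[person]:
                if state[neighbour] == "S" and rng.random() < p_transmit:
                    newly_infected.add(neighbour)
        for person in infected_today:
            if rng.random() < p_recover:
                state[person] = "R"
        for person in newly_infected:
            state[person] = "I"
    return state


def star_network(n_leaves):
    """One well-connected 'hub' linked to n_leaves otherwise isolated people."""
    contacts = {"hub": set()}
    for k in range(n_leaves):
        leaf = f"leaf{k}"
        contacts[leaf] = {"hub"}
        contacts["hub"].add(leaf)
    return contacts


def average_outbreak(contacts, first_case, runs=200):
    """Average number of people ever infected, over many seeded runs."""
    total = 0
    for run in range(runs):
        final = spread_on_network(contacts, [first_case], 0.1, 0.3,
                                  days=60, seed=run)
        total += sum(1 for st in final.values() if st != "S")
    return total / runs


network = star_network(20)
print("First case is the hub: ", average_outbreak(network, "hub"))
print("First case is a leaf:  ", average_outbreak(network, "leaf0"))
```

The same disease, with the same chance of transmission per contact, produces noticeably larger outbreaks on average when the first case is the hub rather than one of the poorly connected people.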

A final improvement is to recognise that every individual is different and that small differences may be amplified when considering how diseases spread. Two people chosen at random might live in different homes, work in different places and encounter different sets of people in the course of a day. In addition, their immunity might vary, their access to healthcare could be different and the hospitals they might be closest to if they felt ill would most likely not be the same.

The required generalisation of disease models superimposes many more attributes over the simple compartments S, I and R. While my ability to infect others still depends on my being classified as ‘infectious’ or ‘exposed’, whether I eventually infect them depends on whether I live alone, hardly meet people and shop only briefly for essentials once a week. Or, at the other extreme, whether I am a popular DJ, out every night, rocking the show and meeting my fans.

We can then begin to think of the individual as the basic unit of the model. Of the attributes of any individual, only those relevant to their ability to transmit disease or become infected are important. We can think of these model individuals as ‘agents’, to use the conventional term. Agents have home and work locations, travel along designated trajectories over a 12-hour cycle, have ages and families assigned to them, and inhabit a geographical background that idealises a real city, with its housing, its shopping and workplaces.

We can then put our idealised model of ‘agents’ on a computer and simulate the motion and interaction of agents over days and weeks. We can introduce one or more infected agents and track how the infection spreads, how hospital beds might fill up and how requirements for ICUs and medicines change with time. And because computers are powerful, nothing prevents us from simulating, say, a mid-sized city like Chennai or Bengaluru with about 7 million people.
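Stripped to its bare bones, such a simulation might look like the sketch below. It is a toy, far simpler than the agent-based models used in practice: each agent has only a home and a workplace, every day has one work phase and one home phase, and anyone sharing a location with an infected agent may be infected. The class, the function names and every number here are assumptions made for illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Agent:
    home: int
    work: int
    state: str = "S"  # 'S', 'I' or 'R'


def build_city(n_agents, household_size, n_workplaces, rng):
    """Put consecutive agents in the same household; assign workplaces at random."""
    return [Agent(home=k // household_size, work=rng.randrange(n_workplaces))
            for k in range(n_agents)]


def mix_at(agents, place_of, p_transmit, rng):
    """Let infected agents expose susceptible agents who share a location."""
    by_place = defaultdict(list)
    for agent in agents:
        by_place[place_of(agent)].append(agent)
    newly_infected = []
    for group in by_place.values():
        n_infected = sum(1 for a in group if a.state == "I")
        if n_infected == 0:
            continue
        # One minus the chance of escaping every infected agent present.
        p_infection = 1.0 - (1.0 - p_transmit) ** n_infected
        for a in group:
            if a.state == "S" and rng.random() < p_infection:
                newly_infected.append(a)
    for a in newly_infected:
        a.state = "I"


def simulate_city(agents, days, p_transmit, p_recover, rng):
    """Return the number of infected agents at the end of each day."""
    daily_infected = []
    for _ in range(days):
        mix_at(agents, lambda a: a.work, p_transmit, rng)  # daytime, at work
        mix_at(agents, lambda a: a.home, p_transmit, rng)  # night-time, at home
        for a in agents:
            if a.state == "I" and rng.random() < p_recover:
                a.state = "R"
        daily_infected.append(sum(1 for a in agents if a.state == "I"))
    return daily_infected


if __name__ == "__main__":
    rng = random.Random(1)
    city = build_city(n_agents=2000, household_size=4, n_workplaces=100, rng=rng)
    city[0].state = "I"  # introduce a single infected agent
    curve = simulate_city(city, days=120, p_transmit=0.05, p_recover=0.1, rng=rng)
    print("Largest number infected on any one day:", max(curve))
```

Adding ages, travel routes, hospitals and so on means adding fields to `Agent` and phases to the daily loop, which is where the size and cost of realistic simulations come from.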

We’ve come a long way from the SIR model today, but at a price. By thinking of individual people in terms of our simplified ‘agents’, and simulating how they might move and interact, we introduce realism but also add complexity. These computer programs are large, specifying the details of each agent takes time, and running the program itself uses large amounts of computing power. Nevertheless, such models are state-of-the-art as models go and are used around the world.

There are other approaches, too. Some emphasise the fact that not all contacts between susceptible and infected people lead to an infection. They account for the fact that sometimes, just by chance, an infection can die out on its own. Network models can carry their own complexity, since the variety of networks that represent the interactions of people is diverse. Some models are purely statistical models that extract patterns from data on epidemics. Other models use machine learning, where advanced algorithms are trained on real-life examples from the past and then predict what might happen in novel situations.
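The role of chance is easy to demonstrate. In the sketch below, a single infected person arrives in a fully susceptible population. Each case makes a fixed number of contacts and each contact leads to an infection with some probability, so that R0 is the product of the two. The simulation counts how often the chain of infection fizzles out before it becomes established. The model, the cut-off of 200 cases and all parameter values are illustrative assumptions.

```python
import random


def outbreak_dies_out(contacts_per_case, p_transmit, rng, max_cases=200):
    """Follow one introduced case; return True if the chain ends by chance."""
    active = 1
    total = 1
    while active > 0:
        if total >= max_cases:
            return False  # large enough to count as an established epidemic
        new_cases = 0
        for _ in range(active * contacts_per_case):
            if rng.random() < p_transmit:
                new_cases += 1
        active = new_cases
        total += new_cases
    return True


def extinction_probability(contacts_per_case, p_transmit, runs=500, seed=0):
    """Fraction of simulated introductions that die out on their own."""
    rng = random.Random(seed)
    extinct = sum(outbreak_dies_out(contacts_per_case, p_transmit, rng)
                  for _ in range(runs))
    return extinct / runs


# 10 contacts per case at 23% each corresponds to R0 = 2.3.
print("Share of introductions that die out:", extinction_probability(10, 0.23))
```

Even with R0 well above 1, some introductions never take off. The simple SIR model, which tracks only averages, cannot show this.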

Modelling never stops. As we learn more about how a disease spreads, we will also improve our models – or even abandon them if they fail. The responsibility of the modeller is to be transparent about the assumptions that the model uses, to be honest about its limitations and yet to take pride in their ability to distil complex reality into a set of simple assumptions that help us better understand – however imperfectly – the world around us.

Gautam I. Menon is a professor at Ashoka University, Sonepat, and at the Institute of Mathematical Sciences, Chennai. The views expressed here are his own.

1. One of my home cities
