Learning more about R (a.k.a. Reproduction number) and COVID-19

R is a number used as a simplified indication of how quickly COVID-19 (or any other infectious disease) is spreading and how critical it is to take additional measures to control the spread of the disease.

R values per country as of the 17th of December 2020 (source: http://metrics.covid19-analysis.org/)

But what exactly does R represent, what are the values of R around the world and are there any resources we can use to see how it evolves over time? I am going to share what I learned but fair warning: I am not a doctor or an epidemiologist so please also do your own research using sources you trust.

Definition of R

R (Reproduction number) is a single value that represents how contagious a disease is: the average number of people that one infected person goes on to infect. It is calculated using an epidemiology model.

Epidemiology models

An epidemiology model is a mathematical model (a set of equations) that takes into consideration that any population can be split into groups (called compartments) that exhibit a common trait. Some commonly used compartments are:

  • Susceptible (part of the population that could become infected)
  • Infectious (part of the population that has been infected)
  • Recovered (part of the population that has recovered)

Other types of compartments could be the Exposed (part of the population that has come in contact with a carrier but does not yet exhibit symptoms) or the Carriers (part of the population that has contracted a disease and no longer exhibits symptoms, but can still transmit the disease as they are not fully recovered).

One epidemiology model that I came across a lot is called SEIR (Susceptible, Exposed, Infectious and Recovered). It states that an individual can be in exactly one of those four states at any point in time. People move from one compartment to another as time passes, which creates the state transitions shown below.

Source: SEIR model (Wikipedia)
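
To make the compartments a bit more concrete, here is a minimal sketch of a basic SEIR model stepped forward in small time increments. It is only an illustration of the idea, not the method used by any of the sources in this post: the function name and the parameter values (beta, sigma, gamma, the population size) are assumptions I picked for the example, not estimates for COVID-19.

```python
def simulate_seir(population, initial_infectious, beta, sigma, gamma, days, dt=0.1):
    """Step a basic SEIR model forward using small fixed time steps.

    All rates are illustrative assumptions:
      beta  - transmission rate per day
      sigma - rate at which Exposed people become Infectious
      gamma - rate at which Infectious people recover
    """
    s = float(population - initial_infectious)  # Susceptible
    e = 0.0                                     # Exposed
    i = float(initial_infectious)               # Infectious
    r = 0.0                                     # Recovered
    history = [(0.0, s, e, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        # People moving between compartments during this time step.
        new_exposed = beta * s * i / population * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((step * dt, s, e, i, r))
    return history


# Hypothetical numbers, chosen only to produce a visible outbreak curve.
history = simulate_seir(
    population=1_000_000,
    initial_infectious=10,
    beta=0.5,
    sigma=1 / 5,
    gamma=1 / 10,
    days=200,
)
peak_day, _, _, peak_infectious, _ = max(history, key=lambda row: row[3])
print(f"Infectious compartment peaks around day {peak_day:.0f} "
      f"with roughly {peak_infectious:,.0f} people")
```

Every person is always in exactly one compartment, so the four numbers add up to the total population at every step. Real models are considerably more detailed than this sketch.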

Types of R

There are 2 types of R values that you may encounter on the news and in announcements (sometimes it is not clear which one is used):

  • R0 ➡️ the Basic Reproduction number
  • Rt ➡️ the Effective Reproduction number

There are 2 differences between R0 and Rt:

  • R0 considers the entire population to be Susceptible whereas Rt tries to account for the fact that not everyone in the population is Susceptible at any point in time (some people may already be immune).
  • R0 is not linked to a point in time, whereas Rt is calculated at a specific point in time and takes into account changes that have happened in the population, like the number of people considered immune, how much people come into contact with each other and for how long, and any protective measures that may be in use (like wearing a mask).
Source: Thomas V. Inglesby, MD (Inglesby, T.V., 2020)
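
As a rough illustration of the difference, a common textbook simplification (assuming everyone mixes evenly, which is not how real estimates are produced) scales R0 by the fraction of the population that is still Susceptible. The sketch below uses that simplification; the function name and the contact_reduction parameter are hypothetical and only stand in for the effect of measures like distancing or masks.

```python
def effective_r(r0, susceptible_fraction, contact_reduction=0.0):
    """Rough Rt estimate under a simple even-mixing assumption.

    r0                   - basic reproduction number
    susceptible_fraction - share of the population still Susceptible (0 to 1)
    contact_reduction    - hypothetical share of transmission removed by
                           protective measures (0 to 1)
    """
    if not 0.0 <= susceptible_fraction <= 1.0:
        raise ValueError("susceptible_fraction must be between 0 and 1")
    if not 0.0 <= contact_reduction <= 1.0:
        raise ValueError("contact_reduction must be between 0 and 1")
    return r0 * susceptible_fraction * (1.0 - contact_reduction)


# Made-up inputs: same R0, different situations in the population.
print(effective_r(3.0, 1.0))       # everyone Susceptible, no measures: Rt equals R0
print(effective_r(3.0, 0.5))       # half the population immune
print(effective_r(3.0, 0.5, 0.5))  # half immune and transmission halved by measures
```

The point is only that Rt moves as the population and its behaviour change, while R0 stays fixed for a given disease and setting.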

Things to know about R

1 The values used to calculate R are approximations, since it’s impossible for anyone to state with absolute certainty the exact number of people in a country or region that have contracted a disease. To know that for sure we would need to test everyone in the population at the same point in time, and even then the results would only hold true for a limited amount of time.

2 The compartments only make sense when we examine a population that is close together, say everyone in a single country or state. So R is really a localized metric and it is affected by things like how close together people live, how many people someone comes into contact with and for how long. These factors are taken into consideration when the values are calculated for a specific region.

3 As mentioned above, there are multiple ways to calculate R depending on the model one uses, and the model also defines parameters that approximate the factors mentioned in number 2 above. So when trying to understand what R is, it’s useful to stick to the same source, since that gives you a consistent calculation of the value (different methods may produce different numbers and thus create an inconsistent view of the R value).

R around the world

Whilst reading about R I came across this site http://metrics.covid19-analysis.org/ that offers a daily updated version of the Rt values per country around the world. The site and associated data are developed by Xihong Lin’s Group in the Department of Biostatistics at the Harvard Chan School of Public Health.


The image of the world at the top of the post is the latest data at the time of writing this post. If you do visit the site do run the simulation they offer on their home page that shows how the daily values of Rt have been changing per country since March.

Navigating to the Forest Plot tab of the menu you can also get a view of the countries listed in descending order based on their Rt value. The dotted line marks Rt = 1, the threshold at which the number of infections stays roughly stable. Countries below that line have the virus somewhat contained (for now, since the numbers can and have changed over time).

Source: http://metrics.covid19-analysis.org/
* Truncated list of results – for the full list visit the website and select the Forest Plot

The meaning of the values of R

R has positive values that stay (hopefully) in the single digits, like 0.75 or 1 or 5. For measles, for example, R0 is estimated to be between 12 and 18, meaning that one infected person would on average infect 12-18 other people in a totally susceptible population. So how do the values of R translate to the longevity of a disease?

R < 1 ➡️ Eventually extinct: A value of <1 means that if nothing changes the disease will eventually become insignificant for the population.

R = 1 ➡️ Endemic: A value =1 means that the number of infections will remain stable and continue at a steady rate among a population.

R > 1 ➡️ Epidemic: A value >1 means that there is an outbreak that may spread to a larger geographic area and could also spread worldwide, thus making it a Pandemic.
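
To see why the value 1 is the tipping point, here is a small sketch that follows a group of infections through a few "generations" of transmission, assuming (unrealistically) that R stays constant and that each infected person infects exactly R others on average. The function name and the starting number of 100 cases are made up for the example.

```python
def cases_per_generation(r, initial_cases, generations):
    """New infections in each generation if R stays constant.

    Each generation is the previous one multiplied by R, so the
    n-th generation holds initial_cases * r ** n new infections.
    """
    return [initial_cases * r ** n for n in range(generations + 1)]


# Illustrative values only: one R below 1, one equal to 1, one above 1.
for r in (0.75, 1.0, 1.5):
    series = cases_per_generation(r, initial_cases=100, generations=5)
    print(f"R = {r}: {[round(cases, 1) for cases in series]}")
```

With R below 1 the numbers shrink in every generation, with R equal to 1 they stay flat, and with R above 1 they grow faster and faster.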

Final thoughts

My key takeaway after this is that when the R number is referenced in the news with regards to COVID-19, it usually refers to the Effective Reproduction number Rt, which is the one that evolves over time as we gain new data and insights.

The value is an estimate calculated using an epidemiological model that makes some assumptions about the numbers (i.e. we can’t really know the number of actual cases per day unless we test everyone in a given population).

The value can be either leading (running ahead of the actual numbers) or lagging (running behind the actual data) since in some cases we learn that someone is infected days after they contracted the disease. This means that we need to adjust the data that we hold and re-run our calculations.

All in all, R may look like a simple number, but it holds many assumptions and approximations (even the choice of epidemiological model can affect the resulting value). So comparing R values calculated by different sources may lead to misunderstandings if the models used are different, especially when trying to compare, say, different regions or countries.

