Modelling the models

April 3, 2020

Tom Jefferson, Carl Heneghan

Our readers may be puzzled by the profusion of predictions on the unfolding of the pandemic. So are we.

Those that have a mathematical basis are usually referred to as “models”. According to Wikipedia, “a model may help to explain a system and to study the effects of different components, and to make predictions about behaviour”. Applied to biomedicine, and specifically to infectious diseases, models may help to understand the interactions between variables such as the characteristics of the agent, the target population, the evolution of the spread and possible future scenarios.

All models, whether prospective or retrospective, carry substantial uncertainty about their starting point if they are based on scientific principles, and are incompatible with oracle-like statements of certainty.

We reviewed 43 models indexed on PubMed, and 13 released by Imperial College (see Google sheet here). The impact of travel restrictions was often reported as negligible, yet the same restrictions had significant effects on the reproduction number, which itself varies substantially depending on the model chosen. Estimates of asymptomatic infection and of underreporting (or misclassification of cases) also vary, as do the doubling period, case fatality rates, and estimates of deaths.

Examples of Variation in reported models

Travel restrictions
  • The travel quarantine of Wuhan delayed the overall epidemic progression by only 3 to 5 days in Mainland China. [2]
  • Daily reproduction numbers in Wuhan declined from 2.35 (95% CI 1.15-4.77) 1 week before travel restrictions were introduced on Jan 23, 2020, to 1.05 (0.41-2.39) 1 week after. [1]
Reproduction number
  • Our estimate of the mean reproduction number in the confined setting reached values as high as 11.  [4]
  • The Maximum-Likelihood (ML) value of R0 was 2.28 for COVID-19 outbreak at the early stage on the ship.   [5]
  • The mean estimate of R0 for the 2019-nCoV ranges from 2.24 to 3.58. [6]
  • The basic reproduction number was estimated to be 2.1 (95% CI: 2.0, 2.2) and 3.2 (95% CI: 2.7, 3.7) for Scenarios 1 and 2, respectively. [13]
Choice of model
  • Among the four methods, the EG method fitted the data best. The estimated R(0) was 3.49 (95% CI: 3.42-3.58) by using the EG method.  [3]
Asymptomatics and undocumented cases
  • The estimated asymptomatic proportion was 17.9% (95% credible interval (CrI): 15.5-20.2%) [4,7]
  • We estimate 86% of all infections were undocumented. [8]
Control options
  • In most scenarios, highly effective contact tracing and case isolation are enough to control a new outbreak of COVID-19 within 3 months. [9]
Doubling period
  • In addition, by fitting the number of infections with a single-term exponential model, we report that the infection is spreading at an exponential rate, with a doubling period of 1.8 days.  [10]
  • Infection was assumed to be seeded in each country at an exponentially growing rate (with a doubling time of 5 days) from early January 2020. [11]
Case fatality rates
  • For cases detected in Hubei, we estimate the CFR to be 18% (95% credible interval: 11-81%). For cases detected in travellers outside mainland China, we obtain central estimates of the CFR in the range 1.2-5.6% depending on the statistical methods, with substantial uncertainty around these central values. [12]
  • The latest estimated values of the cCFR were 5.3% (95% CI: 3.5%, 7.5%) for Scenario 1 and 8.4% (95% CI: 5.3%, 12.3%) for Scenario 2.  [13]
Estimates of deaths
  • 16th March. Even if all patients were able to be treated, we predict there would still be in the order of 250,000 deaths in GB, and 1.1-1.2 million in the US. [11]
  • The typical delay from infection to hospitalisation means there is a 2- to 3-week lag between interventions being introduced and the impact being seen in hospitalised case numbers. [11]
  • 28th March, five days after a lockdown in the UK began:  deaths resulting from the outbreak would be “aimed” at 20,000. [14]


Modern computing methods have made it easy to recalibrate and adjust previous models with tiny bits of data. Epidemics are nonlinear and chaotic, and models are only as good as the data they are based on, the limitations of which need to be clearly described. Yet the limitations of the models are often not mentioned at all, or are discussed only briefly. As a consequence, all models should come with a warning.
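To give a feel for how much a single input can matter, here is a minimal sketch in Python that takes the two doubling periods quoted in the examples above (1.8 days [10] and 5 days [11]) and projects unchecked exponential growth from the same starting point. The starting count of 100 cases, the 30-day horizon and the function names are our own illustrative assumptions, not taken from any of the cited models, and pure exponential growth is a deliberate oversimplification.

```python
import math


def growth_rate(doubling_time_days: float) -> float:
    """Exponential growth rate r (per day) implied by a doubling time: r = ln(2) / Td."""
    return math.log(2) / doubling_time_days


def projected_cases(initial_cases: float, doubling_time_days: float, days: float) -> float:
    """Cases after `days` of unchecked exponential growth: N0 * 2 ** (t / Td)."""
    return initial_cases * 2 ** (days / doubling_time_days)


if __name__ == "__main__":
    # Doubling times quoted in the examples: 1.8 days [10] and 5 days [11].
    # The starting count (100) and the horizon (30 days) are arbitrary assumptions.
    for td in (1.8, 5.0):
        cases = projected_cases(100, td, 30)
        print(f"doubling time {td} d: r = {growth_rate(td):.3f} per day, "
              f"cases after 30 d = {cases:,.0f}")
```

Under these assumptions, a 5-day doubling time gives 6,400 cases after 30 days, while a 1.8-day doubling time gives on the order of ten million. Neither figure is a forecast; the point is only that a difference of a few days in one assumed parameter changes the output by more than three orders of magnitude.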

Our readers know that in the maelstrom of information on COVID-19 there are few certainties, and science, if given a chance, could thrive on this. Soothsayers flourish in times of crisis; science is hard.

Tom Jefferson is an Epidemiologist.

Disclosure statement is here

Carl Heneghan is Professor of Evidence-Based Medicine, Director of the Centre for Evidence-Based Medicine and Director of Studies for the Evidence-Based Health Care Programme. (Full bio and disclosure statement here)


Disclaimer:  the article has not been peer-reviewed and the sources cited should be checked. The views expressed in this commentary represent the views of the authors and not necessarily those of the host institution, the NHS, the NIHR, or the Department of Health. The views are not a substitute for professional medical advice.

  10. Cheng ZJ, Shan J. 2019 Novel coronavirus: where we are and what we know. Infection 2020;48:155–63.
  11. (accessed 2 April 2020)
  12. (accessed 2 April 2020)
  14. (accessed 2 April 2020)