COVID-19: The Chain Ladder Method to estimate deaths not yet reported
May 6, 2020
Stavros Christofides, Jason Oke, Carl Heneghan
Every day, NHS England releases the number of deaths reported in hospitals. Most of these deaths occurred in the previous week, but some occurred more than a week earlier; in some cases a month can elapse before a death is reported.
This means that the published counts always underestimate the number of people who have died up to the date of reporting. The difference between the eventual final count for a given date of death and the current figure can be defined as the deaths that have Occurred but not yet Reported (OBNR). This could be a useful measure for monitoring the ongoing pandemic, as well as giving a better indication of recent trends.
The OBNR can be estimated using a technique called the Chain Ladder, which actuaries use to estimate incurred but not reported (IBNR) claims from the pattern of past claims.
The principle behind the Chain Ladder method is to use the way in which the counts have accumulated in the past to predict how they will be reported in the future and to fill in the missing (future) values.
The Chain Ladder method:
Arrange the cumulative counts of deaths in a spreadsheet, with one row per date of death and one column per reporting delay (in days). Table 1 shows the top right-hand corner of such a spreadsheet. Each cell holds the total number of deaths that occurred on a given date and had been reported within the given number of days (1 to 30).
For example, the number of deaths that occurred on 1 April and had been reported by 26 April was 594 (the top left-hand cell of Table 1). By 27 April this number had increased by 2, by 28 April by a further 2, and so on. By 30 April, the total number of deaths reported as having occurred on 1 April was 602.
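The arrangement described above can be sketched in Python. This is a minimal illustration, not the authors' spreadsheet: it assumes the raw data arrive as batches of newly announced deaths tagged with a date of death and a date of announcement, and the `announcements` list, the `build_triangle` helper and all counts in it are hypothetical. Following the 1 April example, delay d is taken to mean "reported by the date of death plus d - 1 days". The rows form a staircase: the later the date of death, the fewer delays have been observed so far, and that empty lower-right region is what the Chain Ladder fills in.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical input: (date_of_death, date_reported, newly_reported_deaths).
# The counts are invented purely to illustrate the layout.
announcements = [
    (date(2020, 4, 1), date(2020, 4, 1), 5),
    (date(2020, 4, 1), date(2020, 4, 2), 3),
    (date(2020, 4, 1), date(2020, 4, 4), 1),
    (date(2020, 4, 2), date(2020, 4, 2), 4),
    (date(2020, 4, 2), date(2020, 4, 3), 2),
    (date(2020, 4, 3), date(2020, 4, 3), 6),
]


def build_triangle(records, max_delay, as_of):
    """Return {date_of_death: [cumulative count at delay 1..max_delay]}.

    Delay d means "reported by date_of_death + (d - 1) days", so delay 1 is
    the day of death itself. Cells that fall after `as_of` have not happened
    yet and are returned as None.
    """
    increments = defaultdict(lambda: [0] * max_delay)
    for died, reported, n in records:
        k = (reported - died).days  # 0-based delay index
        if 0 <= k < max_delay:      # later reports belong to the tail, ignored here
            increments[died][k] += n
    triangle = {}
    for died in sorted(increments):
        running, row = 0, []
        for k in range(max_delay):
            if died + timedelta(days=k) > as_of:
                row.append(None)  # not observable yet
            else:
                running += increments[died][k]
                row.append(running)
        triangle[died] = row
    return triangle


triangle = build_triangle(announcements, max_delay=4, as_of=date(2020, 4, 4))
for died, row in triangle.items():
    print(died, row)
# 2020-04-01 [5, 8, 8, 9]
# 2020-04-02 [4, 6, 6, None]
# 2020-04-03 [6, 6, None, None]
```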
Table 1: Top right-hand corner of the Chain Ladder spreadsheet.
|Delay in reporting (days)|
|Date of death|26|27|28|29|30|
We can see that the number of deaths for 1 April after 30 days (602) is higher than the cumulative total after 29 days (601) by a factor of 602/601 = 1.0017, an increase of 0.17%. We could use this factor to “inflate” the 29-day count for 2 April into an estimate of its 30-day count.
However, the count after 30 days (602) is probably also an underestimate, because in rare cases the delay in reporting can exceed a month, so we should expect this number to rise eventually as well. For simplicity, and based roughly on the observed numbers of such cases, we have assumed this inflation factor to be 0.8% for all delays beyond 30 days. It follows that the cumulative inflation factor at day 29 is 1.0017 × 1.008 = 1.0097, or about 1.0%: the factors are multiplied together, not the percentages. This is the chain in the Chain Ladder method.
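In symbols, the chain can be written as follows (the notation is ours, not the article's). Write C_{i,d} for the cumulative number of deaths on date i reported within d days, D = 30 for the last delay column, t = 1.008 for the assumed tail factor, and d_i for the latest delay observed so far for date i. As the column sums in Table 2 show, each factor is a ratio of sums over the dates observed at both delays; the percentages quoted in the text are f_d - 1 and F_d - 1.

```latex
\begin{aligned}
f_d &= \frac{\sum_{i:\,C_{i,d+1}\ \text{observed}} C_{i,d+1}}
            {\sum_{i:\,C_{i,d+1}\ \text{observed}} C_{i,d}}, \\[4pt]
F_D &= t, \qquad F_d = f_d \, F_{d+1} \quad (d < D), \\[4pt]
\hat{U}_i &= C_{i,d_i} \, F_{d_i}, \qquad
\mathrm{OBNR}_i = \hat{U}_i - C_{i,d_i}.
\end{aligned}
```

For example, F_{29} = (602/601) × 1.008 ≈ 1.0097, the roughly 1.0% cumulative factor derived above.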
Starting from the right-hand side of the spreadsheet, we build the cumulative inflation factor as a chain, column by column, all the way to the left-hand side. Table 2 shows how these calculations look for the first five days of April.
Table 2: Worked example of the Chain Ladder method.
|Delay in reporting (days)|26|27|28|29|30|
|Sum at this delay (all dates except the most recent)|2615|1856|598 + 602 = 1200|601|602|
|Sum at the next delay (same dates)|2622|1860|601 + 603 = 1204|602| |
|Inflation factor|0.3%|0.2%|0.3%|602/601 = 1.0017, i.e. 0.2%|0.8% (assumed)|
|Cumulative inflation factor|1.8%|1.5%|1.3%|1.0017 × 1.008 = 1.0097, i.e. 1.0%|0.8%|
|Ultimate estimate|740|778|669|603 × 1.0097 = 609|602 × 1.008 = 607|
|Number not yet reported|13|12|9|6|607 - 602 = 5|
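The arithmetic of Table 2 can be reproduced from the column sums it lists. In this sketch the latest counts for 2, 3 and 4 April (603, 660 and 766) are recovered from those sums (for example 1860 - 598 - 602 = 660 and 2622 - 1856 = 766), and the 5 April count (727) from the ultimate estimate minus the number not yet reported (740 - 13); the 0.8% tail is the article's assumption. The variable names are ours.

```python
# Column sums from Table 2, keyed by reporting delay in days.
sum_at_delay = {26: 2615, 27: 1856, 28: 1200, 29: 601}
sum_at_next_delay = {26: 2622, 27: 1860, 28: 1204, 29: 602}
TAIL = 1.008  # assumed inflation for all reporting beyond 30 days

# Latest (delay, cumulative count) for each date of death on the diagonal.
# 2-4 April are recovered from the column sums; 5 April from 740 - 13.
latest = {
    "1 April": (30, 602),
    "2 April": (29, 603),
    "3 April": (28, 660),
    "4 April": (27, 766),
    "5 April": (26, 727),
}

# Step 1: age-to-age factors, count at the next delay / count at this delay.
factor = {d: sum_at_next_delay[d] / sum_at_delay[d] for d in sum_at_delay}

# Step 2: chain the factors from right to left, starting with the tail.
cumulative = {30: TAIL}
for d in range(29, 25, -1):
    cumulative[d] = factor[d] * cumulative[d + 1]

# Step 3: inflate each latest count to an ultimate and take the difference.
results = {}
for day, (d, count) in latest.items():
    ultimate = count * cumulative[d]
    results[day] = (cumulative[d], ultimate, ultimate - count)
    print(f"{day}: cumulative factor {cumulative[d] - 1:.1%}, "
          f"ultimate {ultimate:.0f}, not yet reported {ultimate - count:.0f}")
```

Run as written, it prints ultimate estimates of 607, 609, 669, 778 and 740 for 1 to 5 April, with 5, 6, 9, 12 and 13 deaths not yet reported, in line with Table 2.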
Occurred but not reported (OBNR)
We have used the Chain Ladder method to project COVID-19 deaths in English hospitals for the whole of April 2020 (see Figure 1).
Our latest estimate of deaths still in the reporting pipeline is below 1,500, with a standard error of approximately 120. Based on a stochastic version of the Chain Ladder using bootstrap resampling (England & Verrall 1999), the upper bound of the 90% confidence interval for the OBNR is approximately 1,650 deaths. As the results indicate, with ultimate daily hospital deaths falling consistently, we expect the OBNR figure to decrease further, and we intend to update this analysis once a week.
As mentioned above, we used a non-parametric bootstrap (England & Verrall 1999) to also estimate the reliability of the figures for the most recent days. Table 3 shows, for the last five days of April, the mean OBNR over all bootstrap samples, its standard error, and the upper bound of the 90% confidence interval estimated from the percentiles of the bootstrap samples.
As we would expect, the most recent estimates carry the greatest uncertainty.
Table 3: Bootstrap estimates of the OBNR for the last five days of April.
|Date of death|Mean OBNR|Standard error|Ultimate (final) number of deaths|Upper bound of 90% confidence interval for final estimate|
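The bootstrap mentioned above can be sketched as follows. This is a simplified over-dispersed Poisson residual bootstrap in the spirit of England & Verrall (1999) and later refinements, not the authors' code: it fits the chain ladder, computes Pearson residuals of the incremental counts, resamples them to build pseudo-triangles, refits each one and adds gamma-distributed process error to the projected future counts. The triangle, the function names `fit_factors` and `bootstrap_obnr`, the seed and the number of resamples are all hypothetical. By default the sketch ignores the tail factor, it skips any non-positive projected increment, and it reports the total OBNR rather than per-day figures.

```python
import numpy as np


def fit_factors(cum):
    """Volume-weighted development factors; NaN marks cells not yet observed."""
    f = np.ones(cum.shape[1] - 1)
    for j in range(f.size):
        rows = ~np.isnan(cum[:, j + 1])  # dates observed at both delay j and j + 1
        f[j] = cum[rows, j + 1].sum() / cum[rows, j].sum()
    return f


def bootstrap_obnr(cum, n_boot=2000, tail=1.0, seed=1):
    """Simplified over-dispersed Poisson residual bootstrap of the total OBNR."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = cum.shape
    observed = ~np.isnan(cum)
    last = observed.sum(axis=1) - 1  # index of the latest observed delay per date
    f = fit_factors(cum)

    # Fitted cumulative counts: divide back from each date's latest observed count.
    fitted = np.full(cum.shape, np.nan)
    for i in range(n_rows):
        fitted[i, last[i]] = cum[i, last[i]]
        for j in range(last[i] - 1, -1, -1):
            fitted[i, j] = fitted[i, j + 1] / f[j]

    # Observed and fitted incremental counts, then unscaled Pearson residuals.
    inc = np.diff(np.nan_to_num(cum), axis=1, prepend=0.0)[observed]
    m = np.diff(np.nan_to_num(fitted), axis=1, prepend=0.0)[observed]
    resid = (inc - m) / np.sqrt(m)
    n_obs, n_par = resid.size, n_rows + n_cols - 1
    phi = (resid ** 2).sum() / (n_obs - n_par)  # over-dispersion (scale) parameter
    resid *= np.sqrt(n_obs / (n_obs - n_par))   # degrees-of-freedom adjustment

    totals = np.empty(n_boot)
    for b in range(n_boot):
        # Pseudo-triangle: fitted increments plus resampled residuals.
        pseudo = np.full(cum.shape, np.nan)
        pseudo[observed] = m + rng.choice(resid, size=m.size) * np.sqrt(m)
        pseudo = np.cumsum(np.nan_to_num(pseudo), axis=1)
        pseudo[~observed] = np.nan
        fb = fit_factors(pseudo)

        # Project each date forward and draw each future increment from a gamma
        # distribution with mean = projected increment, variance = phi * mean.
        total = 0.0
        for i in range(n_rows):
            level = pseudo[i, last[i]]
            for factor in [*fb[last[i]:], tail]:
                mean_inc = level * (factor - 1.0)
                if mean_inc > 0:  # gamma needs a positive mean; others are skipped
                    total += rng.gamma(mean_inc / phi, phi)
                level *= factor
        totals[b] = total
    return totals


nan = np.nan
# Hypothetical cumulative triangle: rows are dates of death, columns are delays.
triangle = np.array([
    [300, 430, 452, 465, 468, 469],
    [310, 440, 478, 487, 492, nan],
    [290, 396, 421, 432, nan, nan],
    [255, 372, 394, nan, nan, nan],
    [248, 345, nan, nan, nan, nan],
    [215, nan, nan, nan, nan, nan],
])

totals = bootstrap_obnr(triangle)
print(f"mean OBNR {totals.mean():.0f}, standard error {totals.std(ddof=1):.0f}, "
      f"upper bound of 90% interval {np.percentile(totals, 95):.0f}")
```

The mean, standard deviation and 95th percentile of `totals` play the roles of the mean OBNR, its standard error and the upper bound of a two-sided 90% interval; accumulating `total` separately for each date of death would give per-day figures of the kind shown in Table 3.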
The underlying assumption is that the reporting pattern is stable over time and the existing history can be used to estimate this pattern and “populate” the table. There is no allowance for any changes in the reporting pattern in these calculations.
We have also not made any allowance for day-of-week effects, such as the fact that registrars' offices are closed on Saturdays and Sundays, so deaths occurring on these days are subject to reporting delays. In addition, we have not accounted for the changes to the rules on completing death certificates introduced by the Coronavirus Act 2020 (see here).
For the uncertainty estimates for specific days, we have assumed that the reporting pattern has been consistent over the last ten or so days; any change in it could have a significant impact on these projections. COVID-19 deaths in care homes and in the community may well follow different reporting patterns, and the same applies to deaths in Wales, Scotland and Northern Ireland.
The estimates for the most recent days now show a consistent pattern, and the number of late-reported deaths is falling as overall daily deaths have fallen, at least in these hospitals. Using this method, deaths rose by just under 38 per day up to the peak on 8 April and have since fallen fairly steadily, by just over 23 per day. Our findings suggest that deaths occurring in April will continue to be reported in May, adding to the number of deaths already recorded from the pandemic.
England, P.D. & Verrall, R.J. (1999). Analytic and bootstrap estimates of prediction errors in claims reserving. Insurance: Mathematics and Economics, 25, 281-293.
With thanks to Richard Kelsey for commenting on the manuscript.
Stavros Christofides, B.Sc., M.Phil., worked in insurance as a manager and as a consultant and has published papers on Loss Reserving, Insurance Pricing and Dynamic Financial Analysis. He was also an Honorary Visiting Fellow at Cass Business School, City University.
Jason Oke is a Senior Statistician at the Nuffield Department of Primary Care Health Sciences and Module Coordinator for Statistical Computing with R and Stata (EBHC Med Stats), and Introduction to Statistics for Health Care Research (EBHC), as part of the Evidence-Based Health Care Programme.
Carl Heneghan is Professor of Evidence-Based Medicine, Director of the Centre for Evidence-Based Medicine and Director of Studies for the Evidence-Based Health Care Programme. (Full bio and disclosure statement here)
Disclaimer: the article has not been peer-reviewed; it should not replace individual clinical judgement, and the sources cited should be checked. The views expressed in this commentary represent the views of the authors and not necessarily those of the host institution, the NHS, the NIHR, or the Department of Health and Social Care. The views are not a substitute for professional medical advice.