Tip for data extraction for meta-analysis – 15
September 5, 2019
Estimating a hazard ratio from a Kaplan Meier curve and information about follow-up
In my previous post I highlighted the paper by Tierney et al which describes how to estimate hazard ratios (HRs) from Kaplan Meier (K-M) curves and other time-to-event data. I also showed an example of the use of their spreadsheet calculator with the FLOT4 trial data. In this post I’m going to look at the underlying equations for the case of K-M curves reported with information about follow-up and work through the equations using the FLOT4 trial data.
I’d like to thank David Fisher (MRC Clinical Trials Unit, UCL) for his help in deriving the equations.
Table 1. Data for the FLOT4 trial
|Time at start of interval (months)||Survival (event-free) %||Reported numbers at risk|
Table 1 shows my extracted data with the reported numbers at risk of mortality for the FLOT4 trial. This was a trial of two different peri-operative chemotherapy regimes in patients with gastric or gastro-oesophageal cancer. The treatment groups are abbreviated FLOT (for the research, intervention group) and ECF/ECX (for the comparator group).
The spreadsheet estimates for each time interval and each treatment arm (Figure):
- Numbers of patients at risk (without events) at the start of the current interval
- Numbers censored during the current interval
- Numbers at risk during the current interval, adjusted for censoring
- Numbers of (patients with) events during the current interval
- O-E, V and the HR for the current interval.
These steps are repeated across all intervals and finally these statistics are combined to calculate:
- O-E, V and the HR for the whole survival curve.
Note that for the intervals up to the minimum follow-up time, no patients are censored.
Figure. Spreadsheet calculations.
Calculations are made from interval to interval, along all the time intervals, which include both those reported and those chosen by the data extractor. This differs from the case of K-M curves with numbers at risk (see my next post), where the numbers at risk ‘anchor’ the estimates at particular times.
In my trial example I estimated the follow-up range of 15 to 80 months. We’re dealing with months as blocks of time, so a minimum follow-up of 15 months means that all patients had complete follow-up and no patients were censored up to the end of month 15, which is the end of the time interval 14-16 months. Censoring will occur from the beginning of month 16 onwards, starting in the interval 16-18. This is why I said previously that intervals should be chosen so that the assumed minimum follow-up period falls at the end of an interval.
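The interval-to-interval logic can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the spreadsheet itself: the function name `arm_estimates`, its arguments and the dictionary keys are hypothetical, the censoring rule is the constant-rate estimate worked through in STEP 2 of this post, and the three survival readings are the research-arm values quoted in this post.

```python
def arm_estimates(n_randomised, times, survival, first_censored_start, max_follow_up):
    """Estimate per-interval numbers for one treatment arm.

    times                -- interval boundaries in months, e.g. [14, 16, 18]
    survival             -- proportion event-free read off the curve at each boundary
    first_censored_start -- start of the first interval in which censoring is assumed
    max_follow_up        -- assumed maximum follow-up in months
    """
    rows = []
    # Valid only if no patients were censored before times[0].
    at_risk = n_randomised * survival[0]
    for i in range(len(times) - 1):
        t_start, t_end = times[i], times[i + 1]
        s_start, s_end = survival[i], survival[i + 1]
        if t_start < first_censored_start:
            censored = 0.0  # complete follow-up: nobody censored yet
        else:
            # Constant-rate censoring estimate for this interval.
            censored = at_risk * 0.5 * (t_end - t_start) / (max_follow_up - t_start)
        effective = at_risk - censored
        events = effective * (s_start - s_end) / s_start
        rows.append({
            "start": t_start,
            "end": t_end,
            "at_risk": at_risk,
            "censored": censored,
            "effective_at_risk": effective,
            "events": events,
        })
        # Carry the risk set forward to the next interval.
        at_risk = at_risk - events - censored
    return rows


rows = arm_estimates(356, [14, 16, 18], [0.80, 0.78, 0.76],
                     first_censored_start=16, max_follow_up=80)
for row in rows:
    print({key: round(value, 2) for key, value in row.items()})
```

Because the sketch carries unrounded values from one interval to the next, its results can differ from the rounded figures in this post in the last decimal place.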
I will look at 16-18 months, so this will be the current interval and 14-16 months will be the prior interval.
The equations for the prior interval are simpler than those for the current interval, where censoring applies.
Equations for the prior interval (14-16 months)
Numbers at risk at the start of the prior interval are
Number randomised × Survival % at start of prior
356 x 0.80 = 284.8 in the research group
360 x 0.75 = 270.0 in the control group.
Numbers censored during the prior interval are assumed to be zero in both groups.
Numbers of events in the prior interval are
Number randomised × (Survival % at start of prior – Survival % at end of prior)
356 x (0.80 – 0.78) = 7.12 in the research group
360 x (0.75 – 0.73) = 7.20 in the control group
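As a quick check, the prior-interval arithmetic can be reproduced in Python. This is a minimal sketch; the function names are my own and the inputs are the figures quoted above.

```python
def at_risk_at_start(n_randomised, survival_at_start):
    """Number at risk at the start of an interval with no earlier censoring."""
    return n_randomised * survival_at_start


def events_without_censoring(n_randomised, survival_at_start, survival_at_end):
    """Number of events in an interval in which nobody is censored."""
    return n_randomised * (survival_at_start - survival_at_end)


research_at_risk = at_risk_at_start(356, 0.80)
control_at_risk = at_risk_at_start(360, 0.75)
research_events = events_without_censoring(356, 0.80, 0.78)
control_events = events_without_censoring(360, 0.75, 0.73)

print(round(research_at_risk, 2), round(control_at_risk, 2))
print(round(research_events, 2), round(control_events, 2))
```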
Equations for the current interval (16-18 months)
STEP 1: Numbers at risk at the start of the current interval
These are the numbers at risk at the end of the prior interval.
At risk at start of current = At risk at start of prior – Events in prior – Censored in prior
284.8 – 7.12 – 0 = 277.68 in the research group
270.0 – 7.20 – 0 = 262.80 in the control group
STEP 2: Numbers of patients censored during the current interval
Assuming non-informative censoring (patients drop out at random, for reasons unrelated to the study) and that censoring occurs at a constant rate within a given time interval, a simple estimate based on similar triangles, described in the appendix of Parmar et al (which also shows the maths!), gives:
Censored in current = At risk at start of current × 0.5 × (End of current – Start of current)/(Maximum follow-up – Start of current)
277.68 x 0.5 x (18-16)/(80-16) = 4.34 in the research group
262.80 x 0.5 x (18-16)/(80-16) = 4.11 in the control group
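The censoring estimate can be written as a small function. This is a sketch under the assumptions stated above (constant-rate, non-informative censoring); the function and argument names are hypothetical, and it should only be applied to intervals after the minimum follow-up time.

```python
def censored_in_interval(at_risk_start, t_start, t_end, max_follow_up):
    """Estimate the number censored between t_start and t_end (months).

    Mirrors the worked figures: at risk x 0.5 x interval width divided by
    the time remaining from the interval start to the maximum follow-up.
    """
    if t_start >= max_follow_up:
        raise ValueError("interval must start before the maximum follow-up")
    return at_risk_start * 0.5 * (t_end - t_start) / (max_follow_up - t_start)


research_censored = censored_in_interval(277.68, 16, 18, 80)
control_censored = censored_in_interval(262.80, 16, 18, 80)
print(round(research_censored, 2), round(control_censored, 2))
```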
STEP 3: Numbers of patients at risk during the current interval, adjusted for censoring
The estimated numbers of censored patients are removed from the numbers at risk at the start of the interval to calculate the “effective” numbers of patients at risk:
277.68 – 4.34 = 273.33 in the research group
262.80 – 4.11 = 258.69 in the control group
STEP 4: Numbers of patients with events during the current interval
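A bit of maths (see below if you’re interested) shows that
Events in current = Effective at risk in current × (Survival % at start of current – Survival % at end of current)/Survival % at start of current (equation 2)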
273.33 x (0.78 – 0.76)/0.78 = 7.01 in the research group
258.69 x (0.73 – 0.69)/0.73 = 14.17 in the control group
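The event calculation can be checked in the same way. A minimal sketch, with a function name of my own choosing, that takes the effective numbers at risk quoted above as inputs:

```python
def events_in_interval(effective_at_risk, survival_at_start, survival_at_end):
    """Estimated events: effective number at risk times the proportional
    drop in survival across the interval."""
    drop = (survival_at_start - survival_at_end) / survival_at_start
    return effective_at_risk * drop


research_events = events_in_interval(273.33, 0.78, 0.76)
control_events = events_in_interval(258.69, 0.73, 0.69)
print(research_events, control_events)
```

The unrounded results agree with the figures above to two decimal places.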
STEP 5: O-E, V and the HR for the current interval
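The HR is calculated as a relative risk as both time to event and censoring have been accounted for.
HR = (Events in research/Effective at risk in research) divided by (Events in control/Effective at risk in control)
7.01/273.33 divided by 14.17/258.69 = 0.468
A bit of maths (see below if you’re interested) shows that
V = 1/(1/Events in research – 1/Effective at risk in research + 1/Events in control – 1/Effective at risk in control) (equation 3)
1/(1/7.01 – 1/273.33 + 1/14.17 – 1/258.69) = 4.86
A direct method to calculate the HR is
HR = exp((O-E)/V)
Taking natural logs of both sides and rearranging gives
O-E = ln(HR) × V
ln(0.468) x 4.86 = -3.69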
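The three interval statistics can be computed together. This is a sketch, not the spreadsheet's code: the function name is hypothetical, and V is taken here as the reciprocal of the squared standard error of the log relative risk, which is my reading of the derivation at the end of this post and reproduces the value of 4.86 used above.

```python
import math


def interval_statistics(events_r, at_risk_r, events_c, at_risk_c):
    """Return (HR, V, O-E) for one interval.

    events_*  -- estimated events in the research (r) and control (c) groups
    at_risk_* -- effective numbers at risk, adjusted for censoring
    """
    hr = (events_r / at_risk_r) / (events_c / at_risk_c)
    # Variance of ln(HR), using the log relative risk formula; V is its reciprocal.
    var_log_hr = 1 / events_r - 1 / at_risk_r + 1 / events_c - 1 / at_risk_c
    v = 1 / var_log_hr
    o_minus_e = math.log(hr) * v
    return hr, v, o_minus_e


hr, v, o_minus_e = interval_statistics(7.01, 273.33, 14.17, 258.69)
print(round(hr, 3), round(v, 2), round(o_minus_e, 2))
```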
STEP 6: O-E, V and the HR for the whole survival curve.
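Accounting for all intervals, the HR for the whole curve is calculated from the sums of O-E and V across the intervals:
HR = exp(Sum of O-E/Sum of V)
With O-E = -23.47 and V = 94.02, the HR is exp(-23.47/94.02) = 0.78 and its 95% CI is 0.64 to 0.95.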
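The overall estimate and its confidence interval follow from the summed O-E and V. A minimal sketch, assuming the usual result that the variance of ln(HR) is 1/V; the function name and the `z` argument are my own.

```python
import math


def pooled_hazard_ratio(total_o_minus_e, total_v, z=1.96):
    """Return (HR, lower, upper) from O-E and V summed over all intervals."""
    log_hr = total_o_minus_e / total_v
    se_log_hr = 1 / math.sqrt(total_v)  # SE of ln(HR) is 1/sqrt(V)
    lower = math.exp(log_hr - z * se_log_hr)
    upper = math.exp(log_hr + z * se_log_hr)
    return math.exp(log_hr), lower, upper


hr, lower, upper = pooled_hazard_ratio(-23.47, 94.02)
print(round(hr, 2), round(lower, 2), round(upper, 2))
```

With the totals quoted above, the interval reproduces the reported 95% CI of 0.64 to 0.95.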
Here’s a tip…
You can calculate a hazard ratio from a survival curve reported with follow-up information but make sure, when extracting curve data, that the minimum follow-up time lies at the end of an interval.
In my next blog post, I’m going to look at the equations underlying the spreadsheet calculations for estimating a HR from a Kaplan Meier curve reported with numbers of patients at risk.
Where did the equations come from?
(You can skip this if you are only interested in carrying out the calculations)
To derive equation 2:
I’ll use shorter names of variables so the equations fit on a single line and only consider the equations for a single arm of a trial to simplify the notation. The same equations will apply to both treatment arms.
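The standard K-M product-limit formula is

$$S(t_i) = S(t_{i-1}) \times \left(1 - \frac{d_i}{n_i}\right)$$

where $d_i$ is the number of events and $n_i$ the number at risk at event time $t_i$. As we are not observing events or censoring directly, the equation becomes

$$S_{\text{end}} = S_{\text{start}} \times \left(1 - \frac{d^*}{n^*}\right) \qquad (5)$$

where $S_{\text{start}}$ and $S_{\text{end}}$ are the survival proportions at the start and end of the interval, $d^*$ is the number of events, $n^*$ is the number at risk adjusted for censoring, and the stars indicate that the quantities were not observed at time-points corresponding to changes in the risk set.
Rearranging equation 5 gives

$$d^* = n^* \times \frac{S_{\text{start}} - S_{\text{end}}}{S_{\text{start}}}$$

which is equation 2.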
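A numerical round trip makes the rearrangement concrete. This is a sketch with hypothetical function names: it estimates the events from the survival readings, then feeds them back through one step of the K-M product-limit formula, survival at end = survival at start × (1 – events/number at risk), to confirm that the survival at the end of the interval is recovered.

```python
def events_from_survival(effective_at_risk, s_start, s_end):
    """Rearranged K-M step: events implied by the drop in survival."""
    return effective_at_risk * (s_start - s_end) / s_start


def km_step(s_start, events, effective_at_risk):
    """One step of the K-M product-limit formula."""
    return s_start * (1 - events / effective_at_risk)


d_star = events_from_survival(273.33, 0.78, 0.76)
s_end = km_step(0.78, d_star, 273.33)
print(d_star, s_end)
```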
To derive equation 3:
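As the HR can be calculated as a relative risk, we can use the formula for the standard error of the log relative risk, SE(ln(RR)), i.e.

$$\mathrm{SE}(\ln(\mathrm{HR})) = \sqrt{\frac{1}{d^*_r} - \frac{1}{n^*_r} + \frac{1}{d^*_c} - \frac{1}{n^*_c}}$$

where $d^*$ and $n^*$ are the numbers of events and the numbers at risk adjusted for censoring in the interval, and the subscripts $r$ and $c$ denote the research and control groups. As V is the reciprocal of the variance of ln(HR),

$$V = \frac{1}{\mathrm{SE}(\ln(\mathrm{HR}))^2} = \frac{1}{\frac{1}{d^*_r} - \frac{1}{n^*_r} + \frac{1}{d^*_c} - \frac{1}{n^*_c}}$$

which is equation 3.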
Dr Kathy Taylor teaches data extraction in Meta-analysis. This is a short course that is also available as part of our MSc in Evidence-Based Health Care, MSc in EBHC Medical Statistics, and MSc in EBHC Systematic Reviews.
Follow me on Twitter @dataextips for updates on my blog, related news, and to find out about other examples of statistics being made more broadly accessible.