The Centre for Evidence-Based Medicine develops, promotes and disseminates better evidence for healthcare.
September 23, 2019
Kathy Taylor
In my last post, using trial data, I worked through the equations that underlie the spreadsheet calculator of Tierney et al to estimate a hazard ratio (HR) from Kaplan-Meier (K-M) curves reported with information about follow-up. In this post I’ll look at the equations for the case of K-M curves reported with numbers at risk.
Again I’d like to thank David Fisher (MRC Clinical Trials Unit, UCL) for his help in deriving the equations.
Here are my extracted data with the reported numbers at risk of mortality (Table 1) for the trial of two different peri-operative chemotherapy treatments for patients with gastric or gastro-oesophageal cancer. The treatment groups are abbreviated as FLOT (for the research, intervention group) and ECF/ECX (for the comparator group).
Table 1. Data for the FLOT4 trial
Time at start of interval (months) | FLOT survival (event-free) % | ECF/ECX survival (event-free) % | FLOT reported numbers at risk | ECF/ECX reported numbers at risk |
0 | 100 | 100 | 356 | 360 |
12 | 84 | 80 | 297 | 287 |
24 | 69 | 58 | 231 | 202 |
36 | 57 | 49 | 140 | 126 |
48 | 50 | 44 | 87 | 83 |
60 | 45 | 36 | 39 | 33 |
72 | 43 | 32 | 5 | 9 |
The authors highlight that these data offer a more direct way to assess censoring, but the disadvantage is that there are fewer data points.
The spreadsheet calculations, which follow an actuarial life-table approach (Figure 1), estimate for each time interval and each treatment arm:
Figure 1. Spreadsheet calculations
Having the numbers at risk ‘anchors’ the estimates at particular times, unlike the case of estimates made for K-M curves with information about follow-up where the time intervals are those reported and those chosen by the data extractor.
I will illustrate the use of the equations by showing the calculations for the first interval of the trial data (0-12 months).
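STEP 1: Numbers at risk during the current interval (adjusted for censoring)
Number at risk = (number at risk at start of current interval + number at risk at start of next interval) × survival at start of current interval ⁄ (survival at start of current interval + survival at start of next interval) (equation 1)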
i.e.
(356+297) x 1.00/(1.00+0.84)=354.89 in the research group
(360+287) x 1.00/(1.00+0.80)=359.44 in the control group
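The arithmetic for the numbers at risk can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet's own code; the function name `at_risk_in_interval` and its parameter names are assumptions introduced here.

```python
def at_risk_in_interval(n_start, n_next, s_start, s_next):
    """Number at risk during an interval, adjusted for censoring.

    n_start, n_next: reported numbers at risk at the start of this
                     interval and at the start of the next interval.
    s_start, s_next: survival proportions (0 to 1) read from the K-M
                     curve at the same two timepoints.
    """
    return (n_start + n_next) * s_start / (s_start + s_next)


# First interval (0-12 months) of Table 1
flot_at_risk = at_risk_in_interval(356, 297, 1.00, 0.84)
ecf_at_risk = at_risk_in_interval(360, 287, 1.00, 0.80)
print(round(flot_at_risk, 2), round(ecf_at_risk, 2))  # 354.89 359.44
```

The same function can be applied to each later interval by moving one row down Table 1.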
STEP 2: Numbers of patients with events during the current interval
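Events = (number at risk at start of current interval + number at risk at start of next interval) × (survival at start of current interval − survival at start of next interval) ⁄ (survival at start of current interval + survival at start of next interval) (equation 2)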
i.e.
(356+297) x (1.00-0.84)/(1.00+0.84) = 56.78 in the research group
(360+287) x (1.00-0.80)/(1.00+0.80) = 71.89 in the control group
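The event calculation follows the same pattern. The sketch below is illustrative only; `events_in_interval` and its parameter names are assumptions, not names from the spreadsheet.

```python
def events_in_interval(n_start, n_next, s_start, s_next):
    """Estimated number of patients with events during an interval.

    n_start, n_next: reported numbers at risk at the start of this
                     interval and at the start of the next interval.
    s_start, s_next: survival proportions (0 to 1) at those timepoints.
    """
    return (n_start + n_next) * (s_start - s_next) / (s_start + s_next)


# First interval (0-12 months) of Table 1
flot_events = events_in_interval(356, 297, 1.00, 0.84)
ecf_events = events_in_interval(360, 287, 1.00, 0.80)
print(round(flot_events, 2), round(ecf_events, 2))  # 56.78 71.89
```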
STEP 3: Numbers censored in the current interval
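Censored = 2 × (number at risk at start of current interval × survival at start of next interval − number at risk at start of next interval × survival at start of current interval) ⁄ (survival at start of current interval + survival at start of next interval) (equation 3)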
i.e.
2x(356×0.84-297×1.00)/(1.00+0.84)=2.22 in the research group
2x(360×0.80-287×1.00)/(1.00+0.80)=1.11 in the control group
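A short Python sketch of the censoring calculation is given below, together with a consistency check: the number at risk at the start of the interval, minus the estimated censored patients and events, should return the reported number at risk at the next timepoint. The function names are assumptions made for this illustration.

```python
def censored_in_interval(n_start, n_next, s_start, s_next):
    """Estimated number of patients censored during an interval."""
    return 2 * (n_start * s_next - n_next * s_start) / (s_start + s_next)


def events_in_interval(n_start, n_next, s_start, s_next):
    """Estimated number of patients with events during an interval."""
    return (n_start + n_next) * (s_start - s_next) / (s_start + s_next)


# First interval (0-12 months) of Table 1
flot_censored = censored_in_interval(356, 297, 1.00, 0.84)
ecf_censored = censored_in_interval(360, 287, 1.00, 0.80)
print(round(flot_censored, 2), round(ecf_censored, 2))  # 2.22 1.11

# Consistency check for FLOT: 356 - censored - events should be 297
flot_events = events_in_interval(356, 297, 1.00, 0.84)
print(round(356 - flot_censored - flot_events, 6))  # 297.0
```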
It is interesting to see that approximately 3 patients (2.22 + 1.11) were censored in the period 0-12 months, but these were not accounted for previously, when a minimum follow-up period of 15 months was estimated.
STEP 4a: Estimate HR and V for the current interval using the steps given previously
i.e. the HR is calculated as a relative risk for the current interval.
i.e. (56.78/354.89)/(71.89/359.44)=0.80
equation 4
A direct method to calculate the HR is HR = exp((O – E) ⁄ V).
Taking natural logs of both sides and rearranging gives
O – E = ln(HR) × V
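The relative-risk form of the interval HR, and the rearrangement O – E = ln(HR) × V, can be sketched as below. The function names are assumptions made for illustration, and V is left as an input because this sketch does not attempt to reproduce how V is obtained in this step.

```python
import math


def interval_hazard_ratio(events_research, at_risk_research,
                          events_control, at_risk_control):
    """HR for one interval, calculated as a relative risk."""
    risk_research = events_research / at_risk_research
    risk_control = events_control / at_risk_control
    return risk_research / risk_control


def o_minus_e_from_hr(hr, v):
    """Rearranged direct method: O - E = ln(HR) x V."""
    return math.log(hr) * v


# First interval (0-12 months), using the rounded figures from the text
hr = interval_hazard_ratio(56.78, 354.89, 71.89, 359.44)
print(round(hr, 2))  # 0.8
```

Applying exp((O – E) ⁄ V) to the output of `o_minus_e_from_hr` returns the HR that was put in, whatever positive value of V is supplied.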
STEP 4b: Estimate E and then O-E
The number of expected events in the current interval for the research group is estimated as the fraction with events multiplied by the number at risk in the research group:
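E = (events in research group + events in control group) × number at risk in research group ⁄ (number at risk in research group + number at risk in control group) (equation 5)
i.e.
E=(56.78+71.89) x 354.89/(354.89+359.44)=63.93
The difference between the observed and expected events in the research group is
O-E=56.78-63.93= -7.15
If the randomisation ratio is 1:1, estimate V as
V = (events in research group + events in control group) ⁄ 4 (equation 6)
i.e.
(56.78+71.89)/4=32.17
If the randomisation ratio is not 1:1, or the reported numbers at risk (event-free at start of intervals) are very different, use
V = (events in research group + events in control group) × number at risk in research group × number at risk in control group ⁄ (number at risk in research group + number at risk in control group)² (equation 7)
i.e.
(56.78+71.89) x 354.89 x 359.44/(354.89+359.44)²=32.17
The HR can be calculated directly as
HR = exp((O – E) ⁄ V)
i.e.
exp(-7.15/32.17)=0.80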
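The calculations for E, O-E, V and the interval HR can be chained together as in the sketch below. It is an illustration under stated assumptions: the function names are invented here, the inputs are the rounded first-interval figures from the text, and the general form of V is taken to be total events × N1 × N2 ⁄ (N1 + N2)², which reduces to total events ⁄ 4 when the two groups are the same size.

```python
import math


def expected_events(events_research, events_control,
                    at_risk_research, at_risk_control):
    """Expected events in the research group for one interval."""
    total_events = events_research + events_control
    total_at_risk = at_risk_research + at_risk_control
    return total_events * at_risk_research / total_at_risk


def variance_equal_allocation(events_research, events_control):
    """V when the randomisation ratio is 1:1."""
    return (events_research + events_control) / 4


def variance_general(events_research, events_control,
                     at_risk_research, at_risk_control):
    """V when group sizes differ (assumed general form, see lead-in)."""
    total_events = events_research + events_control
    total_at_risk = at_risk_research + at_risk_control
    return (total_events * at_risk_research * at_risk_control
            / total_at_risk ** 2)


# First interval (0-12 months), rounded figures from the text
d_r, d_c, n_r, n_c = 56.78, 71.89, 354.89, 359.44

e = expected_events(d_r, d_c, n_r, n_c)
o_minus_e = d_r - e
v = variance_equal_allocation(d_r, d_c)
v_general = variance_general(d_r, d_c, n_r, n_c)
hr = math.exp(o_minus_e / v)

print(e, o_minus_e, v, v_general)
print(round(hr, 2))  # 0.8
```

Because the two groups are almost the same size in this interval, the two versions of V agree to two decimal places.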
STEP 6: O-E, V and the HR for the whole survival curve.
Accounting for all intervals, the HR for the whole curve is calculated as HR = exp(sum of O – E ⁄ sum of V)
i.e. exp(-21.7/92.1) = 0.79
With V = 92.1 and O-E = -21.7, the HR is 0.79 and its 95% CI is 0.64 to 0.97.
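The whole-curve HR and its confidence interval can be reproduced from the totals of O-E and V. The sketch below assumes the usual interval for this approach, exp(ln(HR) ± 1.96 ⁄ √V), which the post does not write out; the function name `pooled_hazard_ratio` is also an assumption.

```python
import math


def pooled_hazard_ratio(o_minus_e_total, v_total, z=1.96):
    """HR and confidence limits from summed O-E and V.

    The standard error of ln(HR) is taken as 1 / sqrt(V),
    an assumption stated in the accompanying text.
    """
    log_hr = o_minus_e_total / v_total
    se = 1 / math.sqrt(v_total)
    return (math.exp(log_hr),
            math.exp(log_hr - z * se),
            math.exp(log_hr + z * se))


hr, lower, upper = pooled_hazard_ratio(-21.7, 92.1)
print(f"HR {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
# HR 0.79 (95% CI 0.64 to 0.97)
```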
Here’s a tip…
You can calculate a hazard ratio from a survival curve reported with numbers at risk
(You can skip this if you are only interested in carrying out the calculations)
I’ll use shorter names of variables so the equations fit on a single line and only consider the equations for a single arm of a trial to simplify the notation. The same equations will apply to both treatment arms.
The standard K-M limit formula is
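S_{2} = S_{1}(1 − d_{2} ⁄ n_{2})
where S_{1} and S_{2} are the survival proportions at successive timepoints t_{1} and t_{2}, d_{2} is the number of events at t_{2} and n_{2} is the number at risk at t_{2}.
We are not observing events or censoring directly, so the equation becomes
S_{2} = S_{1}(1 − d* ⁄ n*) (equation 8)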
where the stars indicate that the quantities were not observed at time-points corresponding to changes in the risk set.
The method assumes that numbers at risk are known at a regularly spaced set of timepoints (an actuarial, life-table approach) and there is a constant rate of censoring within each time period. As those censored are only at risk for part of the interval it is assumed that censoring has the equivalent effect of half of them being at risk for the whole period:
n* = n_{1} − c ⁄ 2 (equation 9)
where c is the number censored during the interval.
Therefore, ‘fractional’ numbers at risk across the intervals are estimated rather than “effective” numbers as described previously.
The number at risk at the next timepoint, n_{2}, is obtained by subtracting all the censored patients plus the estimated number of events since time t_{1}; that is:
n_{2} = n_{1} − c − d* (equation 10)
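Rearrange equation 8 to calculate
n* = d* × S_{1} ⁄ (S_{1} − S_{2}) (equation 11)
Substituting equations 9 and 10 into equation 11 and rearranging produces, with c being the number censored in the interval,
c = 2(n_{1}S_{2} − n_{2}S_{1}) ⁄ (S_{1} + S_{2})
which is equation 3.
Substituting equation 3 into equation 10 and rearranging produces
d* = (n_{1} + n_{2})(S_{1} − S_{2}) ⁄ (S_{1} + S_{2})
which is equation 2.
Substituting equation 2 into equation 11 and rearranging produces
n* = (n_{1} + n_{2})S_{1} ⁄ (S_{1} + S_{2})
which is equation 1.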
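The intermediate algebra can be written out as follows. This is a sketch of one route through the rearrangements, with c denoting the number censored in the interval (a symbol assumed here), d* and n* the estimated events and number at risk in the interval, and n_1, n_2, S_1, S_2 the numbers at risk and survival proportions at the start of the current and next intervals.

```latex
\begin{align*}
d^{*} &= n^{*}\,\frac{S_1 - S_2}{S_1}
  && \text{(from } S_2 = S_1(1 - d^{*}/n^{*})\text{)} \\
n_1 - n_2 - c &= \left(n_1 - \tfrac{c}{2}\right)\frac{S_1 - S_2}{S_1}
  && \text{(using } d^{*} = n_1 - n_2 - c,\; n^{*} = n_1 - \tfrac{c}{2}\text{)} \\
S_1 n_1 - S_1 n_2 - S_1 c &= n_1 S_1 - n_1 S_2 - \tfrac{c}{2}\,(S_1 - S_2) \\
n_1 S_2 - n_2 S_1 &= \tfrac{c}{2}\,(S_1 + S_2) \\
c &= \frac{2\,(n_1 S_2 - n_2 S_1)}{S_1 + S_2} \\
d^{*} = n_1 - n_2 - c &= \frac{(n_1 + n_2)(S_1 - S_2)}{S_1 + S_2} \\
n^{*} = d^{*}\,\frac{S_1}{S_1 - S_2} &= \frac{(n_1 + n_2)\,S_1}{S_1 + S_2}
\end{align*}
```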
To derive equations 6 and 7:
To simplify the notation, I’ll consider the whole follow-up period.
E_{1} is the expected number of events in the research group
E_{2} is the expected number of events in the comparator group
D_{1} is the number of events in the research group
D_{2} is the number of events in the comparator group
N_{1} is the number in the research group
N_{2} is the number in the comparator group
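Tierney et al refer to a direct method of calculating the variance as
V = 1 ⁄ (1 ⁄ E_{1} + 1 ⁄ E_{2}) (equation 12)
As in equation 5
E_{1} = (D_{1} + D_{2}) × N_{1} ⁄ (N_{1} + N_{2}) (equation 13)
E_{2} = (D_{1} + D_{2}) × N_{2} ⁄ (N_{1} + N_{2}) (equation 14)
Substituting equations 13 and 14 into equation 12 gives
V = (D_{1} + D_{2}) × N_{1} × N_{2} ⁄ (N_{1} + N_{2})²
This is equation 7.
If the randomisation ratio is 1:1 then N_{1} = N_{2} = N and equation 7 becomes
V = (D_{1} + D_{2}) ⁄ 4
This is equation 6.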
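The variance derivation can be checked numerically. The sketch below assumes the direct method has the form V = 1 ⁄ (1 ⁄ E_{1} + 1 ⁄ E_{2}) and that the general result is V = (D_{1} + D_{2}) × N_{1} × N_{2} ⁄ (N_{1} + N_{2})²; the function names and the equal group size of 350 are hypothetical values chosen for illustration.

```python
import math


def expected(d1, d2, n_own, n_other):
    """Expected events in one group: total events x its share at risk."""
    return (d1 + d2) * n_own / (n_own + n_other)


def variance_from_expected(e1, e2):
    """Direct method (assumed form): V = 1 / (1/E1 + 1/E2)."""
    return 1 / (1 / e1 + 1 / e2)


def variance_general(d1, d2, n1, n2):
    """General result: V = (D1 + D2) x N1 x N2 / (N1 + N2)^2."""
    return (d1 + d2) * n1 * n2 / (n1 + n2) ** 2


# Unequal groups: first-interval figures from the worked example
d1, d2, n1, n2 = 56.78, 71.89, 354.89, 359.44
e1 = expected(d1, d2, n1, n2)
e2 = expected(d1, d2, n2, n1)
v_direct = variance_from_expected(e1, e2)
v_general = variance_general(d1, d2, n1, n2)
print(math.isclose(v_direct, v_general))  # True

# Equal groups (hypothetical N = 350): general form reduces to (D1 + D2) / 4
v_equal = variance_general(d1, d2, 350, 350)
print(math.isclose(v_equal, (d1 + d2) / 4))  # True
```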
Dr Kathy Taylor teaches data extraction in Meta-analysis. This is a short course that is also available as part of our MSc in Evidence-Based Health Care, MSc in EBHC Medical Statistics, and MSc in EBHC Systematic Reviews.
Follow me on Twitter @dataextips for updates on my blog, related news, and to find out about other examples of statistics being made more broadly accessible.