Tip for data extraction for meta-analysis – 16
September 23, 2019
Estimating a hazard ratio from a Kaplan curve and numbers at risk
In my last post, using trial data, I worked through the equations that underlie the spreadsheet calculator of Tierney et al to estimate a hazard ratio (HR) from Kaplan-Meier (K-M) curves reported with information about follow-up. In this post I’ll look at the equations for the case of a K-M curve reported with numbers at risk.
Again I’d like to thank David Fisher (MRC Clinical Trials Unit, UCL) for his help in deriving the equations.
Here are my extracted data with the reported numbers at risk of mortality (Table 1) for the trial of two different peri-operative chemotherapy treatments for patients with gastric or gastro-oesophageal cancer. The treatment groups are abbreviated as FLOT (the research, or intervention, group) and ECF/ECX (the comparator group).
Table 1. Data for the FLOT4 trial
|Time at start of interval (months)|Survival (event-free) %: FLOT|Survival (event-free) %: ECF/ECX|Reported numbers at risk: FLOT|Reported numbers at risk: ECF/ECX|
|---|---|---|---|---|
The authors highlight that these data offer a more direct way to assess censoring; the disadvantage is that there are fewer data points, because numbers at risk are usually reported at only a few time points.
The spreadsheet calculations, which follow an actuarial life-table approach (Figure 1), estimate for each time interval and each treatment arm:
- Numbers at risk during the current interval
- Numbers of (patients with) events during the current interval
- Numbers censored in the current interval (involving a fractional equation)
- O-E, V and the HR for the current interval (two estimates are given)

These steps are repeated across all intervals, and finally these statistics are combined to calculate:
- O-E, V and the HR for the whole survival curve.
Figure 1. Spreadsheet calculations
Having the numbers at risk ‘anchors’ the estimates at particular times. This differs from estimates made from K-M curves reported with information about follow-up, where the time intervals are those reported and those chosen by the data extractor.
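I will illustrate the use of the equations by showing the calculations for the first interval of the trial data (0-12 months). Here n1 and n2 are the reported numbers at risk at the start and end of the interval, and S1 and S2 are the survival proportions read from the K-M curve at those times.

STEP 1: Numbers at risk during the current interval

$$n^* = \frac{(n_1 + n_2)\,S_1}{S_1 + S_2} \tag{1}$$

(356+297) x 1.00/(1.00+0.84)=354.89 in the research group
(360+287) x 1.00/(1.00+0.80)=359.44 in the control group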
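The step 1 arithmetic can be sketched as a small Python function. This is a minimal sketch, not the Tierney et al spreadsheet itself: the function name `numbers_at_risk` is hypothetical, and survival is assumed to be entered as a proportion (1.00, 0.84) rather than a percentage.

```python
def numbers_at_risk(n1: float, n2: float, s1: float, s2: float) -> float:
    """Fractional number at risk during an interval (equation 1).

    n1, n2: reported numbers at risk at the start and end of the interval
    s1, s2: survival proportions read from the K-M curve at those times
    """
    return (n1 + n2) * s1 / (s1 + s2)


# First interval (0-12 months) of the FLOT4 data
print(f"{numbers_at_risk(356, 297, 1.00, 0.84):.2f}")  # research (FLOT)
print(f"{numbers_at_risk(360, 287, 1.00, 0.80):.2f}")  # control (ECF/ECX)
```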
STEP 2: Numbers of patients with events during the current interval

$$d^* = \frac{(n_1 + n_2)(S_1 - S_2)}{S_1 + S_2} \tag{2}$$
(356+297) x (1.00-0.84)/(1.00+0.84) = 56.78 in the research group
(360+287) x (1.00-0.80)/(1.00+0.80) = 71.89 in the control group
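Step 2 in the same style, again as a hypothetical helper rather than the spreadsheet's own formula cell:

```python
def events_in_interval(n1: float, n2: float, s1: float, s2: float) -> float:
    """Estimated number of events during an interval (equation 2)."""
    return (n1 + n2) * (s1 - s2) / (s1 + s2)


# First interval (0-12 months) of the FLOT4 data
print(f"{events_in_interval(356, 297, 1.00, 0.84):.2f}")  # research (FLOT)
print(f"{events_in_interval(360, 287, 1.00, 0.80):.2f}")  # control (ECF/ECX)
```

Dividing equation 2 by equation 1 gives (S1 − S2)/S1, so the estimated event rate within an interval depends only on the two survival readings; the numbers at risk only set the scale.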
STEP 3: Numbers censored in the current interval

$$c = \frac{2\,(n_1 S_2 - n_2 S_1)}{S_1 + S_2} \tag{3}$$
2x(356×0.84-297×1.00)/(1.00+0.84)=2.22 in the research group
2x(360×0.80-287×1.00)/(1.00+0.80)=1.11 in the control group
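A sketch of step 3, with a built-in sanity check: everyone who leaves the risk set during the interval is either censored or has an event, so the estimated censored plus events should equal the drop in the reported numbers at risk (356 − 297 = 59 and 360 − 287 = 73). Function names are hypothetical.

```python
def censored_in_interval(n1: float, n2: float, s1: float, s2: float) -> float:
    """Estimated number censored during an interval (equation 3)."""
    return 2 * (n1 * s2 - n2 * s1) / (s1 + s2)


def events_in_interval(n1: float, n2: float, s1: float, s2: float) -> float:
    """Equation 2, repeated so this block runs on its own."""
    return (n1 + n2) * (s1 - s2) / (s1 + s2)


for group, n1, n2, s2 in [("FLOT", 356, 297, 0.84), ("ECF/ECX", 360, 287, 0.80)]:
    c = censored_in_interval(n1, n2, 1.00, s2)
    d = events_in_interval(n1, n2, 1.00, s2)
    # c + d should equal the drop in the reported numbers at risk
    print(f"{group}: censored {c:.2f}, censored + events {c + d:.2f}, n1 - n2 {n1 - n2}")
```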
It is interesting to see that approximately 3 patients (2.22 + 1.11) were censored in the period 0-12 months; these were not accounted for in my previous post, where a minimum follow-up of 15 months was estimated, implying that no patients were censored before 15 months.
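STEP 4a: Estimate the HR and V for the current interval using the steps given previously
i.e. the HR is calculated as a relative risk for the current interval, where R and C denote the research and control groups:
$$\mathrm{HR} = \frac{d_R^*/n_R^*}{d_C^*/n_C^*}$$
For 0-12 months this gives HR = (56.78/354.89)/(71.89/359.44) = 0.16/0.20 = 0.80.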
A direct method to calculate the HR is
$$\mathrm{HR} = \exp\left(\frac{O-E}{V}\right)$$
Taking natural logs of both sides and rearranging gives
O – E = ln(HR) × V
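Step 4a can be sketched as follows. The relative-risk HR and O-E = ln(HR) × V follow the text; the variance "from the steps given previously" is not reproduced in this post, so this sketch assumes the unequal-groups formula V = (dR + dC) nR nC/(nR + nC)² used in step 4b, which may not match the spreadsheet's step 4a exactly. The function name is hypothetical.

```python
import math


def step_4a(d_r, n_r, d_c, n_c):
    """Interval HR as a relative risk, then O-E = ln(HR) x V.

    Assumption: V uses the unequal-groups variance formula
    (d_r + d_c) * n_r * n_c / (n_r + n_c) ** 2.
    """
    hr = (d_r / n_r) / (d_c / n_c)                    # relative risk of an event
    v = (d_r + d_c) * n_r * n_c / (n_r + n_c) ** 2    # assumed variance formula
    o_minus_e = math.log(hr) * v                      # O - E = ln(HR) x V
    return hr, v, o_minus_e


# First interval (0-12 months), using the step 1 and step 2 estimates
hr, v, o_minus_e = step_4a(56.78, 354.89, 71.89, 359.44)
print(f"HR = {hr:.2f}, V = {v:.2f}, O-E = {o_minus_e:.2f}")
```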
STEP 4b: Estimate E and then O-E
The number of expected events in the current interval for the research group is estimated as the proportion of patients with events (both groups combined) multiplied by the number at risk in the research group:
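$$E = \frac{(d_R^* + d_C^*)\,n_R^*}{n_R^* + n_C^*} \tag{5}$$
E = (56.78+71.89) x 354.89/(354.89+359.44) = 63.93
The difference between the observed and expected events in the research group is
O-E = 56.78 - 63.93 = -7.15
If the randomisation ratio is 1:1, estimate
$$V = \frac{d_R^* + d_C^*}{4}$$
If the randomisation ratio is not 1:1, or the reported numbers at risk (event-free at the start of the intervals) are very different, use
$$V = \frac{(d_R^* + d_C^*)\,n_R^*\,n_C^*}{(n_R^* + n_C^*)^2}$$
For the first interval the numbers at risk in the two groups are almost equal, so both formulas give V = 32.17.
The HR can be calculated directly as
$$\mathrm{HR} = \exp\left(\frac{O-E}{V}\right) = \exp\left(\frac{-7.15}{32.17}\right) = 0.80$$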
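Step 4b as a Python sketch for one interval. The `one_to_one` switch chooses between the two variance formulas above; the function name and its parameters are hypothetical, and inputs are the rounded step 1 and step 2 estimates, so the last decimal place can differ slightly from a spreadsheet working with unrounded values.

```python
import math


def step_4b(d_r, n_r, d_c, n_c, one_to_one=True):
    """Expected events, O-E, V and HR for one interval (research group view)."""
    expected = (d_r + d_c) * n_r / (n_r + n_c)            # equation 5
    o_minus_e = d_r - expected                            # observed minus expected
    if one_to_one:
        v = (d_r + d_c) / 4                               # 1:1 randomisation
    else:
        v = (d_r + d_c) * n_r * n_c / (n_r + n_c) ** 2    # unequal groups
    hr = math.exp(o_minus_e / v)                          # direct HR
    return expected, o_minus_e, v, hr


expected, o_minus_e, v, hr = step_4b(56.78, 354.89, 71.89, 359.44)
print(f"E = {expected:.2f}, O-E = {o_minus_e:.2f}, V = {v:.2f}, HR = {hr:.2f}")
```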
STEP 6: O-E, V and the HR for the whole survival curve.
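Accounting for all intervals, the HR for the whole curve is calculated from the summed O-E and V:
$$\mathrm{HR} = \exp\left(\frac{\sum (O-E)}{\sum V}\right) = \exp\left(\frac{-21.7}{92.1}\right) = 0.79$$
with V = 92.1, O-E = -21.7 and a 95% CI for the HR, exp(ln(HR) ± 1.96/√V), of 0.64 to 0.97.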
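The pooling in step 6 can be sketched as summing per-interval (O-E, V) pairs and exponentiating. This is a minimal sketch with a hypothetical function name; only the totals are reported in this post, so they are passed in as a single entry.

```python
import math


def pooled_hr(intervals, z=1.96):
    """Whole-curve HR and 95% CI from per-interval (O-E, V) pairs.

    HR = exp(sum(O-E) / sum(V)); CI = exp(ln(HR) +/- z / sqrt(sum(V))).
    """
    total_o_minus_e = sum(o_minus_e for o_minus_e, _ in intervals)
    total_v = sum(v for _, v in intervals)
    log_hr = total_o_minus_e / total_v
    half_width = z / math.sqrt(total_v)
    return (math.exp(log_hr),
            math.exp(log_hr - half_width),
            math.exp(log_hr + half_width))


# Totals reported for the FLOT4 curve, passed as one (O-E, V) entry
hr, lower, upper = pooled_hr([(-21.7, 92.1)])
print(f"HR = {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```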
Here’s a tip…
You can calculate a hazard ratio from a survival curve reported with numbers at risk
Where did the equations come from?
(You can skip this if you are only interested in carrying out the calculations)
I’ll use shorter names of variables so the equations fit on a single line and only consider the equations for a single arm of a trial to simplify the notation. The same equations will apply to both treatment arms.
The standard K-M product-limit formula is
$$S_2 = S_1\left(1 - \frac{d_2}{n_2}\right)$$
We are not observing events or censoring directly, so the equation becomes
$$S_2 = S_1\left(1 - \frac{d^*}{n^*}\right) \tag{8}$$
where the stars indicate that the quantities were not observed at time-points corresponding to changes in the risk set.
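As a consistency check on the starred formula S2 = S1(1 − d*/n*), the step 1 and step 2 estimates for the first interval can be fed back into a single product-limit step; they should return the survival read from the curve (0.84 for FLOT, 0.80 for ECF/ECX). A minimal sketch with a hypothetical function name:

```python
def product_limit_step(s1: float, d_star: float, n_star: float) -> float:
    """One actuarial product-limit step: S2 = S1 * (1 - d*/n*)."""
    return s1 * (1 - d_star / n_star)


# Rounded 0-12 month estimates from steps 1 and 2
print(f"{product_limit_step(1.00, 56.78, 354.89):.2f}")  # research (FLOT)
print(f"{product_limit_step(1.00, 71.89, 359.44):.2f}")  # control (ECF/ECX)
```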
The method assumes that numbers at risk are known at a regularly spaced set of time points (an actuarial, life-table approach) and that there is a constant rate of censoring within each time period. As those censored are only at risk for part of the interval, it is assumed that censoring has the equivalent effect of half of them being at risk for the whole period:
$$n^* = n_1 - \frac{c}{2} \tag{9}$$
Therefore, ‘fractional’ numbers at risk across the intervals are estimated rather than ‘effective’ numbers as described previously.
The number at risk at the next time point, n2, is obtained by subtracting from n1 all the censored patients plus the estimated number of events since time t1; that is:
$$n_2 = n_1 - c - d^* \tag{10}$$
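Rearrange equation 8 to calculate the number of events:
$$d^* = \frac{n^*(S_1 - S_2)}{S_1} \tag{11}$$
Substituting equation 11, with n* = n1 − c/2, into equation 10 and rearranging gives
$$c = \frac{2\,(n_1 S_2 - n_2 S_1)}{S_1 + S_2}$$
which is equation 3.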
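For readers who want the intermediate algebra, here is one way to carry out the substitution that gives equation 3, starting from n2 = n1 − c − d*, with d* = n*(S1 − S2)/S1 and n* = n1 − c/2:

$$
\begin{aligned}
n_2 &= n_1 - c - \left(n_1 - \tfrac{c}{2}\right)\frac{S_1 - S_2}{S_1} \\
n_2 S_1 &= n_1 S_1 - c S_1 - \left(n_1 - \tfrac{c}{2}\right)(S_1 - S_2) \\
&= n_1 S_2 - \tfrac{c}{2}\,(S_1 + S_2) \\
\Rightarrow\quad c &= \frac{2\,(n_1 S_2 - n_2 S_1)}{S_1 + S_2}
\end{aligned}
$$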
Substituting equation 3 into equation 10 and rearranging gives
$$d^* = \frac{(n_1 + n_2)(S_1 - S_2)}{S_1 + S_2}$$
which is equation 2.
Substituting equation 2 into equation 11 and rearranging gives
$$n^* = \frac{(n_1 + n_2)\,S_1}{S_1 + S_2}$$
which is equation 1.
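A numerical round-trip check of the derivation, under the model assumptions above (constant censoring, half of those censored counted as at risk). The inputs n1, S1, S2 and c are made-up illustrative values, not trial data, and both function names are hypothetical: the sketch builds n2 forward from the model, then checks that equations 1-3 recover n*, d* and c.

```python
def forward(n1, s1, s2, c):
    """Build the end-of-interval number at risk from the model."""
    n_star = n1 - c / 2                 # half of those censored count as at risk
    d_star = n_star * (s1 - s2) / s1    # rearranged starred K-M formula
    n2 = n1 - c - d_star                # drop in the risk set
    return n2, d_star, n_star


def backward(n1, n2, s1, s2):
    """Recover c, d* and n* from reported data (equations 3, 2 and 1)."""
    c = 2 * (n1 * s2 - n2 * s1) / (s1 + s2)
    d_star = (n1 + n2) * (s1 - s2) / (s1 + s2)
    n_star = (n1 + n2) * s1 / (s1 + s2)
    return c, d_star, n_star


# Illustrative (made-up) inputs
n1, s1, s2, c_true = 200.0, 0.90, 0.75, 8.0
n2, d_true, n_star_true = forward(n1, s1, s2, c_true)
c, d_star, n_star = backward(n1, n2, s1, s2)
for got, want in [(c, c_true), (d_star, d_true), (n_star, n_star_true)]:
    assert abs(got - want) < 1e-9
print("equations 1-3 recover the model quantities")
```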
To derive equations 6 and 7:
To simplify the notation, I’ll consider the whole follow-up period.
E1 is the expected number of events in the research group
E2 is the expected number of events in the comparator group
D1 is the number of events in the research group
D2 is the number of events in the comparator group
N1 is the number in the research group
N2 is the number in the comparator group
Tierney et al refer to a direct method of calculating the variance as
$$V = \frac{E_1 E_2}{E_1 + E_2} \tag{12}$$
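As in equation 5,
$$E_1 = \frac{(D_1 + D_2)\,N_1}{N_1 + N_2} \tag{13}$$
$$E_2 = \frac{(D_1 + D_2)\,N_2}{N_1 + N_2} \tag{14}$$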
Substituting equations 13 and 14 into equation 12 gives
$$V = \frac{(D_1 + D_2)\,N_1 N_2}{(N_1 + N_2)^2}$$
This is equation 6.
If the randomisation ratio is 1:1 then N1 = N2 = N and equation 6 becomes
$$V = \frac{D_1 + D_2}{4}$$
This is equation 7.
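A quick numerical check of the variance derivation: with expected events shared out in proportion to group size, the direct form E1E2/(E1 + E2) and the counts form (D1 + D2)N1N2/(N1 + N2)² agree, and with equal groups the counts form reduces to (D1 + D2)/4. The event and group counts below are made-up illustrative values; the function names are hypothetical.

```python
def expected_events(d1, d2, n1, n2):
    """Expected events per group: total events shared in proportion to group size."""
    total = d1 + d2
    return total * n1 / (n1 + n2), total * n2 / (n1 + n2)


def v_direct(e1, e2):
    """Direct variance from expected events: E1*E2/(E1+E2)."""
    return e1 * e2 / (e1 + e2)


def v_from_counts(d1, d2, n1, n2):
    """Variance from events and group sizes: (D1+D2)*N1*N2/(N1+N2)^2."""
    return (d1 + d2) * n1 * n2 / (n1 + n2) ** 2


# Illustrative (made-up) unequal groups: both routes give the same V
d1, d2, n1, n2 = 30, 50, 120, 240
e1, e2 = expected_events(d1, d2, n1, n2)
print(v_direct(e1, e2), v_from_counts(d1, d2, n1, n2))

# Equal groups: the counts form reduces to (D1+D2)/4
print(v_from_counts(d1, d2, 150, 150), (d1 + d2) / 4)
```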
Dr Kathy Taylor teaches data extraction in Meta-analysis. This is a short course that is also available as part of our MSc in Evidence-Based Health Care, MSc in EBHC Medical Statistics, and MSc in EBHC Systematic Reviews.
Follow me on Twitter @dataextips for updates on my blog, related news, and to find out about other examples of statistics being made more broadly accessible.