Tip for data extraction for meta-analysis – 16

September 23, 2019

Estimating a hazard ratio from a Kaplan-Meier curve and numbers at risk

Kathy Taylor

In my last post, using trial data, I worked through the equations that underlie the spreadsheet calculator of Tierney et al to estimate a hazard ratio (HR) from Kaplan-Meier (K-M) curves reported with information about follow-up. In this post I’ll look at the equations for the case of K-M curves reported with numbers at risk.

Again I’d like to thank David Fisher (MRC Clinical Trials Unit, UCL) for his help in deriving the equations.

Here are my extracted data with the reported numbers at risk of mortality (Table 1) for the trial of two different peri-operative chemotherapy treatments for patients with gastric or gastro-oesophageal cancer. The treatment groups are abbreviated as FLOT (for the research, intervention group) and ECF/ECX (for the comparator group).

Table 1. Data for the FLOT4 trial

Time at start of      Survival (event-free) %      Reported numbers at risk
interval (months)     FLOT        ECF/ECX          FLOT        ECF/ECX
0                     100         100              356         360
12                    84          80               297         287
24                    69          58               231         202
36                    57          49               140         126
48                    50          44               87          83
60                    45          36               39          33
72                    43          32               5           9

 

The authors highlight that these data have the advantage of offering a more direct way to assess censoring, but the disadvantage of providing fewer data points.

The spreadsheet calculations, which follow an actuarial life-table approach (Figure 1), estimate for each time interval and each treatment arm:

  1. Numbers at risk during the current interval
  2. Numbers of (patients with) events during the current interval
  3. Numbers censored in the current interval (involving a fractional equation)
  4. O-E, V and the HR for the current interval (2 estimations are given)

These steps are repeated across all intervals and finally these statistics are combined to calculate:

  5. O-E, V and the HR for the whole survival curve.

Figure 1. Spreadsheet calculations

Having the numbers at risk ‘anchors’ the estimates at particular times, unlike the case of estimates made for K-M curves with information about follow-up where the time intervals are those reported and those chosen by the data extractor.
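As a minimal sketch of how the reported time points fix the intervals, Table 1 can be held as parallel lists and adjacent rows paired up. The names `times`, `survival`, `at_risk` and the `intervals` helper are illustrative only and are not part of the spreadsheet.

```python
# Table 1 (FLOT4), with survival converted from percentages to proportions.
times = [0, 12, 24, 36, 48, 60, 72]  # months
survival = {
    "FLOT": [1.00, 0.84, 0.69, 0.57, 0.50, 0.45, 0.43],
    "ECF/ECX": [1.00, 0.80, 0.58, 0.49, 0.44, 0.36, 0.32],
}
at_risk = {
    "FLOT": [356, 297, 231, 140, 87, 39, 5],
    "ECF/ECX": [360, 287, 202, 126, 83, 33, 9],
}


def intervals(arm):
    """Pair each reported time point with the next one for a single arm."""
    rows = list(zip(times, survival[arm], at_risk[arm]))
    return [
        {"start": t1, "end": t2, "s_start": s1, "s_end": s2,
         "n_start": n1, "n_end": n2}
        for (t1, s1, n1), (t2, s2, n2) in zip(rows, rows[1:])
    ]


first = intervals("FLOT")[0]
print(first)  # the 0-12 month interval for the FLOT arm
```

Seven reported time points give six intervals per arm, and the extractor has no freedom to choose others.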

I will illustrate the use of the equations by showing the calculations for the first interval of the trial data (0-12 months).

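STEP 1: Numbers at risk during the current interval

$$\text{numbers at risk during interval} = \left(\text{reported at risk}_{\text{start}} + \text{reported at risk}_{\text{end}}\right) \times \frac{\text{survival}_{\text{start}}}{\text{survival}_{\text{start}} + \text{survival}_{\text{end}}} \tag{1}$$

where "start" and "end" refer to the start of the current interval and the start of the next interval.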

i.e.
(356+297) x 1.00/(1.00+0.84)=354.89 in the research group
(360+287) x 1.00/(1.00+0.80)=359.44 in the control group

STEP 2: Numbers of patients with events during the current interval

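$$\text{events during interval} = \left(\text{reported at risk}_{\text{start}} + \text{reported at risk}_{\text{end}}\right) \times \frac{\text{survival}_{\text{start}} - \text{survival}_{\text{end}}}{\text{survival}_{\text{start}} + \text{survival}_{\text{end}}} \tag{2}$$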

i.e.
(356+297) x (1.00-0.84)/(1.00+0.84) = 56.78 in the research group
(360+287) x (1.00-0.80)/(1.00+0.80) = 71.89 in the control group

STEP 3: Numbers censored in the current interval

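$$\text{censored during interval} = \frac{2 \times \left(\text{reported at risk}_{\text{start}} \times \text{survival}_{\text{end}} - \text{reported at risk}_{\text{end}} \times \text{survival}_{\text{start}}\right)}{\text{survival}_{\text{start}} + \text{survival}_{\text{end}}} \tag{3}$$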

i.e.
2x(356×0.84-297×1.00)/(1.00+0.84)=2.22 in the research group
2x(360×0.80-287×1.00)/(1.00+0.80)=1.11 in the control group
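The three worked calculations above can be sketched as one small function. This is only an illustration of the arithmetic shown in steps 1 to 3; the function and argument names are my own, not those of the spreadsheet.

```python
def interval_estimates(n_start, n_end, s_start, s_end):
    """Steps 1-3 for one arm and one interval.

    n_start, n_end: reported numbers at risk at the start of this interval
                    and at the start of the next one.
    s_start, s_end: survival (event-free) proportions at the same time points.
    """
    s_sum = s_start + s_end
    at_risk = (n_start + n_end) * s_start / s_sum                 # step 1
    events = (n_start + n_end) * (s_start - s_end) / s_sum        # step 2
    censored = 2 * (n_start * s_end - n_end * s_start) / s_sum    # step 3
    return at_risk, events, censored


flot = interval_estimates(356, 297, 1.00, 0.84)
ecf = interval_estimates(360, 287, 1.00, 0.80)
print([round(x, 2) for x in flot])  # [354.89, 56.78, 2.22]
print([round(x, 2) for x in ecf])   # [359.44, 71.89, 1.11]
```

A useful check is that events plus censored equals the drop in the reported numbers at risk: 56.78 + 2.22 = 59 = 356 - 297 for FLOT, and 71.89 + 1.11 = 73 = 360 - 287 for ECF/ECX.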

It is interesting to see that approximately 3 patients were censored in the period 0-12 months but these were not accounted for previously by estimating a minimum follow-up period of 15 months.

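STEP 4a: Estimate HR and V for the current interval using the steps given previously
i.e. the HR is calculated as a relative risk for the current interval:

$$HR = \frac{\text{events}_{\text{research}} / \text{numbers at risk during interval}_{\text{research}}}{\text{events}_{\text{control}} / \text{numbers at risk during interval}_{\text{control}}}$$

i.e. (56.78/354.89)/(71.89/359.44)=0.80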
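It may help to see why this ratio comes out so neatly. Writing $n_1, n_2$ for the reported numbers at risk and $S_1, S_2$ for the survival proportions at the start and end of the interval (shorthand borrowed from the derivation later in this post), the step 2 and step 1 estimates share a common factor, so within each arm the risk depends only on the survival values:

```latex
\[
\frac{\text{events}}{\text{numbers at risk during interval}}
  = \frac{(n_1+n_2)\,(S_1-S_2)/(S_1+S_2)}{(n_1+n_2)\,S_1/(S_1+S_2)}
  = \frac{S_1-S_2}{S_1}
\]
\[
HR_{0\text{-}12} = \frac{(1.00-0.84)/1.00}{(1.00-0.80)/1.00}
  = \frac{0.16}{0.20} = 0.80
\]
```

The numbers at risk therefore do not change this interval HR; they matter for the weight (V) each interval receives when the intervals are combined.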

equation 4

A direct method to calculate the HR is

$$HR = \exp\left(\frac{O-E}{V}\right)$$

Taking natural logs of both sides and rearranging gives

$$O - E = \ln(HR) \times V$$

STEP 4b: Estimate E and then O-E
The number of expected events in the current interval for the research group is estimated as the fraction with events multiplied by the number at risk in the research group:

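$$E = \left(\text{events}_{\text{research}} + \text{events}_{\text{control}}\right) \times \frac{\text{numbers at risk during interval}_{\text{research}}}{\text{numbers at risk during interval}_{\text{research}} + \text{numbers at risk during interval}_{\text{control}}} \tag{5}$$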

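i.e.
E=(56.78+71.89) x 354.89/(354.89+359.44)=63.93

The difference between the observed and expected events in the research group is
O-E=56.78-63.93= -7.15

If the randomisation ratio is 1:1, estimate

$$V = \frac{\text{events}_{\text{research}} + \text{events}_{\text{control}}}{4} \tag{6}$$

i.e.
(56.78+71.89)/4=32.17

If the randomisation ratio is not 1:1 or the reported numbers at risk (event-free at start of intervals) are very different, use

$$V = \frac{\left(\text{events}_{\text{research}} + \text{events}_{\text{control}}\right) \times \text{at risk}_{\text{research}} \times \text{at risk}_{\text{control}}}{\left(\text{at risk}_{\text{research}} + \text{at risk}_{\text{control}}\right)^2} \tag{7}$$

i.e.
(56.78+71.89) x 354.89 x 359.44/(354.89+359.44)^2=32.17

The HR can be calculated directly as

$$HR = \exp\left(\frac{O-E}{V}\right)$$

i.e.
exp(-7.15/32.17)=0.80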
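Step 4b for the first interval can be sketched as follows. The function names are illustrative, V uses the 1:1 form (total events divided by 4), and the inputs are the unrounded step 1 and step 2 estimates, so the second decimal place may differ slightly from a hand calculation with rounded values.

```python
import math


def expected_events(d_research, d_control, n_research, n_control):
    """Expected research-arm events: overall event fraction x research numbers at risk."""
    return (d_research + d_control) * n_research / (n_research + n_control)


def variance_equal_allocation(d_research, d_control):
    """V for a 1:1 randomisation ratio: total events / 4."""
    return (d_research + d_control) / 4


# Unrounded step 1 (numbers at risk) and step 2 (events) for 0-12 months.
n_r = (356 + 297) * 1.00 / (1.00 + 0.84)
n_c = (360 + 287) * 1.00 / (1.00 + 0.80)
d_r = (356 + 297) * (1.00 - 0.84) / (1.00 + 0.84)
d_c = (360 + 287) * (1.00 - 0.80) / (1.00 + 0.80)

o_minus_e = d_r - expected_events(d_r, d_c, n_r, n_c)
v = variance_equal_allocation(d_r, d_c)
hr = math.exp(o_minus_e / v)
print(round(o_minus_e, 2), round(v, 2), round(hr, 2))
```

The direct HR of about 0.80 agrees with the relative-risk estimate from step 4a for this interval.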

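STEP 5: O-E, V and the HR for the whole survival curve.

Accounting for all intervals, the HR for the whole curve is calculated as

$$HR = \exp\left(\frac{\sum (O-E)}{\sum V}\right)$$

where the sums are taken over all intervals, i.e.
exp(-21.7/92.1)=0.79

with V = 92.1, O-E = -21.7 and a 95% CI for the HR of 0.64 to 0.97.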
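The final step can be sketched in two parts: a helper that turns the summed O-E and V into an HR with a 95% CI, and a loop that sums the interval estimates over Table 1. This is an approximation of the workflow, not the spreadsheet itself. It assumes the standard error of ln(HR) is 1/sqrt(V) and uses the 1:1 form of V in every interval; all names are illustrative.

```python
import math

S_FLOT = [1.00, 0.84, 0.69, 0.57, 0.50, 0.45, 0.43]
N_FLOT = [356, 297, 231, 140, 87, 39, 5]
S_ECF = [1.00, 0.80, 0.58, 0.49, 0.44, 0.36, 0.32]
N_ECF = [360, 287, 202, 126, 83, 33, 9]


def at_risk_and_events(n1, n2, s1, s2):
    """Steps 1 and 2 for one arm and one interval."""
    return (n1 + n2) * s1 / (s1 + s2), (n1 + n2) * (s1 - s2) / (s1 + s2)


def pool(surv_r, risk_r, surv_c, risk_c):
    """Sum O-E and V over all intervals (V = total events / 4 assumed)."""
    total_o_minus_e, total_v = 0.0, 0.0
    for i in range(len(surv_r) - 1):
        n_r, d_r = at_risk_and_events(risk_r[i], risk_r[i + 1], surv_r[i], surv_r[i + 1])
        n_c, d_c = at_risk_and_events(risk_c[i], risk_c[i + 1], surv_c[i], surv_c[i + 1])
        expected_r = (d_r + d_c) * n_r / (n_r + n_c)
        total_o_minus_e += d_r - expected_r
        total_v += (d_r + d_c) / 4
    return total_o_minus_e, total_v


def hr_with_ci(o_minus_e, v, z=1.96):
    """HR = exp((O-E)/V) with CI from SE(ln HR) = 1/sqrt(V) (assumed)."""
    log_hr = o_minus_e / v
    half_width = z / math.sqrt(v)
    return math.exp(log_hr), math.exp(log_hr - half_width), math.exp(log_hr + half_width)


# Using the totals quoted in the post:
print([round(x, 2) for x in hr_with_ci(-21.7, 92.1)])  # [0.79, 0.64, 0.97]

# Using totals recomputed from Table 1 with this sketch:
o_e, v = pool(S_FLOT, N_FLOT, S_ECF, N_ECF)
print(round(o_e, 1), round(v, 1), [round(x, 2) for x in hr_with_ci(o_e, v)])
```

Run on Table 1, the sketch gives totals of roughly -21.4 for O-E and 92.3 for V. These are close to, but not identical with, the figures quoted above, because the spreadsheet may differ in rounding or in how it estimates O-E and V within each interval.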

Here’s a tip…
You can calculate a hazard ratio from a survival curve reported with numbers at risk

Where did the equations come from?

(You can skip this if you are only interested in carrying out the calculations)

I’ll use shorter names of variables so the equations fit on a single line and only consider the equations for a single arm of a trial to simplify the notation. The same equations will apply to both treatment arms.

The standard K-M limit formula is

$$S_2 = S_1\left(1 - \frac{d_2}{n_2}\right)$$

where

$S_1$ is the survival (event-free) proportion at the start of time-point $t_1$
$S_2$ is the survival (event-free) proportion at the start of the adjacent time-point $t_2$
There are no events nor patients censored between $t_1$ and $t_2$
$n_2$ is the number at risk just before time $t_2$
$d_2$ is the number of events at time $t_2$

We are not observing events or censoring directly, so the equation becomes

$$S_2 = S_1\left(1 - \frac{d^*_2}{n^*_2}\right) \tag{8}$$

where the stars indicate that the quantities were not observed at time-points corresponding to changes in the risk set.

$d^*_2$ is the number of events since time $t_1$
$c^*_2$ is the number of censorings since time $t_1$
$n^*_2$ is now the number of patients at risk since time $t_1$, adjusted for censoring.

The method assumes that numbers at risk are known at a regularly spaced set of timepoints (an actuarial, life-table approach) and there is a constant rate of censoring within each time period. As those censored are only at risk for part of the interval it is assumed that censoring has the equivalent effect of half of them being at risk for the whole period:

$$n^*_2 = n_1 - \frac{c^*_2}{2} \tag{9}$$

where $n_1$ is the reported number at risk at time $t_1$.

Therefore, ‘fractional’ numbers at risk across the intervals are estimated rather than “effective” numbers as described previously.

The number at risk at the next timepoint, $n_2$, is obtained by subtracting all the censored patients plus the estimated number of events since time $t_1$ from the number at risk at $t_1$; that is:

$$n_2 = n_1 - c^*_2 - d^*_2 \tag{10}$$

Rearrange equation 8 to calculate

$$d^*_2 = n^*_2 \, \frac{S_1 - S_2}{S_1} \tag{11}$$

Rearranging equation 10 for $d^*_2$, taking $n^*_2$ from equation 9, substituting both into equation 11 and then rearranging produces

$$c^*_2 = \frac{2\,(n_1 S_2 - n_2 S_1)}{S_1 + S_2}$$

which is equation 3.

Substituting equation 3 into equation 10 and rearranging produces

$$d^*_2 = (n_1 + n_2)\,\frac{S_1 - S_2}{S_1 + S_2}$$

which is equation 2.

Substituting equation 2 into equation 11 and rearranging produces

$$n^*_2 = (n_1 + n_2)\,\frac{S_1}{S_1 + S_2}$$

which is equation 1.
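The algebra can be checked numerically. The sketch below computes the censored, event and at-risk estimates from the closed forms for the first FLOT interval, then confirms that they satisfy the three relations the derivation starts from: the K-M step with starred quantities, half of the censored patients counting as at risk, and the next number at risk being what remains after events and censoring. The function name is illustrative.

```python
import math


def closed_form(n1, n2, s1, s2):
    """Closed-form estimates for one interval (equations 3, 2 and 1)."""
    censored = 2 * (n1 * s2 - n2 * s1) / (s1 + s2)
    events = (n1 + n2) * (s1 - s2) / (s1 + s2)
    at_risk = (n1 + n2) * s1 / (s1 + s2)
    return censored, events, at_risk


n1, n2, s1, s2 = 356, 297, 1.00, 0.84
c, d, n_star = closed_form(n1, n2, s1, s2)

# K-M step using the estimated events and adjusted numbers at risk
assert math.isclose(s2, s1 * (1 - d / n_star))
# Half of those censored are treated as at risk for the whole interval
assert math.isclose(n_star, n1 - c / 2)
# Number at risk at the next time point = start - censored - events
assert math.isclose(n2, n1 - c - d)
print("closed forms satisfy all three starting relations")
```

The same checks pass for any interval with positive survival values, because the closed forms are exact rearrangements rather than approximations.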

To derive equations 6 and 7:

To simplify the notation, I’ll consider the whole follow-up period.

E1 is the expected number of events in the research group
E2 is the expected number of events in the comparator group
D1 is the number of events in the research group
D2 is the number of events in the comparator group
N1 is the number in the research group
N2 is the number in the comparator group

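Tierney et al refer to a direct method of calculating the variance as

$$V = \frac{1}{\dfrac{1}{E_1} + \dfrac{1}{E_2}} \tag{12}$$

As in equation 5,

$$E_1 = (D_1 + D_2)\,\frac{N_1}{N_1 + N_2} \tag{13}$$

$$E_2 = (D_1 + D_2)\,\frac{N_2}{N_1 + N_2} \tag{14}$$

Substituting equations 13 and 14 into equation 12 gives

$$V = \frac{(D_1 + D_2)\, N_1 N_2}{(N_1 + N_2)^2}$$

This is equation 7.

If the randomisation ratio is 1:1 then N1 = N2 = N and equation 7 becomes

$$V = \frac{D_1 + D_2}{4}$$

This is equation 6.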
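A quick numerical check of this last derivation, as a sketch only: it assumes the direct variance takes the form V = 1/(1/E1 + 1/E2), with E1 and E2 defined as in equation 5. The function names are illustrative.

```python
import math


def expected(d1, d2, n1, n2):
    """Expected events in each arm: total events split in proportion to numbers at risk."""
    total = d1 + d2
    return total * n1 / (n1 + n2), total * n2 / (n1 + n2)


def variance_from_expected(e1, e2):
    """Assumed direct form: V = 1 / (1/E1 + 1/E2)."""
    return 1 / (1 / e1 + 1 / e2)


def variance_general(d1, d2, n1, n2):
    """V for any allocation ratio."""
    return (d1 + d2) * n1 * n2 / (n1 + n2) ** 2


def variance_equal(d1, d2):
    """V for a 1:1 allocation ratio."""
    return (d1 + d2) / 4


# First interval of the FLOT4 data (rounded step 1 and step 2 estimates).
d1, d2, n1, n2 = 56.78, 71.89, 354.89, 359.44
e1, e2 = expected(d1, d2, n1, n2)

assert math.isclose(variance_from_expected(e1, e2), variance_general(d1, d2, n1, n2))
# With equal arms the general form collapses to total events / 4.
assert math.isclose(variance_general(d1, d2, 100, 100), variance_equal(d1, d2))
print(round(variance_general(d1, d2, n1, n2), 2), round(variance_equal(d1, d2), 2))
```

For this interval the two arms are so similar in size that both forms give about 32.17.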

Dr Kathy Taylor teaches data extraction in Meta-analysis. This is a short course that is also available as part of our MSc in Evidence-Based Health Care, MSc in EBHC Medical Statistics, and MSc in EBHC Systematic Reviews.

Follow me on Twitter @dataextips for updates on my blog, related news, and to find out about other examples of statistics being made more broadly accessible.
