Tip for data extraction for meta-analysis – 8
A worked example using a trend estimation method to summarise categorical risk data
In my last post, I introduced a 5-step guide to summarising categorical risk data for a single study, using the trend estimation method of Greenland and Longnecker, the STATA glst command of Orsini et al, or the dosresmeta R command of Crippa and Orsini. Here I’ll show a worked example based on a study by Grundvold et al (2012) who report hazard ratios (HRs) for the association between body mass index (BMI) and atrial fibrillation for three categories of BMI: BMI <25 kg/m2 (which they classed as normal weight); BMI 25 to 27.9 (overweight); and BMI ≥28 (obese).
I’ll first go through the 5 steps using STATA, and then provide the corresponding R code. I’ll end this post by presenting a typical scenario to illustrate the usefulness of the trend estimation method.
1. Applying the trend estimation method in STATA
The glst package needs to be installed in STATA, and installing packages is usually done first. Use the following command to find the link to obtain the glst package and follow the installation instructions.
findit glst
Installation only needs to happen once, so delete or comment out this line (start the line with *) once you have installed glst.
STEP 1: Establish the type of data.
Here are the data in Excel:
These categorical data show an increasing risk of atrial fibrillation associated with increasing BMI. For each category, the study reports the number of patients experiencing incident atrial fibrillation (cases) and the total number of subjects (n). These are cumulative incidence data.
To import the data stored as an Excel file into STATA:
import excel "L:\Blog data extraction\Other files\trendest.xlsx", ///
sheet("Sheet1") cellrange(A1:J4) firstrow clear
To import the data stored as a comma-separated values (csv) file into STATA:
import delimited "L:\Blog data extraction\Other files\trendest.csv", clear
STEP 2: Set the average exposure for each category
In this example, the exposure is BMI. I’ll estimate the average BMI as the midpoint of each category. I first need to impute sensible values for the unbounded (unreported) limits of the outer categories.
I estimate the ranges of the BMIs of the outer categories to be twice the range of BMI of the inner category. You could run sensitivity analyses to explore the use of other multiples.
To carry out the above in STATA, first calculate the range of the inner category, and store this value in a new variable range2 by using the saved result r(mean) of the summarize command (I’m using the abbreviation, summ):
gen range=.
replace range=(max-min) if category==2
summ range if category==2
gen range2=r(mean)
The range of the inner category has a value of 2.9 kg/m2.
Then create a variable for the multiplier, mult, and use it and the range of the inner category, range2, to impute values for the unreported limits of the outer categories:
gen mult=2
replace min=max - mult*range2 if category==1
replace max=min + mult*range2 if category==3
The average exposures of the categories can then be calculated as the midpoint BMIs:
gen average=(min+max)/2
STEP 3: For each category calculate the change in exposure from that of the reference group
The reference category has a HR of 1. The change in exposure for each category is calculated as the difference between its average exposure and that of the reference category. Again, use the saved result r(mean) of summarize to create the variable, average0, for the average value of the reference category:
summ average if category==1
gen average0=r(mean)
gen change=average - average0
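Steps 2 and 3 are simple enough to check by hand. Here is a minimal sketch of that arithmetic in Python, separate from the STATA code. The category limits are read off the BMI categories quoted above. Taking 25 as the upper limit of the first category is an assumption (the spreadsheet may record a different value), and the function name midpoints_and_changes is hypothetical.

```python
def midpoints_and_changes(limits, mult=2.0):
    """Impute open-ended outer limits, then return (midpoints, changes).

    limits: list of (min, max) per category, lowest category first, with
            None for the unreported outer limits.
    mult:   the outer categories are assumed to be `mult` times as wide
            as the second (inner) category.
    """
    inner_min, inner_max = limits[1]
    inner_range = inner_max - inner_min

    filled = [list(pair) for pair in limits]
    # Lowest category: impute the missing lower limit
    filled[0][0] = filled[0][1] - mult * inner_range
    # Highest category: impute the missing upper limit
    filled[-1][1] = filled[-1][0] + mult * inner_range

    midpoints = [(lo + hi) / 2 for lo, hi in filled]
    # Change in exposure relative to the reference (first) category
    changes = [m - midpoints[0] for m in midpoints]
    return midpoints, changes


# BMI categories from Grundvold et al (2012); the value 25.0 for the upper
# limit of the first category is an assumption.
bmi_limits = [(None, 25.0), (25.0, 27.9), (28.0, None)]

for mult in (1, 2, 3):  # a simple sensitivity analysis on the multiplier
    mids, changes = midpoints_and_changes(bmi_limits, mult)
    print(mult, [round(m, 2) for m in mids], [round(c, 2) for c in changes])
```

Looping over the multiplier shows how much the imputed midpoints, and therefore the changes in exposure, depend on the choice of multiple.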
STEP 4: Apply the trend estimation method
Log-transform the hazard ratio and its confidence limits (stored here as loghr, loglb and logub) and calculate the standard error for each category from the width of the log-transformed confidence interval:
gen double se=(logub-loglb)/(2*invnormal(0.975))
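The standard error formula can be checked outside STATA. The Python sketch below does the same calculation as the line above. The function name se_from_ci is hypothetical, and the input is only illustrative: it is the HR of 1.30 (1.05 to 1.60) quoted in this post, not one of the category-level HRs from the spreadsheet.

```python
from math import log
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Standard error of a log hazard ratio from its confidence limits."""
    # Normal quantile for a two-sided interval, about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return (log(upper) - log(lower)) / (2 * z)


# Illustrative input only: the confidence limits 1.05 and 1.60
print(round(se_from_ci(1.05, 1.60), 4))
```

This relies on the confidence interval being symmetric on the log scale, which is the usual case for reported hazard ratios.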
We’ve established that we have cumulative incidence data, so I use the ci option in glst:
glst loghr change, se(se) cov(n cases) ci
Running this command produces the following output:
STEP 5: Calculate the linear trend
I want to exponentiate (back-transform from the log scale) the STATA output to calculate the HR on the continuous scale and rescale it to an increase of 5 kg/m2 of BMI. I showed how to rescale hazard ratios in an earlier blog post, but here I’ll use the lincom command to carry out both the exponentiation and the rescaling:
lincom change*5, hr
So the categorical HRs for Grundvold 2012 are converted into a HR on a continuous scale of 1.30 (1.05 to 1.60), which indicates a 30% increased risk of atrial fibrillation associated with a 5 kg/m2 increase in BMI.
2. Applying the trend estimation method using R (code and output)
# Install the dosresmeta package
# This only needs to be done once and then commented out
install.packages("dosresmeta")
# Load library
# This needs to be done every time you run this program
library(dosresmeta)
# STEP 1: Establish the type of data.
# Load and look at the data
mydata<-read.csv("L:/Blog data extraction/Other files/trendest.csv", header=TRUE, sep=",")
mydata
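# STEP 2: Set the average exposure for each category
# Calculate and save the range of the 2nd (inner) category
range2<-mydata$max[2]-mydata$min[2]
# Set the multiplier
mult<-2
# Impute values for the unbounded limits
mydata$min[1]<-mydata$max[1]-mult*range2
mydata$max[3]<-mydata$min[3]+mult*range2
# Calculate the midpoints of the categories
mydata$average<-(mydata$min+mydata$max)/2
# STEP 3: For each category calculate the change in exposure from that of the reference group
# Extract the average of the reference category (1st category)
average0<-mydata$average[1]
# Calculate the change from the reference category
mydata$change<-mydata$average-average0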
# STEP 4: Apply the trend estimation method of dosresmeta
mod.ci<-dosresmeta(formula=logrr~change, type="ci", cases=cases, n=n, lb=lb, ub=ub, data=mydata)
# STEP 5: Calculate the linear trend
predict(mod.ci, delta=5, exp=TRUE)
You will notice that the output in R is slightly different to that of STATA, but both give the same results to 2 decimal places: HR 1.30 (1.05 to 1.60) for a 5 kg/m2 increase in BMI. Slight differences between the output of different computing packages are not unusual. I found that glst and dosresmeta produced identical results using the incidence rate and case-control sample datasets that are provided with these packages and slight differences with the cumulative incidence sample dataset.
3. Using the trend estimation method to pool data reported in different forms
The trend estimation method is very useful because it can be applied to derive continuous estimates from different sets of categorical risk data. This enables the pooling of diverse categorical data with data expressed on the continuous scale. Let me present a scenario to illustrate.
I’ve already looked at Grundvold et al (2012), who provide categorical risk data for three categories of BMI. Two other studies which report the association of BMI with incident atrial fibrillation are one by Grundvold et al (2015), who report categorical risk data for quintiles of BMI, and another by Berkovitch et al (2015), who give a HR on a continuous scale. Berkovitch et al reported that each unit increment of BMI was associated with a 4.3% increased risk of developing atrial fibrillation (HR 1.04, 95% CI 1.02 to 1.07).
Applying the trend estimation method to the data from Grundvold et al (2015) estimates a HR of 1.48 (1.32 to 1.68), i.e. a 48% increased risk of atrial fibrillation associated with a 5 kg/m2 increase in BMI. Scaling up the data from Berkovitch et al (2015) from a 1 kg/m2 to a 5 kg/m2 increase in BMI, by raising the HR and its confidence limits to the power of 5, gives a HR of 1.23 (1.10 to 1.40), i.e. a 23% increased risk of atrial fibrillation. Data from all three studies have now been converted into a common form (Figure) and are ready to pool in meta-analysis.
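The rescaling of the Berkovitch et al (2015) estimate can be reproduced with a few lines of Python. This is a minimal sketch with a hypothetical function name, rescale_hr. It starts from a HR of 1.043 (the reported 4.3% increase) rather than the rounded 1.04, which is my reading of how the quoted 1.23 was obtained.

```python
def rescale_hr(hr, lower, upper, factor):
    """Rescale a HR per 1 unit of exposure to a HR per `factor` units.

    On the log scale the estimate and its limits are multiplied by `factor`,
    which is the same as raising each of them to the power `factor`.
    """
    return tuple(value ** factor for value in (hr, lower, upper))


# Berkovitch et al (2015): a 4.3% increase per 1 kg/m2 (HR 1.043),
# 95% CI 1.02 to 1.07, rescaled to a 5 kg/m2 increase in BMI
hr5, lower5, upper5 = rescale_hr(1.043, 1.02, 1.07, 5)
print(f"{hr5:.2f} ({lower5:.2f} to {upper5:.2f})")  # prints 1.23 (1.10 to 1.40)
```

Starting from the rounded HR of 1.04 instead would give 1.22 for the point estimate, so it is worth using the most precise figure a paper reports.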
Figure. Converting varied risk data into a common form for meta-analysis
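To show what "ready to pool" means in practice, here is a minimal Python sketch of inverse-variance pooling of the three HRs quoted in this post. The fixed-effect model is an assumption made purely for illustration: this post does not say which meta-analysis model would be used, and a random-effects model may be more appropriate when the studies differ this much. The function name pool_fixed_effect is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% CI


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling of HRs given as (hr, lower, upper)."""
    weights, weighted_logs = [], []
    for hr, lower, upper in studies:
        se = (log(upper) - log(lower)) / (2 * Z)  # SE of the log HR
        weight = 1 / se ** 2
        weights.append(weight)
        weighted_logs.append(weight * log(hr))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return (exp(pooled_log),
            exp(pooled_log - Z * pooled_se),
            exp(pooled_log + Z * pooled_se))


# HRs per 5 kg/m2 increase in BMI, as quoted in this post
studies = [
    (1.30, 1.05, 1.60),  # Grundvold et al (2012)
    (1.48, 1.32, 1.68),  # Grundvold et al (2015)
    (1.23, 1.10, 1.40),  # Berkovitch et al (2015)
]
pooled_hr, pooled_lower, pooled_upper = pool_fixed_effect(studies)
print(f"{pooled_hr:.2f} ({pooled_lower:.2f} to {pooled_upper:.2f})")
```

The point of the sketch is only that pooling needs every study expressed as a log HR and standard error for the same change in exposure, which is what the conversion to a common form provides.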
Here’s a tip…
Using the trend estimation method enables you to pool categorical risk data with data expressed on a continuous scale
In this post I looked at how you may deal with the problem of unbounded limits in categorical risk (dose response) data. In my next blog post I’ll show how to deal with other problems that may arise with categorical risk (dose response) data.
Dr Kathy Taylor teaches data extraction in Meta-analysis. This is a short course that is also available as part of our MSc in Evidence-Based Health Care, MSc in EBHC Medical Statistics, and MSc in EBHC Systematic Reviews.
Follow updates on this blog and related news on Twitter @dataextips