The Centre for Evidence-Based Medicine develops, promotes and disseminates better evidence for healthcare.

June 3, 2019

*Kathy Taylor*

Previously, I showed a **step-by-step guide** and **worked example** of summarising categorical risk (quantile or dose-response) data using the trend estimation method of **Greenland and Longnecker**, the STATA **glst command** and the R **dosresmeta command**. In my last post I also showed that you could deal with the problem of unbounded category limits by imputing values derived from the ranges of other categories. In this post I will look at the problem of wanting a particular reference category that may not be the one reported. I will present three different examples.

**Example 1 – switching the reference category**

You may want to change the reference category from that with the lowest exposure to the category with the highest exposure. Looking again at the data from

*Table 1. Cumulative incidence data on body mass index and risk of atrial fibrillation*

To change the reference category to that with the highest exposure, divide all the hazard ratios (HRs) by 1.74 (the HR of the category with the highest exposure), divide all the lower confidence interval limits by 1.16 (the lower confidence limit of the highest exposure category) and divide all the upper confidence interval limits by 2.56 (the upper confidence limit of the highest exposure category). Note that you then need to swap the upper and lower limits of the confidence intervals (Table 2) because the transformed lower limits become upper limits and vice versa.
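The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not a general statistical method. The function name `switch_reference` is made up for the example, and the original reference category is assumed to be recorded as an HR of 1 with both limits equal to 1. Only the two categories whose values appear in the text are used.

```python
def switch_reference(rows, new_ref):
    """Re-express (hr, lower, upper) rows relative to rows[new_ref].

    Follows the rule in the text: divide HRs by the new reference HR,
    lower limits by its lower limit, upper limits by its upper limit,
    then swap the transformed limits.
    """
    ref_hr, ref_lower, ref_upper = rows[new_ref]
    switched = []
    for hr, lower, upper in rows:
        new_hr = hr / ref_hr
        transformed_lower = lower / ref_lower
        transformed_upper = upper / ref_upper
        # Swap: the transformed upper limit becomes the new lower limit.
        switched.append((new_hr, transformed_upper, transformed_lower))
    return switched


rows = [
    (1.00, 1.00, 1.00),  # original reference (lowest exposure), assumed coding
    (1.74, 1.16, 2.56),  # highest exposure category, values from the text
]

switched = switch_reference(rows, new_ref=1)
for hr, lower, upper in switched:
    print(f"{hr:.2f} ({lower:.2f} to {upper:.2f})")
```

With these two rows, the original reference category becomes 0.57 (0.39 to 0.86) and the highest exposure category becomes the new reference with a value of 1.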

*Table 2. With highest exposure as reference category*

**Example 2 – separating data and switching the reference category if necessary**

Sometimes an inner category is the reference category, as in Table 3, which shows data from a **study** of weight change and risk of atrial fibrillation. In this case, the reference category divides the categories into weight gain and weight loss. It would not be appropriate to include weight gain and weight loss data in the same meta-analysis, so these data need to be analysed separately, with the reference category featuring in both analyses. Having separated the data, the reference category may be changed, if necessary, as shown in Example 1.
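As a rough sketch of the separation step, the Python below splits an ordered list of categories at the reference so that the reference appears in both subsets. The category labels are hypothetical placeholders, not the categories of the study in Table 3, and `split_at_reference` is a name invented for this example.

```python
# Hypothetical labels, ordered from largest weight loss to largest weight gain.
categories = ["loss_large", "loss_small", "stable", "gain_small", "gain_large"]
reference = "stable"  # hypothetical inner reference category


def split_at_reference(ordered, ref):
    """Return (categories up to and including ref, categories from ref onwards)."""
    i = ordered.index(ref)
    return ordered[: i + 1], ordered[i:]


weight_loss, weight_gain = split_at_reference(categories, reference)
print(weight_loss)
print(weight_gain)
```

Each subset can then be analysed on its own, and its reference category switched if needed.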

*Table 3. Cumulative incidence data on weight change and risk of atrial fibrillation*

**Example 3 – setting the reference category when deriving relative risks from event data**

In cases where categorical data are reported with rates, unadjusted relative risks (RRs) may be estimated, and as part of this process, you can choose the reference category. A **study** which featured in **Perez et al** presented rates of the first major vascular event in a trial of simvastatin versus placebo for various baseline categories including those of total cholesterol <5.0, ≥5.0 and <6.0, and ≥6.0 mmol/L for categories 1, 2 and 3 respectively. In the intervention group, the event rates for categories 1, 2 and 3 were 360/2030 (18%), 744/3942 (19%) and 929/4297 (22%) respectively. You can estimate RRs from these data by using a generalised linear model function (glm) in STATA and the method of **Chêne and Thompson**. The data are read into STATA as shown below.

Looking at the column TC1vs2 (the comparison between category 1 and category 2), the first row gives the number with events (event=1) in category 1. The second row gives the number with no event (event=0) in category 1. The next two rows give the numbers with events and without events for category 2. The reference category is indicated by level=1.
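The layout described above can be rebuilt from the reported counts. The Python sketch below (Python rather than STATA, purely for illustration) derives the four frequencies for each comparison. It assumes `level` is coded 1 for the reference category and 2 for the comparison category, and `weight_column` is a hypothetical helper name.

```python
# Events and totals in the intervention group, as reported in the text.
counts = {1: (360, 2030), 2: (744, 3942), 3: (929, 4297)}


def weight_column(reference, comparison):
    """Rows of (event, level, frequency) for one comparison.

    level=1 marks the reference category and level=2 the comparison
    category (assumed coding). event=1 is an event, event=0 is no event.
    """
    rows = []
    for level, category in ((1, reference), (2, comparison)):
        events, total = counts[category]
        rows.append((1, level, events))          # number with events
        rows.append((0, level, total - events))  # number without events
    return rows


tc1vs2 = weight_column(reference=1, comparison=2)
tc1vs3 = weight_column(reference=1, comparison=3)
for row in tc1vs2:
    print(row)
```

Choosing a different reference category only requires swapping the arguments, for example `weight_column(reference=3, comparison=1)`.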

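The command

`glm event ib1.level [fweight = TC1vs2], fam(bin) link(log) nolog eform`

estimates the RR of category 2 compared to category 1 (reference) as 1.06 (0.95 to 1.19). The corresponding command for category 3 compared to category 1 is

`glm event ib1.level [fweight = TC1vs3], fam(bin) link(log) nolog eform`

In the above commands, `event` is the dependent variable and `level` is the independent variable. `fweight` supplies frequency weights, so each row counts as the number of patients given in the weight column. `fam(bin)` and `link(log)` specify a binomial family with a log link, so that the exponentiated coefficients are RRs. `nolog` suppresses the iteration log and `eform` reports the exponentiated coefficients. `ib1.level` treats `level` as a categorical variable with level 1 as the base (reference) category.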
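As a cross-check outside STATA, the unadjusted RR and a 95% confidence interval can be computed directly from the counts. The Python sketch below uses the standard closed-form standard error of the log RR rather than fitting a model. For a single two-level comparison this should agree with the log-binomial fit, but that equivalence is my assumption and not something taken from the commands above. The function name `relative_risk` is illustrative.

```python
import math


def relative_risk(events_cmp, total_cmp, events_ref, total_ref, z=1.959964):
    """Unadjusted RR (comparison vs reference) with a Wald CI on the log scale."""
    rr = (events_cmp / total_cmp) / (events_ref / total_ref)
    se_log_rr = math.sqrt(
        1 / events_cmp - 1 / total_cmp + 1 / events_ref - 1 / total_ref
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Category 2 vs category 1 (reference), counts from the text.
rr, lo, hi = relative_risk(744, 3942, 360, 2030)
print(f"{rr:.2f} ({lo:.2f} to {hi:.2f})")  # 1.06 (0.95 to 1.19)

# Setting category 2 as the reference instead simply inverts the comparison.
rr_rev, lo_rev, hi_rev = relative_risk(360, 2030, 744, 3942)
```

The first call reproduces the 1.06 (0.95 to 1.19) quoted above. The second shows that the choice of reference category is just a matter of which counts are passed as the reference.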

The RRs together with the numbers of events and total patients for each category produce cumulative incidence data. Recall that I described different types of categorical data in an **earlier post**.

Here’s a tip…

When dealing with categorical risk data, it may be possible to switch or set the reference category.

My next blog post will focus on situations where categorical risk data are incomplete.

*Dr Kathy Taylor teaches data extraction in Meta-analysis. This is a short course that is also available as part of our MSc in Evidence-Based Health Care, MSc in EBHC Medical Statistics, and MSc in EBHC Systematic Reviews.*

*Follow updates on this blog and related news on Twitter @dataextips*
