The Centre for Evidence-Based Medicine develops, promotes and disseminates better evidence for healthcare.

March 6, 2020

*Kathy Taylor*

**Previously**, I highlighted a list of ways in which, when extracting data for meta-analysis of continuous outcomes, you might find that a summary statistic you want is missing. In this post I’ll focus on the 4th way – **neither the summary statistic you want nor a similar statistic is reported.** This can arise in several different ways.

**Sample sizes are not reported**

Sometimes studies report the total number of patients but not the numbers in each treatment group. The group equations that I showed **before** can’t be used because insufficient information is reported. However, these studies can be included in a meta-analysis using the generic inverse variance method **(see section 10.3 in the Cochrane Handbook)**, where data are entered in the form of the appropriate effect estimate (for example, the mean difference) and its standard error (SE). For the study with missing sample sizes, the SE will be missing but this can be imputed. Imputation involves ‘filling in’ with a sensible value, such as the average of the SEs for the same treatment arms in other studies.
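As a rough illustration of how an imputed SE feeds into this kind of analysis, here is a minimal Python sketch. The study values are invented for illustration, the pooling is a simple fixed-effect model, and averaging the reported SEs is only one possible imputation rule. The function names are hypothetical.

```python
import math

def impute_missing_ses(ses):
    """Replace missing SEs (None) with the average of the reported SEs."""
    reported = [se for se in ses if se is not None]
    average_se = sum(reported) / len(reported)
    return [average_se if se is None else se for se in ses]

def inverse_variance_pool(effects, ses):
    """Fixed-effect pooled estimate and its SE, weighting each study by 1/SE^2."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical mean differences; the second study has no usable SE
mean_differences = [2.0, 3.0, 2.5]
standard_errors = [0.5, None, 1.0]

completed_ses = impute_missing_ses(standard_errors)
pooled_md, pooled_se = inverse_variance_pool(mean_differences, completed_ses)
print(completed_ses, round(pooled_md, 2), round(pooled_se, 2))
```

Studies with larger SEs get smaller weights, so the choice of imputed SE directly controls how much the incomplete study contributes.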

**Missing mean and no other average measure**

If a study has a missing mean and the median is also not reported, the methods of Hozo, Bland and Wan that I mentioned **previously** cannot be used. A study that doesn’t report a mean may only report an effect estimate. This situation will be covered in my next blog post.

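You may find that, instead of a mean, a study has reported a percentage change from baseline. If baseline values are also reported, you can calculate the mean final value. For a percentage reduction:

$$\text{mean final value} = \text{baseline mean} - \text{baseline mean} \times \frac{\text{percentage reduction}}{100}$$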

You will need to impute the SD.

**Parving 2001** reported that the urinary albumin excretion rate (UAER) reduced by 38% (32% to 40%) in the 300mg irbesartan treatment group, by 24% (19% to 29%) in the 150mg irbesartan treatment group, and by 2% (-7% to 5%) in the placebo group. Baseline UAER values were reported as 53.4 (2.2), 58.3 (2.7) and 54.8 (2.5) µg/min respectively. We estimate the mean final urinary albumin excretion rates as:

53.4 – 53.4 x 0.38 = 53.4 x 0.62 = 33.1

58.3 – 58.3 x 0.24 = 58.3 x 0.76 = 44.3

54.8 – 54.8 x 0.02 = 54.8 x 0.98 = 53.7
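The same arithmetic can be sketched in a few lines of Python. The function name is hypothetical, and the inputs are the baseline means and percentage reductions quoted above.

```python
def final_mean_from_percent_reduction(baseline_mean, percent_reduction):
    """Estimate the mean final value from a baseline mean and a % reduction."""
    return baseline_mean * (1 - percent_reduction / 100)

# Baseline UAER (µg/min) and reported percentage reduction for each group
groups = {
    "irbesartan 300mg": (53.4, 38),
    "irbesartan 150mg": (58.3, 24),
    "placebo": (54.8, 2),
}

final_means = {
    name: round(final_mean_from_percent_reduction(baseline, reduction), 1)
    for name, (baseline, reduction) in groups.items()
}

for name, value in final_means.items():
    print(name, value)
```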

In a future post I’ll look at the case where you want to pool final values but a study reports a percentage change and does not report baseline values.

**Missing standard deviation and no other measure of variability**

**The Cochrane Handbook (6.5.2.3)** shows that within-group SDs may be calculated from summary statistics of a mean difference (MD). The MD, for which the more correct term is the **difference of means (6.5.1.1)**, is the absolute difference between the mean values of a particular variable in the two groups of a randomised clinical trial.

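*Calculating a within-group SD from a SE of a MD:*

$$SD = \frac{SE}{\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where $n_1$ and $n_2$ are the sample sizes of the two groups.

Note that this SD is the average of the SDs of the two groups, so this same SD should be entered into the meta-analysis for both groups.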
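A minimal Python sketch of this calculation follows. The SE of 1.5 and the group sizes are made-up inputs, and the function name is hypothetical.

```python
import math

def sd_from_se_of_md(se_md, n1, n2):
    """Within-group SD from the SE of a mean difference and the two group sizes."""
    return se_md / math.sqrt(1 / n1 + 1 / n2)

# Hypothetical trial: SE of the MD is 1.5, with 20 and 22 participants
sd = sd_from_se_of_md(1.5, 20, 22)
print(round(sd, 2))
```

The returned value is the single SD that would be entered for both groups.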

*Calculating a within-group SD from a CI of a MD:*

A SE of a MD can be calculated from the CI of the MD, as shown **previously**:

$$SE = \frac{\text{upper limit} - \text{lower limit}}{D}$$

For large samples (the Cochrane Handbook recommends at least 60 in each group), the denominator (D) for MDs will be 3.92 for 95% CIs, 3.29 for 90% CIs and 5.15 for 99% CIs. The denominators are twice the Z values from **standard normal tables**, which I showed before (see ‘Where did the equations come from?’). For small samples, CIs for MDs should have been calculated from t-distributions and the denominators should therefore be twice the t-values from **a t-distribution table**, which I used before.

Then, having calculated the SE of the MD, the within-group SD can be calculated from the SE, as shown above.
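For the large-sample case, the denominators can be reproduced from the standard normal distribution using only the Python standard library. This is a sketch: the CI limits in the example are invented and the function names are hypothetical. For small samples, the Z value would need to be replaced by the appropriate t-value.

```python
from statistics import NormalDist

def ci_denominator(level=0.95):
    """Width of a large-sample CI in SE units: twice the standard normal Z value."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z

def se_from_ci(lower, upper, level=0.95):
    """SE of a mean difference from the limits of its large-sample CI."""
    return (upper - lower) / ci_denominator(level)

denominators = {level: round(ci_denominator(level), 2) for level in (0.90, 0.95, 0.99)}
print(denominators)

# Hypothetical 95% CI for a MD of 1.0 to 5.0
se_md = se_from_ci(1.0, 5.0)
print(round(se_md, 3))
```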

*Calculating a within-group SD from p-value for a MD:*

A SE of a MD may be calculated from a p-value by finding the associated t-value, taken from a t-distribution table:

$$SE = \frac{MD}{t}$$

For example, consider a trial with 20 participants in the intervention group, 22 in the control group, and a p-value of 0.01. We assume that this is a 2-sided probability.

Degrees of freedom (dof) = 20 + 22 - 2 = 40

From the t-distribution table (Figure 1), the t-value is 2.704.

You can also find the t-value by typing the following into an Excel cell:

=T.INV.2T(0.01,40)

Then, having calculated the SE of the MD, the within-group SD can be calculated from the SE, as shown above.
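The t-value lookup and the steps that follow can also be sketched in Python. The sketch below avoids third-party libraries by integrating the t density numerically and searching for the t-value by bisection. If SciPy is available, `scipy.stats.t.ppf(1 - p / 2, dof)` should give the same value. The mean difference of 5.0 is a made-up number, because the example above gives only the group sizes and the p-value, and the function names are hypothetical.

```python
import math

def t_pdf(x, dof):
    """Density of Student's t-distribution with dof degrees of freedom."""
    log_c = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))
    return math.exp(log_c - (dof + 1) / 2 * math.log1p(x * x / dof))

def two_sided_p(t, dof, steps=2000):
    """Two-sided p-value for t, using Simpson's rule on the density from 0 to t."""
    h = t / steps
    total = t_pdf(0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area = total * h / 3
    return 1 - 2 * area

def t_value_from_p(p, dof):
    """Find the t-value whose two-sided p-value equals p, by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, dof) > p:
            lo = mid  # p still too large, so t must be bigger
        else:
            hi = mid
    return (lo + hi) / 2

n1, n2, p_value = 20, 22, 0.01
dof = n1 + n2 - 2
t_value = t_value_from_p(p_value, dof)

md = 5.0  # hypothetical mean difference, not from the example
se_md = abs(md) / t_value
sd = se_md / math.sqrt(1 / n1 + 1 / n2)
print(dof, round(t_value, 3), round(se_md, 3), round(sd, 3))
```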

Note that if only p-value<0.05 is reported, the Cochrane Handbook suggests a conservative approach of using the upper limit, i.e. p-value=0.05. However, if the p-value is reported only as NS (not significant), we assume p-value>0.05 and cannot calculate a SE, so we have to use imputation.

*Dealing with missing SDs with imputation*

If a large number of studies have no measure of variability, pooling data is not recommended. If only a small proportion of studies have no variability measure, and these studies will contribute only a small proportion of the data, you can deal with missing SDs by imputation, using SDs either from other studies included in your review or from other meta-analyses. All the ‘lending’ SDs should be similar, so it may be more appropriate to borrow SDs only from arms that received the same treatment as the arm with the missing SD.

You could substitute the missing SD with a **weighted** average of the SDs from other studies.

The weighting makes use of the (n-1) that features in the calculation of the SD. This is **Bessel’s correction**, which corrects for bias.

Alternatively, you could impute a SD with an unweighted average of the available SDs, or take a conservative approach and substitute the missing SD with the highest available SD, as this will result in a lower weight being given to the study.
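The three simple imputation options can be compared side by side in Python. This is a sketch under assumptions: the weighted version is taken to be the usual pooled SD, which weights each variance by (n-1), and that may not be the exact form intended here. The SDs and sample sizes are invented and the function names are hypothetical.

```python
import math

def pooled_sd(sds, ns):
    """Weighted average SD (assumed form): variances weighted by (n - 1)."""
    numerator = sum((n - 1) * sd ** 2 for sd, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return math.sqrt(numerator / denominator)

def unweighted_mean_sd(sds):
    """Simple arithmetic mean of the available SDs."""
    return sum(sds) / len(sds)

def conservative_sd(sds):
    """Largest available SD, which gives the study the least weight."""
    return max(sds)

# Hypothetical SDs and sample sizes from the same treatment arm in other studies
lending_sds = [4.0, 5.0, 6.0]
lending_ns = [30, 50, 20]

weighted = pooled_sd(lending_sds, lending_ns)
unweighted = unweighted_mean_sd(lending_sds)
conservative = conservative_sd(lending_sds)
print(round(weighted, 2), unweighted, conservative)
```

Running all three and checking how much the pooled result changes is a simple form of sensitivity analysis.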

More complicated imputation approaches include **regressing the SDs** of the same treatment from other studies onto other study covariates that are understood to be related to the missing SD. For example:

The Cochrane Handbook highlights **Marinho et al** who, in their review of the preventative effect of fluoride toothpaste, dealt with missing data by predicting SDs from a linear regression of log(SD) on log(mean), citing the methods of the earlier review by **van Rijkom et al** to justify their use of a regression model.
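A regression of log(SD) on log(mean) can be sketched with ordinary least squares in plain Python. This illustrates the general idea only and is not a reproduction of the model used in either review. The means and SDs are invented and the function names are hypothetical.

```python
import math

def fit_log_log(means, sds):
    """Least-squares fit of log(SD) = intercept + slope * log(mean)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(s) for s in sds]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def predict_sd(mean, intercept, slope):
    """Predict a missing SD from the study's reported mean."""
    return math.exp(intercept + slope * math.log(mean))

# Hypothetical studies that report both a mean and a SD
reported_means = [2.0, 4.0, 8.0, 16.0]
reported_sds = [1.1, 1.8, 3.3, 5.6]

intercept, slope = fit_log_log(reported_means, reported_sds)

# Hypothetical study that reports a mean of 6.0 but no SD
predicted_sd = predict_sd(6.0, intercept, slope)
print(round(slope, 3), round(predicted_sd, 2))
```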

Here’s a tip…

You can use imputation to deal with missing sample sizes, means and SDs, using reported data or data from other studies.

In my next post, I’ll focus on another example of the **4th way** of how a summary statistic that you want may be missing when dealing with continuous outcomes:

**Where did the equations come from?** (You can skip this if you are only interested in carrying out the calculations)

*Calculating a within-group SD from a SE of a MD:*

In a **previous proof** I showed

In the proof in **my last post** I explained that the SE gives an estimate of the SD of its sampling distribution and that

where s is the sample standard deviation and we assume that the two sample standard deviations are equal. Therefore,
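The steps of this argument can be written out as follows. This is a reconstruction from the standard results for two independent groups with a common SD, so the notation ($n_1$ and $n_2$ for the group sizes, $\bar{x}_1$ and $\bar{x}_2$ for the group means) is assumed rather than taken from the earlier posts.

For independent groups, the variance of a difference is the sum of the variances:

$$\mathrm{Var}(\bar{x}_1 - \bar{x}_2) = \mathrm{Var}(\bar{x}_1) + \mathrm{Var}(\bar{x}_2)$$

The SE of a single mean is

$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$

so, with a common sample standard deviation $s$ in both groups,

$$SE_{MD} = \sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}} = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

and rearranging gives

$$s = \frac{SE_{MD}}{\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$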

*Dr Kathy Taylor teaches data extraction in Meta-analysis. This is a short course that is also available as part of our MSc in Evidence-Based Health Care,* **MSc in Medical Statistics** *and* **MSc in Systematic Reviews**.

Follow me on Twitter **@dataextips** for updates on my blog, related news, and to find out about further examples where others, like me, are trying to make statistics more broadly accessible.

A full directory of blog posts can be found at **https://www.cebm.net/2014/06/data-extraction-in-meta-analysis/**