# Tip for data extraction in meta-analysis – 2

February 21, 2019

## What if the prevalence is also not reported?

Kathy Taylor

In my last post I showed some equations that you can use to construct the 2×2 diagnostic accuracy classification table using the reported prevalence, study size, sensitivity and specificity. Sometimes the prevalence is only reported for the whole study and not for subgroups. Recall that the prevalence is the proportion of those in the group of interest (whole study population or a subgroup) who have the disease. If the size of the subgroup and the number in the subgroup who have the disease are known:

$$\text{Prevalence} = \frac{\text{Number in subgroup who have the disease}}{\text{Size of subgroup}}$$

If the size of the subgroup and the number in the subgroup who have the disease are not known, the prevalence may still be calculated if the positive predictive value (PPV) or the negative predictive value (NPV) is provided with the sensitivity and specificity. The PPV is the proportion of those who test disease-positive (1st row in the table given in my last post) who have the disease. The NPV is the proportion of those who test disease-negative (2nd row) who do not have the disease. The PPV and NPV are often reported with the sensitivity and specificity.

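A bit of maths (see below if you’re interested) shows us:

$$\text{Prevalence} = \frac{\text{PPV} \times (1 - \text{Specificity})}{\text{Sensitivity} \times (1 - \text{PPV}) + \text{PPV} \times (1 - \text{Specificity})}$$

Or

$$\text{Prevalence} = \frac{\text{Specificity} \times (1 - \text{NPV})}{\text{NPV} \times (1 - \text{Sensitivity}) + \text{Specificity} \times (1 - \text{NPV})}$$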

You have to be very careful when doing calculations that use brackets, as it’s easy to overlook a bracket or put it in the wrong place when you type these equations into a calculator, spreadsheet or computer program. Also note that statistics, such as sensitivity, can be reported as percentages or as decimal fractions. The equations here assume that all these statistics are expressed as decimal fractions.
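One way to reduce bracket mistakes is to write each formula once as a small function and reuse it. The following is a minimal Python sketch, not part of the original calculations. The function names are hypothetical, and the formulas are those obtained by rearranging the definitions of the PPV and NPV (see the derivation at the end of this post). Inputs are assumed to be decimal fractions, and the range check is there to catch a percentage entered by mistake.

```python
def percent_to_fraction(percent):
    """Convert a percentage (e.g. 85.7) to a decimal fraction (0.857)."""
    return percent / 100.0


def _check_fraction(name, value):
    # Guard against a percentage being passed where a fraction is expected.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")


def prevalence_from_ppv(sensitivity, specificity, ppv):
    """Prevalence = PPV(1 - Spec) / [Sens(1 - PPV) + PPV(1 - Spec)]."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("ppv", ppv)):
        _check_fraction(name, value)
    numerator = ppv * (1 - specificity)
    # A zero denominator (e.g. PPV = 1 and specificity = 1) raises
    # ZeroDivisionError: the prevalence cannot be recovered in that case.
    return numerator / (sensitivity * (1 - ppv) + numerator)


def prevalence_from_npv(sensitivity, specificity, npv):
    """Prevalence = Spec(1 - NPV) / [NPV(1 - Sens) + Spec(1 - NPV)]."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("npv", npv)):
        _check_fraction(name, value)
    numerator = specificity * (1 - npv)
    return numerator / (npv * (1 - sensitivity) + numerator)
```

Splitting the numerator into its own variable means each bracketed term is typed only once, which makes a misplaced bracket easier to spot.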

Let me show you an example from a review that I’m currently working on. In a study of the detection of asymptomatic left ventricular dysfunction, using N-terminal pro-brain natriuretic peptide, measured with the Elecsys Modular E device, at a threshold of 125 pg/mL and compared to echocardiography, for males aged under 67 years, the sensitivity is reported as 85.7%, specificity as 92.9%, NPV as 99.5% and the PPV as 30.0%. Prevalence is not reported for this subgroup but it is reported that, in this subgroup, 7 patients have diastolic dysfunction and 196 have normal left ventricular function. Therefore:

$$\text{Prevalence} = \frac{7}{7 + 196} = \frac{7}{203} \approx 0.034$$

We may also use the other equations to calculate the prevalence, perhaps to check that all the data make sense. This study reports percentages, so we first need to convert them to decimal fractions by dividing by 100 (e.g. a sensitivity of 85.7% becomes 0.857).

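Using the PPV

$$\text{Prevalence} = \frac{0.300 \times (1 - 0.929)}{0.857 \times (1 - 0.300) + 0.300 \times (1 - 0.929)} = \frac{0.0213}{0.6212} \approx 0.034$$

Using the NPV

$$\text{Prevalence} = \frac{0.929 \times (1 - 0.995)}{0.995 \times (1 - 0.857) + 0.929 \times (1 - 0.995)} = \frac{0.004645}{0.146930} \approx 0.032$$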

The slight differences arise from rounding in the reported statistics.
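The three calculations for this subgroup can be reproduced with a short Python sketch. The variable names are illustrative only, and the subgroup size is assumed to be the sum of the two reported counts (7 + 196).

```python
# Reported statistics for males aged under 67 years, converted from
# percentages to decimal fractions.
sensitivity = 85.7 / 100
specificity = 92.9 / 100
ppv = 30.0 / 100
npv = 99.5 / 100

# Direct calculation from the reported counts (assumes subgroup size = 7 + 196).
diseased, not_diseased = 7, 196
prev_direct = diseased / (diseased + not_diseased)

# Prevalence from the PPV, sensitivity and specificity.
prev_from_ppv = (ppv * (1 - specificity)) / (
    sensitivity * (1 - ppv) + ppv * (1 - specificity)
)

# Prevalence from the NPV, sensitivity and specificity.
prev_from_npv = (specificity * (1 - npv)) / (
    npv * (1 - sensitivity) + specificity * (1 - npv)
)

print(f"direct:   {prev_direct:.3f}")    # direct:   0.034
print(f"from PPV: {prev_from_ppv:.3f}")  # from PPV: 0.034
print(f"from NPV: {prev_from_npv:.3f}")  # from NPV: 0.032
```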

Here’s a tip…

The prevalence may be calculated in several different ways using other reported statistics.

In the next post I’ll explain what you might do if a sensitivity or specificity is not reported.

### Where did the equations come from?

(You can skip this if you are only interested in carrying out the calculations)

Previously, I showed the following equations:

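$$
\begin{aligned}
\text{TP} &= \text{Sensitivity} \times \text{Prevalence} \times \text{Total} \\
\text{FN} &= \text{Prevalence} \times \text{Total} \times (1 - \text{Sensitivity}) \\
\text{TN} &= \text{Specificity} \times \text{Total} \times (1 - \text{Prevalence}) \\
\text{FP} &= \text{Total} \times (1 - \text{Prevalence}) \times (1 - \text{Specificity})
\end{aligned}
$$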

I stated above that the PPV is the proportion of those who test disease-positive (1st row in the table given previously) who have the disease, i.e.

$$\text{PPV} = \frac{\text{TP}}{\text{TP} + \text{FP}} \qquad \text{(equation 1)}$$

and the NPV is the proportion of those who test disease-negative (2nd row) who do not have the disease, i.e.

$$\text{NPV} = \frac{\text{TN}}{\text{TN} + \text{FN}} \qquad \text{(equation 2)}$$

To calculate prevalence using the PPV

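Substitute for TP and FP in equation 1

$$\text{PPV} = \frac{\text{Sensitivity} \times \text{Prevalence} \times \text{Total}}{\text{Sensitivity} \times \text{Prevalence} \times \text{Total} + \text{Total} \times (1 - \text{Prevalence}) \times (1 - \text{Specificity})}$$

Cancel out Total and rearrange

$$\text{Prevalence} = \frac{\text{PPV} \times (1 - \text{Specificity})}{\text{Sensitivity} \times (1 - \text{PPV}) + \text{PPV} \times (1 - \text{Specificity})}$$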

To calculate prevalence using the NPV

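Substitute for TN and FN in equation 2

$$\text{NPV} = \frac{\text{Specificity} \times \text{Total} \times (1 - \text{Prevalence})}{\text{Specificity} \times \text{Total} \times (1 - \text{Prevalence}) + \text{Prevalence} \times \text{Total} \times (1 - \text{Sensitivity})}$$

Cancel out Total and rearrange

$$\text{Prevalence} = \frac{\text{Specificity} \times (1 - \text{NPV})}{\text{NPV} \times (1 - \text{Sensitivity}) + \text{Specificity} \times (1 - \text{NPV})}$$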
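If you would like to convince yourself that the rearrangements are right without redoing the algebra, a numerical round trip is one option. The sketch below is illustrative only: the function name `two_by_two` and the input values are made up, and the cells are deliberately left unrounded so that the round trip is exact. It builds the table from a known prevalence, computes the PPV and NPV from the cells, and then recovers the prevalence from each.

```python
import math


def two_by_two(prevalence, sensitivity, specificity, total):
    """Build the 2x2 table cells from the equations shown previously."""
    tp = sensitivity * prevalence * total
    fn = prevalence * total * (1 - sensitivity)
    tn = specificity * total * (1 - prevalence)
    fp = total * (1 - prevalence) * (1 - specificity)
    return tp, fn, tn, fp


# Made-up inputs, chosen only to illustrate the check.
prevalence, sensitivity, specificity, total = 0.2, 0.8, 0.9, 1000

tp, fn, tn, fp = two_by_two(prevalence, sensitivity, specificity, total)
ppv = tp / (tp + fp)  # equation 1
npv = tn / (tn + fn)  # equation 2

# Rearranged equations: recover the prevalence from the PPV or the NPV.
recovered_from_ppv = (ppv * (1 - specificity)) / (
    sensitivity * (1 - ppv) + ppv * (1 - specificity)
)
recovered_from_npv = (specificity * (1 - npv)) / (
    npv * (1 - sensitivity) + specificity * (1 - npv)
)

# Both should match the prevalence we started with.
assert math.isclose(recovered_from_ppv, prevalence)
assert math.isclose(recovered_from_npv, prevalence)
```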

Dr Kathy Taylor teaches data extraction in Meta-analysis. This is a short course that is also available as part of our MSc in Evidence-Based Health Care, MSc in EBHC Medical Statistics, and MSc in EBHC Systematic Reviews.