Tip for data extraction in meta-analysis – 2

February 21, 2019

What if the prevalence is also not reported?

Kathy Taylor

In my last post I showed some equations that you can use to construct the 2×2 diagnostic accuracy classification table using the reported prevalence, study size, sensitivity and specificity. Sometimes the prevalence is only reported for the whole study and not for subgroups. Recall that the prevalence is the proportion of those in the group of interest (whole study population or a subgroup) who have the disease. If the size of the subgroup and the number in the subgroup who have the disease are known:

$$\text{Prevalence} = \frac{\text{Number with the disease in the subgroup}}{\text{Total in the subgroup}}$$

If the size of the subgroup and the number in the subgroup who have the disease are not known, the prevalence may still be calculated if the positive predictive value (PPV) or the negative predictive value (NPV) is provided with the sensitivity and specificity. The PPV is the proportion of those who test disease-positive (1st row in the table given in my last post) who have the disease. The NPV is the proportion of those who test disease-negative (2nd row) who do not have the disease. The PPV and NPV are often reported with the sensitivity and specificity.
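To make the row-based definitions concrete, here is a minimal Python sketch. The function name and the counts are made up purely for illustration; they do not come from any study.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) from the four cells of a 2x2 table.

    Row 1 of the table holds those who test positive (TP and FP);
    row 2 holds those who test negative (FN and TN).
    """
    ppv = tp / (tp + fp)  # test-positives who have the disease
    npv = tn / (tn + fn)  # test-negatives who do not have the disease
    return ppv, npv


# Hypothetical counts, chosen only to illustrate the definitions.
ppv, npv = predictive_values(tp=40, fp=10, fn=5, tn=145)
print(round(ppv, 3), round(npv, 3))  # 0.8 0.967
```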

A bit of maths (see below if you’re interested) shows us:

$$\text{Prevalence} = \frac{\text{PPV} \times (1 - \text{Specificity})}{\text{Sensitivity} \times (1 - \text{PPV}) + \text{PPV} \times (1 - \text{Specificity})}$$

Or

$$\text{Prevalence} = \frac{\text{Specificity} \times (1 - \text{NPV})}{\text{NPV} \times (1 - \text{Sensitivity}) + \text{Specificity} \times (1 - \text{NPV})}$$

You have to be very careful when doing calculations that use brackets, as it’s easy to overlook a bracket or put it in the wrong place when you type these equations into a calculator, spreadsheet or computer program. Also note that statistics such as sensitivity can be reported as percentages or as decimal fractions. The equations above assume that all of these statistics are expressed as decimal fractions.
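If you do the calculation in a program, one way to reduce both risks is to build the numerator and denominator on separate lines and to reject inputs that are not decimal fractions. The Python below is a minimal sketch of that idea. The function names and the range check are illustrative assumptions, and the input values are invented for the example.

```python
def _check_fraction(name, value):
    # Guard against percentages (e.g. 85.7) being passed instead of 0.857.
    if not 0 <= value <= 1:
        raise ValueError(f"{name} must be a decimal fraction between 0 and 1")


def prevalence_from_ppv(sensitivity, specificity, ppv):
    for name, value in [("sensitivity", sensitivity),
                        ("specificity", specificity), ("ppv", ppv)]:
        _check_fraction(name, value)
    numerator = ppv * (1 - specificity)
    denominator = sensitivity * (1 - ppv) + ppv * (1 - specificity)
    return numerator / denominator


def prevalence_from_npv(sensitivity, specificity, npv):
    for name, value in [("sensitivity", sensitivity),
                        ("specificity", specificity), ("npv", npv)]:
        _check_fraction(name, value)
    numerator = specificity * (1 - npv)
    denominator = npv * (1 - sensitivity) + specificity * (1 - npv)
    return numerator / denominator


# Invented inputs: sensitivity 80%, specificity 90%, PPV 50%.
print(round(prevalence_from_ppv(0.80, 0.90, 0.50), 3))  # 0.111
```

Passing a percentage such as 85.7 raises an error instead of quietly producing a meaningless prevalence.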

Let me show you an example from a review that I’m currently working on. The study looks at the detection of asymptomatic left ventricular dysfunction using N-terminal pro-brain natriuretic peptide, measured with the Elecsys Modular E device at a threshold of 125 pg/mL and compared to echocardiography. For males aged under 67 years, the sensitivity is reported as 85.7%, the specificity as 92.9%, the NPV as 99.5% and the PPV as 30.0%. Prevalence is not reported for this subgroup, but it is reported that 7 patients in the subgroup have diastolic dysfunction and 196 have normal left ventricular function. Therefore:

$$\text{Total in the subgroup} = 7 + 196 = 203$$

$$\text{Prevalence} = \frac{7}{203} = 0.034 \text{ or } 3.4\%$$

We may also use the other equations to calculate the prevalence, perhaps to check that all the data make sense. Note that this study reports percentages, so we first need to convert them to decimal fractions by dividing by 100 (e.g. a sensitivity of 85.7% becomes 0.857).

Using the PPV

$$\text{Prevalence} = \frac{0.3 \times (1 - 0.929)}{0.857 \times (1 - 0.3) + 0.3 \times (1 - 0.929)} = \frac{0.0213}{0.6212} = 0.034 \text{ or } 3.4\%$$

Using the NPV

$$\text{Prevalence} = \frac{0.929 \times (1 - 0.995)}{0.995 \times (1 - 0.857) + 0.929 \times (1 - 0.995)} = \frac{0.004645}{0.14693} = 0.032 \text{ or } 3.2\%$$

The slight differences arise because the reported statistics have been rounded.
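To see how much rounding alone can move the answer, here is a rough Python sketch. It assumes that an NPV reported as 99.5% could be anything from 99.45% to 99.55%, and it holds the sensitivity and specificity at their reported values. Both are simplifying assumptions for illustration, since those two statistics are rounded as well.

```python
def prevalence_from_npv(sensitivity, specificity, npv):
    numerator = specificity * (1 - npv)
    return numerator / (npv * (1 - sensitivity) + numerator)


sensitivity, specificity = 0.857, 0.929

# Assumed rounding interval for an NPV reported as 99.5%.
npv_low, npv_high = 0.9945, 0.9955

# A higher NPV implies a lower prevalence, so the bounds swap over.
prev_low = prevalence_from_npv(sensitivity, specificity, npv_high)
prev_high = prevalence_from_npv(sensitivity, specificity, npv_low)
direct = 7 / 203  # prevalence calculated directly from the counts

print(f"{prev_low:.1%} to {prev_high:.1%}")  # 2.9% to 3.5%
print(prev_low < direct < prev_high)         # True
```

Under these assumptions the directly calculated prevalence of 3.4% falls inside the range, which is consistent with rounding being the cause of the discrepancy.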

Here’s a tip…

The prevalence may be calculated in several different ways using other reported statistics.

In the next post I’ll explain what you might do if a sensitivity or specificity is not reported.

 

Where did the equations come from?

(You can skip this if you are only interested in carrying out the calculations)

Previously, I showed the following equations:

TP=Sensitivity x Prevalence x Total
FN=Prevalence x Total x (1-Sensitivity)
TN=Specificity x Total x (1-Prevalence)
FP=Total x (1-Prevalence) x (1-Specificity)
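As a quick check, these four equations can be sketched in Python and applied to the subgroup from the example above (203 patients, 7 with the disease, sensitivity 0.857, specificity 0.929). The function name is an illustrative assumption, and rounding the cells to whole patients is my own choice here.

```python
def two_by_two(total, prevalence, sensitivity, specificity):
    """Return the four cells of the 2x2 table as unrounded values."""
    diseased = prevalence * total
    not_diseased = (1 - prevalence) * total
    return {
        "TP": sensitivity * diseased,
        "FN": (1 - sensitivity) * diseased,
        "TN": specificity * not_diseased,
        "FP": (1 - specificity) * not_diseased,
    }


cells = two_by_two(total=203, prevalence=7 / 203,
                   sensitivity=0.857, specificity=0.929)
rounded = {name: round(value) for name, value in cells.items()}
print(rounded)  # {'TP': 6, 'FN': 1, 'TN': 182, 'FP': 14}
```

With these rounded cells, TP / (TP + FP) = 6/20 = 30.0% and TN / (TN + FN) = 182/183 = 99.5%, which agrees with the reported PPV and NPV.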

I stated above that the PPV is the proportion of those who test disease-positive (1st row in Table given previously) who have the disease i.e.

$$\text{PPV} = \frac{\text{TP}}{\text{TP} + \text{FP}} \qquad \text{(equation 1)}$$

 

and the NPV is the proportion of those who test disease-negative (2nd row) who do not have the disease.

$$\text{NPV} = \frac{\text{TN}}{\text{TN} + \text{FN}} \qquad \text{(equation 2)}$$

 

To calculate prevalence using the PPV

Substitute for TP and FP in equation 1

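$$\text{PPV} = \frac{\text{Sensitivity} \times \text{Prevalence} \times \text{Total}}{\text{Sensitivity} \times \text{Prevalence} \times \text{Total} + \text{Total} \times (1 - \text{Prevalence}) \times (1 - \text{Specificity})}$$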

Cancel out Total and rearrange

$$\text{Prevalence} = \frac{\text{PPV} \times (1 - \text{Specificity})}{\text{Sensitivity} \times (1 - \text{PPV}) + \text{PPV} \times (1 - \text{Specificity})}$$
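For anyone who wants to follow the rearrangement, the intermediate steps can be sketched as follows. The shorthand Se, Sp and P (for sensitivity, specificity and prevalence) is introduced here only to keep the lines short.

```latex
\begin{align*}
\text{PPV} &= \frac{Se \cdot P}{Se \cdot P + (1 - P)(1 - Sp)}
  && \text{(Total cancelled)} \\
\text{PPV} \cdot Se \cdot P + \text{PPV}\,(1 - Sp) - \text{PPV}\,(1 - Sp)\,P &= Se \cdot P
  && \text{(multiply out)} \\
\text{PPV}\,(1 - Sp) &= P\,\bigl[\,Se\,(1 - \text{PPV}) + \text{PPV}\,(1 - Sp)\,\bigr]
  && \text{(collect terms in } P\text{)} \\
P &= \frac{\text{PPV}\,(1 - Sp)}{Se\,(1 - \text{PPV}) + \text{PPV}\,(1 - Sp)}
\end{align*}
```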

To calculate prevalence using the NPV

Substitute for TN and FN in equation 2

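$$\text{NPV} = \frac{\text{Specificity} \times \text{Total} \times (1 - \text{Prevalence})}{\text{Specificity} \times \text{Total} \times (1 - \text{Prevalence}) + \text{Prevalence} \times \text{Total} \times (1 - \text{Sensitivity})}$$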

Cancel out Total and rearrange

$$\text{Prevalence} = \frac{\text{Specificity} \times (1 - \text{NPV})}{\text{NPV} \times (1 - \text{Sensitivity}) + \text{Specificity} \times (1 - \text{NPV})}$$
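The intermediate steps for the NPV version can be sketched as follows, starting from NPV = TN / (TN + FN) with Total cancelled. The shorthand Se, Sp and P (for sensitivity, specificity and prevalence) is introduced here only to keep the lines short.

```latex
\begin{align*}
\text{NPV} &= \frac{Sp\,(1 - P)}{Sp\,(1 - P) + P\,(1 - Se)}
  && \text{(Total cancelled)} \\
\text{NPV} \cdot Sp\,(1 - P) + \text{NPV} \cdot P\,(1 - Se) &= Sp\,(1 - P)
  && \text{(multiply out)} \\
\text{NPV} \cdot P\,(1 - Se) &= Sp\,(1 - \text{NPV}) - Sp\,(1 - \text{NPV})\,P
  && \text{(move } Sp \text{ terms to the right)} \\
P\,\bigl[\,\text{NPV}\,(1 - Se) + Sp\,(1 - \text{NPV})\,\bigr] &= Sp\,(1 - \text{NPV})
  && \text{(collect terms in } P\text{)} \\
P &= \frac{Sp\,(1 - \text{NPV})}{\text{NPV}\,(1 - Se) + Sp\,(1 - \text{NPV})}
\end{align*}
```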

Dr Kathy Taylor teaches data extraction in Meta-analysis. This is a short course that is also available as part of our MSc in Evidence-Based Health Care, MSc in EBHC Medical Statistics, and MSc in EBHC Systematic Reviews.

Follow updates on this blog and related news on Twitter @dataextips

 
