Question: Should smartphone apps be used clinically as oximeters? Answer: No.
April 1, 2020
Professor Lionel Tarassenko1
Professor Trisha Greenhalgh2
On behalf of the Oxford COVID-19 Evidence Service Team
1Department of Engineering Science, University of Oxford
2Nuffield Department of Primary Care Health Sciences
University of Oxford
Correspondence to firstname.lastname@example.org
There is no evidence that any smartphone technology is accurate for the measurement of blood oxygen saturation for clinical use. Furthermore, the scientific basis of such technologies is questionable. Oxygen saturation levels obtained from such technologies should not be trusted in the clinical assessment of patients.
This review was updated on 19th May to take account of feedback received from academic colleagues. Whilst the feedback resulted in some minor changes to the text below, it does not change the conclusions. The feedback and our detailed response to it can be viewed here.
The COVID crisis requires us to manage patients with as little in-person contact as possible. The assessment of a patient with respiratory problems usually includes measurement of blood oxygen saturation (abbreviated SpO2), using a validated pulse oximeter. This is particularly important in unwell patients with COVID-19, since hypoxia is a serious warning sign for severe pneumonia.1 Whilst in-person assessment would use a standard pulse oximeter on the patient’s finger, few patients have such a device at home. Various technology companies have developed smartphone apps that are marketed as accurate for measuring oxygen saturation.
A previous rapid review by our team on assessing shortness of breath in remote consultations2 turned up two academic papers which claimed to have validated smartphone technologies for measuring oxygen levels in the blood.3 4 Both papers (summarised in a table in the appendix) described comparison against a reference method (finger pulse oximetry or arterial blood gas). Whilst those papers claimed good correlation between the smartphone reading and the reference standard, we were concerned about the risks of relying on these two small studies.2 We sought expert input from a Professor of Electrical Engineering (LT) who specialises in medical devices. This paper summarises Professor Tarassenko’s advice.
THE SCIENTIFIC BASIS OF OXYGEN SATURATION MEASUREMENT
Oxygen saturation is the fraction of oxygenated haemoglobin relative to the total haemoglobin (oxygenated + deoxygenated) in the blood.
SpO2 = [HbO2] / ([Hb] + [HbO2])
where HbO2 is oxygenated haemoglobin and Hb is deoxygenated haemoglobin.
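As a worked illustration of the definition above, the short Python sketch below computes the saturation from two concentrations. The function name and the example concentrations are hypothetical, chosen only to make the arithmetic visible.

```python
def oxygen_saturation(hb, hbo2):
    """Fraction of oxygenated haemoglobin relative to total haemoglobin.

    hb and hbo2 are the concentrations of deoxygenated and oxygenated
    haemoglobin, expressed in the same (arbitrary) units.
    """
    total = hb + hbo2
    if total <= 0:
        raise ValueError("total haemoglobin must be positive")
    return hbo2 / total


# Hypothetical example: 3 parts deoxygenated to 97 parts oxygenated.
print(f"{oxygen_saturation(3.0, 97.0):.0%}")  # prints 97%
```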
The measurement of SpO2 with a pulse oximeter relies on the fact that the two forms of the haemoglobin molecule, Hb and HbO2, have light absorption properties which vary with wavelength in the visible and infra-red parts of the spectrum. It therefore requires the measurement of light transmission or reflection from a body segment such as a finger at two different wavelengths (usually in the red and infra-red).
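The need for two wavelengths can be seen in a deliberately simplified sketch. It assumes a plain Beer-Lambert model with unit path length and uses illustrative absorption coefficients in arbitrary units; these numbers, and the names `EPS`, `absorbance` and `saturation_from_two_wavelengths`, are assumptions made for the example and are not calibration data. Clinical pulse oximeters work on the pulsatile part of the signal and rely on empirical calibration, which this sketch does not attempt to reproduce.

```python
# Illustrative absorption coefficients in arbitrary units. These are assumed
# values for the sketch, not calibration data for any device.
EPS = {
    "red": {"Hb": 3.2, "HbO2": 0.3},
    "infrared": {"Hb": 0.7, "HbO2": 1.2},
}


def absorbance(wavelength, c_hb, c_hbo2):
    """Simplified Beer-Lambert absorbance for unit path length."""
    eps = EPS[wavelength]
    return eps["Hb"] * c_hb + eps["HbO2"] * c_hbo2


def saturation_from_two_wavelengths(a_red, a_ir):
    """Solve the 2x2 linear system for [Hb] and [HbO2] by Cramer's rule."""
    red, ir = EPS["red"], EPS["infrared"]
    det = red["Hb"] * ir["HbO2"] - red["HbO2"] * ir["Hb"]
    c_hb = (a_red * ir["HbO2"] - red["HbO2"] * a_ir) / det
    c_hbo2 = (red["Hb"] * a_ir - ir["Hb"] * a_red) / det
    return c_hbo2 / (c_hb + c_hbo2)


# Two wavelengths: the saturation used to generate the data is recovered.
a_red = absorbance("red", 0.1, 0.9)
a_ir = absorbance("infrared", 0.1, 0.9)
recovered = saturation_from_two_wavelengths(a_red, a_ir)
print(round(recovered, 3))  # 0.9

# One wavelength: a different saturation (80%) with a different total amount
# of haemoglobin in the light path gives the same red absorbance.
scale = a_red / absorbance("red", 0.2, 0.8)
same_red = absorbance("red", 0.2 * scale, 0.8 * scale)
print(abs(same_red - a_red) < 1e-12)  # True
```

With a single wavelength there is one equation and two unknowns, so the same reading is compatible with more than one saturation. A second wavelength supplies the second equation.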
CRITICAL APPRAISAL OF THE DIGIDOC APP TESTED BY TOMLINSON ET AL3
Despite the principles set out above, the DigiDoc app used in the Tomlinson study claims to measure oxygen saturation levels with just the flash light and camera of a smartphone. The app was reviewed in a blog on the Medpage Today website in 2015 (https://www.medpagetoday.com/blogs/iltifathusain/51888) under the headline “There are apps making unjustifiable claims, exposing patients to unnecessary risk”. The author concludes his review with the following words: “I’d urge DigiDoc to either take the app off the market until the company can support its claims or at least make significant changes to the app store description. In the meantime, clinicians […] should advise their patients not to use it.”
The claims made by DigiDoc are scientifically unsound. The app “measures oxygen saturation within 90-100% with an accuracy of 0-4 RSM compared to a medical grade oximeter”. It is not clear what RSM is (is it root-mean-square error, RMSE?), but if we assume that they are claiming an error of ±4%, then a random number generator with a mean value of 95% and errors randomly distributed between -4% and +4% would give values between 91% and 99%, that is, readings within the app’s stated 90-100% range without any measurement having been made.
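The random number generator argument can be sketched in a few lines of Python. The function name `patient_blind_reading`, the seed and the sample size are hypothetical, and the root-mean-square figure rests on the loudly stated assumption that the subject's true saturation is a constant 95%.

```python
import math
import random


def patient_blind_reading(rng, centre=95.0, half_width=4.0):
    """Return a 'reading' that never looks at the patient."""
    return centre + rng.uniform(-half_width, half_width)


rng = random.Random(2020)  # fixed seed so the sketch is repeatable
readings = [patient_blind_reading(rng) for _ in range(10_000)]

# Every output lies in the plausible-looking band from 91% to 99%.
print(all(91.0 <= r <= 99.0 for r in readings))  # True

# Assumption: the subject's true saturation is a constant 95%.
true_value = 95.0
rms_error = math.sqrt(
    sum((r - true_value) ** 2 for r in readings) / len(readings)
)
# Errors bounded by 4 points always have a root-mean-square value below 4;
# for uniformly distributed errors the expected value is 4 / sqrt(3).
print(rms_error < 4.0)  # True
```

Under that assumption, a generator that ignores the patient altogether would sit inside a "0-4" error band, which is why the accuracy claim, as worded, says little about whether anything is being measured.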
Careful analysis of the paper by Tomlinson et al3 confirms the lack of scientific credibility for the DigiDoc app. Figure 2 in the paper (Bland-Altman plot for the SpO2 values for the DigiDoc app and the triage pulse oximeter) shows that all the readings, bar two, were between 97% and 100%, i.e. completely normal, with 95% Limits of Agreement at -4% and +3.5%. The authors observe that “reliability is low for the camera-based app, even when the two investigators tested the same patient within 1 to 2 min of each other”. Their statement that the difference between the camera-based app and the triage pulse oximeter is ±4 points should have led them to conclude, given that the x-axis in the Bland-Altman plot only extends from 96% to 100%, that the app was highly inaccurate.
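For readers unfamiliar with the Bland-Altman method, the sketch below shows how the bias and 95% limits of agreement are conventionally computed (mean difference plus or minus 1.96 standard deviations of the differences). The paired readings are hypothetical and are not taken from the Tomlinson paper; only the last few lines use figures quoted in the text.

```python
from statistics import mean, stdev


def limits_of_agreement(test_values, reference_values):
    """Return (bias, lower, upper) for paired readings, Bland-Altman style."""
    diffs = [t - r for t, r in zip(test_values, reference_values)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample standard deviation
    return bias, bias - half_width, bias + half_width


# Hypothetical paired readings (app, reference) for illustration only.
app = [98, 96, 100, 97]
reference = [98, 98, 98, 98]
bias, lower, upper = limits_of_agreement(app, reference)
print(f"bias {bias:+.2f}, limits {lower:+.2f} to {upper:+.2f}")

# Figures quoted in the text: limits of -4 to +3.5 against an axis of 96 to 100.
width_of_limits = 3.5 - (-4.0)  # 7.5 percentage points
observed_range = 100 - 96       # 4 percentage points
print(width_of_limits > observed_range)  # True
```

When the limits of agreement are wider than the whole range of values observed, the app's disagreement with the reference device is larger than the variation it is supposed to detect.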
CRITICAL APPRAISAL OF 2019 SCIENTIFIC ARTICLE BY DIGIDOC AND COMPUTER SCIENCE & ENGINEERING DEPARTMENT FROM SMU, DALLAS
On 19th December 2019, DigiDoc Technologies announced in a blog that a new scientific article (Measuring Oxygen Saturation with Smartphone Cameras using Convolutional Neural Networks) had been published in the November issue of the IEEE Journal of Biomedical & Health Informatics. The three authors of this paper are researcher Xinyi Ding and Associate Professor Eric C. Larson of Southern Methodist University, Dallas and the CEO of DigiDoc Technologies, Dr Damoun Nassehi.
The methodology has a number of limitations, some of which are pointed out in the paper by the authors, but which are laid out more explicitly here:
- The paper describes the use of a 1D convolutional neural network (CNN) for regressing oxygen saturation. According to the authors, the “bias” (the output of a low-pass filter, which is a measure of the energy of the signal) of the red photoplethysmography (PPG) signal decreases when oxygen saturation decreases. This is true, but the amplitude of the decrease will of course be different when using the app on a finger from a different individual with different skin type and different capillarity.
- The CNN is trained on 38 subjects and tested on just one. This is repeated 38 times, with a different test subject in each case (leave-one-out methodology). The test dataset when training a neural network should have at least 20% of the subjects, ideally 40% or 50%. Only then can one have confidence that the neural network will give accurate results on new subjects beyond those included in the original study.
- Pulse oximeters are calibrated over a range of oxygen saturations from 70% to 100%. The reference data are obtained from blood samples drawn regularly, under medical supervision, from the volunteers taking part in the calibration study. In these calibration studies, the subjects are slowly de-saturated, again under medical supervision, by breathing gas mixtures with reduced oxygen content (see Guazzi AR, Villarroel M, Jorge J, Daly J, Frise MC, Robbins PA, Tarassenko L. Non-contact measurement of oxygen saturation with an RGB camera. Biomedical optics express. 2015 Sep 1;6(9):3320-38). The breath holding procedure described in the DigiDoc paper is not a proper de-saturation procedure and produces very few SpO2 values below 85%. As a consequence, there is virtually no data in the 70%-85% range. From the histogram shown in Fig. 9 of the paper, it is clear that the overwhelming majority of the SpO2 values are between 95% and 100%.
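The leave-one-out scheme described in the list above can be sketched as follows. This is a generic illustration of the splitting step only; the function name and subject labels are hypothetical, and it is not the authors' code.

```python
def leave_one_subject_out(subject_ids):
    """Yield (training_subjects, test_subjects) with one subject held out."""
    subjects = sorted(set(subject_ids))
    for held_out in subjects:
        training = [s for s in subjects if s != held_out]
        yield training, [held_out]


# Hypothetical subject labels; the study's own cohort is not reproduced here.
subjects = ["S1", "S2", "S3", "S4", "S5", "S6"]
splits = list(leave_one_subject_out(subjects))
print(len(splits))  # 6: one split per subject

for training, test in splits:
    # The test set never contains more than a single subject.
    print(test, "tested against a model trained on", len(training), "subjects")
```

Each model is therefore evaluated on a single individual, which is the basis for the concern that the test set is too small.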
To summarise, the app cannot be used clinically because: (a) the training dataset does not appear to have included the full range of skin types (from Fitzpatrick skin type 1 to 6); (b) the training dataset covered a limited range of oxygen saturation values, mostly in the normal range from 95% to 100%, whereas pulse oximeters used clinically should cover the range from 70% to 100%; (c) there is no independent dataset on which the app has been tested.
In e-mail exchanges with Professor Tarassenko, Dr Larson makes a number of points:
- Smartphone apps should not be used for monitoring COVID-19 because they are not proven accurate at lower saturation levels.
- The statement regarding the change of skin type and capillarity is well founded.
- A five-fold cross validation approach was also used, in which 80% of the subjects are used for training and 20% for testing (repeated so that each 20% of the subject pool is used for testing in turn, i.e. repeated cross validation). The results of the five-fold analysis were not significantly different from the leave-one-out results. These five-fold results were removed from the journal paper in order to meet the journal’s length requirements (and because they did not change the paper’s conclusion).
- The algorithm is most accurate above 90%. One reason for this is that, with the breath-holding experimental setup, it is difficult to obtain training data below 85%.
- As acknowledged by Ding et al., the authors of the paper never claim that their algorithm is precise enough for clinical use.
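The five-fold scheme mentioned in the list above differs from leave-one-out only in how many subjects are held out at a time. A minimal sketch, assuming the split is made by subject so that no individual appears in both the training and the test set, is given below; the function name, labels and fold assignment are hypothetical and are not taken from the paper.

```python
def k_fold_by_subject(subject_ids, k=5):
    """Yield (training_subjects, test_subjects) for k folds split by subject."""
    subjects = sorted(set(subject_ids))
    if not 2 <= k <= len(subjects):
        raise ValueError("k must be between 2 and the number of subjects")
    # Deal the subjects into k folds in round-robin order.
    folds = [subjects[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, test


# Hypothetical subject labels, ten in total, so each fold holds out 20%.
subjects = [f"S{n:02d}" for n in range(1, 11)]
folds = list(k_fold_by_subject(subjects))
for training, test in folds:
    print(len(training), "training subjects,", len(test), "test subjects:", test)
```

Every subject is tested exactly once, always by a model that has not seen that subject during training.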
In a further e-mail to Professor Tarassenko, Dr Nassehi states that “the [DigiDoc] Pulse Oximeter app is for use by sports users who are interested in knowing their blood oxygenation level (SpO2) and Heart Rate. The Pulse Oximeter app is NOT INTENDED FOR MEDICAL USE.”
CRITICAL APPRAISAL OF THE SAMSUNG APP TESTED BY TAYFUR ET AL4
The Samsung Galaxy series of phones had a red light emitting diode (LED) built into the phone in addition to the flash light and camera. There were no details released by the company of how its app used the LED to estimate oxygen saturation, but it appears from publicity material on YouTube that it worked via a single-wavelength measurement (albeit with a monochromatic light source, the LED) and therefore that oxygen saturation could not be accurately derived from it.
The methodology used by Tayfur and Afacan in their study is sounder than in the Tomlinson study because their reference device is an arterial blood gas (ABG) analyser. The Bland-Altman plot (Figure 4) in their paper shows that most of the oxygen saturation measurements from their Emergency Department patients were between 95% and 100%. For the few patients whose oxygen saturation measurements were between 85% and 93%, the difference between the smartphone estimate and the ABG device varied between -5.5% and +2.5%. In other words, the readings become less accurate as the patient becomes more hypoxic.
CONCLUSIONS
- It is not physically possible to measure SpO2 using current smartphone technology.
- The two published studies which assessed smartphone oximeter apps (DigiDoc and Samsung) raise serious questions about their diagnostic accuracy.
Disclaimer: the article has not been peer-reviewed; it should not replace individual clinical judgement and the sources cited should be checked. The views expressed in this commentary represent the views of the authors and not necessarily those of the host institution, the NHS, the NIHR, or the Department of Health and Social Care. The views are not a substitute for professional medical advice.
Professor Tarassenko is Professor of Electrical Engineering at the University of Oxford.
Trish Greenhalgh is a Professor of Primary Care Health Sciences, co-Director of the Interdisciplinary Research In Health Sciences (IRIHS) unit, and joint module coordinator on the Knowledge Into Action (KIA) module of the MSc in Evidence Based Health Care.
SEARCH (for further details see2)
We searched Embase and PubMed.
Embase: We used the Thesaurus search builder with the terms “dyspnea” OR “hypoxia” OR “oximetry” AND “telemedicine” OR “smartphone” OR “telephone”
PubMed: We performed the following search string using the relevant MeSH terms:
(“Oximetry”[Mesh] OR “Blood Gas Monitoring, Transcutaneous”[Mesh] OR “Dyspnea”[Mesh] OR “hypoxia”[Mesh]) AND (“Telemedicine”[Mesh] OR “Remote Consultation”[Mesh] OR “Smartphone”[Mesh] OR “Telephone”[Mesh])
APPENDIX: DETAILS OF INCLUDED STUDIES
(acknowledging Koot Kotze and Helene-Mari Van Der Westhuizen2)
|Author, Title, Journal, Year, Country
||Aims/objectives, Study type
||Sampling/recruitment method, No of participants
||Data collection method
|Sarah Tomlinson, Sydney Behrmann, James Cranford, Marisa Louie, and Andrew Hashikawa. Accuracy of Smartphone-Based Pulse Oximetry Compared with Hospital-Grade Pulse Oximetry in Healthy Children. Telemedicine and e-Health. Jul 2018;24(7):527-535. United States of America
||Validation study that compared DigiDoc “a camera-based app (CBA), which utilizes the phone’s own camera lens and flash with no additional device required” and “a probe-based app (PBA), which is an app designed to use an external probe that connects directly to the smartphone.”
||“81 children ages 2–13 years without a respiratory complaint and a triage SpO2 ≥97% seen in a pediatric Emergency Department.”
Children were excluded if they were in the ED for a respiratory-related complaint, if they had underlying cardiac, respiratory, hematologic, or metabolic disease, if they were a trauma patient, if capillary refill time in fingers was >3 s, or if they had nail polish on their fingernails.
|Two investigators obtained heart rate and SpO2 using each app. Inter-rater reliability was tested using interclass correlations (ICCs), and Bland–Altman method was used to compare app values to triage measurements.
||The Probe Based App (PBA) was equivalent to standard pulse oximetry (in non-hypoxic children), but the Camera Based App (CBA) that uses the phone’s camera and flash was unreliable.
“ICC for SpO2 for PBA and CBA were 0.73 and -0.24, respectively. The 95% limits of agreement between the PBA SpO2 and triage SpO2 were -2.8 to +2.5 compared with -4.1 to +3.5 for the CBA SpO2 and triage SpO2. Mean differences between triage SpO2 and the PBA SpO2 (-0.17%) and triage SpO2 and CBA SpO2 (-0.33%) were not statistically significant.”
|Author, Title, Journal, Year, Country
||Aims/objectives, Study type
||Sampling/recruitment method, No of participants
||Data collection method
|Tayfur I, Afacan MA. Reliability of smartphone measurements of vital parameters: A prospective study using a reference method. Am J Emerg Med 2019;37:1527–30. Turkey
||This was a study “aimed to evaluate the accuracy of HR and SaO2 data obtained using a smartphone compared with the measurements of a vital signs monitor (VSM) and an arterial blood gas (ABG) device, respectively.”
||Convenience sample of 114 patients presenting to an emergency unit in Istanbul. 13 results were excluded due to technical reasons. “The data of a total of 101 patients, 48 male (47.5%) and 53 female (52.5%), were analyzed.”
“The mean age of the male and female patients was 68.08 and 72 years, respectively. According to the age distribution of the patients, the highest number of patients were in the 60–69 years group (25.75%, n=26).” 42% had pulmonary disease.
It is not noted how many patients were excluded according to pre-defined exclusion criteria, which were: “patients aged under 18 years, those that did not agree or give consent to participate in the study, those requiring urgent intervention (blue code, unstable patients), those not able to adapt to the measurements with a device (unconscious, confused, etc.), those with a high degree of hypothermia that might adversely affect the measurement from the skin, and those wearing nail polish or false nails.”
|“This study investigated the SaO2 and HR measurement reliability and efficacy of a Samsung Galaxy S8 (SM-G950F) smartphone… and the VSM (Welch Allyn, Connex Spot Monitor 71 WT) equipped with a Nellcor probe, and an ABG device (Radiometer ABL800, 754R0428N007), both available in the emergency service. The HR data measured by the smartphone were compared with the HR values obtained from the same VSM simultaneously. The triage nurse/paramedic measured the HR and SaO2 values using VSM and noted them in the study form. The smartphone measurements were undertaken by a second emergency service nurse blinded to the HR and SaO2 values determined by VSM and recorded in another form. The real-time ABG analysis was performed by doctors working in the emergency room on the same day and the results were noted in the ABG section of the study form.”
||A Bland-Altman analysis of the results comparing VSM to the smartphone for heart rate and oxygen saturation found the following for smartphone SaO2 vs arterial blood gas SaO2: a mean difference of −0.67% (95% CI = −0.845 to −0.494) and a correlation coefficient of 0.968 (95% CI = 0.952 to 0.978). The VSM and smartphone SaO2 mean difference and correlation coefficient are not reported, and this is problematic as it is the most clinically useful parameter, given that this is the likely application of the study – replacing one form of non-invasive oximetry with another.
REFERENCES
1. Greenhalgh T, Koh GCH, Car J. Covid-19: a remote assessment in primary care. BMJ 2020;368:m1182. doi: 10.1136/bmj.m1182 [published Online First: 2020/03/28]
2. Greenhalgh T, Kotze K, Van Der Westhuizen H-M. Are there any evidence-based ways of assessing dyspnoea (breathlessness) by telephone or video? Oxford COVID-19 Evidence Service Rapid Review. 30th March 2020. Accessed 1.4.20 at https://www.cebm.net/covid-19/are-there-any-evidence-based-ways-of-assessing-dyspnoea-breathlessness-by-telephone-or-video/.
3. Tomlinson S, Behrmann S, Cranford J, et al. Accuracy of smartphone-based pulse oximetry compared with hospital-grade pulse oximetry in healthy children. Telemedicine and e-Health 2018;24(7):527-35.
4. Tayfur İ, Afacan MA. Reliability of smartphone measurements of vital parameters: A prospective study using a reference method. The American Journal of Emergency Medicine 2019;37(8):1527-30.