Stop the spread - Improve Temperature Check Accuracy With Measurement System Analysis
Publication: Quality Progress
Date: August 2021
Issue: Volume 54 Issue 8
Pages: pp. 32-40
Author(s): Mullenix, Paul
Organization(s): Entegris Inc., Billerica, MA
JUST THE FACTS
- Many facilities have implemented forehead temperature screening to detect possible COVID-19 fever symptoms. But if biases and measurement errors are not considered, high false negative rates of more than 90% can result.
- Implementing principles from a measurement system analysis can improve temperature screening dramatically and potentially limit infection.
- The author offers three steps for using a measurement study to improve temperature screening error rates.
Many facilities have implemented forehead (temporal) temperature screening to detect possible COVID-19 fever symptoms. Surprisingly, even following well-intentioned screening recommendations,1-3 an individual with a fever still can falsely test negative and be admitted to the facility more than 90% of the time.
Perhaps equally surprising, standard principles from a measurement system analysis (MSA) can improve this temperature screening dramatically, potentially limiting infection and thereby saving lives.
Bias and measurement error are two measurement concepts that are key to understanding how such large false negative rates can occur. These concepts also can be used to reduce false negative errors, which allow entry to feverish persons, as well as false positive errors, which deny entry to non-feverish persons.
First, apply the concept of bias, which is a systematic difference between the measurement mean and the actual value. Standard guidance from many sources recommends using a temporal thermometer and denying entry to anyone with a reading of 100.4° F or higher.4 A recent study, however, shows that temporal readings generally are biased lower by about 0.45° F relative to readings taken orally.5
In this same study—and others—a fever is defined by an oral reading of 100.4° F or higher.6 Because bias exists, it is important to specify whether the measurement being taken is oral, temporal or another method, such as tympanic.7, 8
In Figure 1, this concept of bias is applied to a person with a fever measured orally at 100.4° F. On the temporal scale, the mean temperature reading is 100.4° F - 0.45° F = 99.95° F (vertical yellow line). The concept of measurement error is shown in Figure 1 as the normal distribution under the gray line for observed readings, with a mean of 99.95° F and a measurement error standard deviation (SD) of 0.332° F. The measurement error SD is a measure of the variation in the measurement process when the same item is measured repeatedly.
DIAGRAM OF AFEBRILE (NON-FEVER) DISTRIBUTION AND MEASUREMENT ERROR OF AN INDIVIDUAL WITH THE MINIMUM FEVER CRITERIA
Observe in Figure 1 that if the oral criterion of 100.4° F is applied on the temporal scale (red vertical line), then 91% of the distribution of observed readings is below this red fever line. This means that the chance of a false negative for this person with an oral fever of 100.4° F is 91%!
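The 91% figure can be reproduced with a few lines of Python. This is a minimal sketch, assuming (as Figure 1 does) that repeated temporal readings are normally distributed around the bias-adjusted mean; the function and constant names are illustrative and do not come from the article.

```python
from statistics import NormalDist

ORAL_FEVER_F = 100.4       # oral fever criterion used in the article
TEMPORAL_BIAS_F = 0.45     # temporal readings run about this much below oral
MEAS_ERROR_SD_F = 0.332    # measurement error SD from the article's example


def false_negative_rate(threshold_f, true_oral_f=ORAL_FEVER_F,
                        bias_f=TEMPORAL_BIAS_F, sd_f=MEAS_ERROR_SD_F):
    """Probability that a single temporal reading falls below the threshold."""
    # Assumption: readings ~ Normal(true oral value - bias, measurement error SD).
    readings = NormalDist(mu=true_oral_f - bias_f, sigma=sd_f)
    return readings.cdf(threshold_f)


# Applying the oral criterion of 100.4° F directly on the temporal scale.
rate = false_negative_rate(threshold_f=100.4)
print(f"{rate:.0%}")  # prints 91%
```

Lowering `threshold_f` in the call shows how quickly the single-reading false negative rate drops as the threshold moves below the oral criterion.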
The other distribution, under the blue line in Figure 1, is a model of afebrile (non-fever) temporal temperatures shown as approximately normal, with a mean of 97.376° F and an SD of 0.756° F, as suggested in the article, “Normal Body Temperature: Systematic Review.”9
So how can a measurement study improve temperature screening error rates? There are three steps:
- Estimate measurement error from an MSA.
- Use the measurement error SD to construct a guard band threshold with a desired false negative rate.
- Reduce the false positive rate by using a multiple test strategy.
Figure 2 shows an example of a guard band threshold calculated from the distribution of measurement error. It repeats the measurement repeatability distribution for a person with an oral fever of 100.4° F, whose readings average 99.95° F on the temporal scale, and places a guard band threshold on the lower end of that distribution. The threshold was constructed so that a single reading has about a 4% probability of falling below it, which is the false negative probability for one test. A multiple test strategy can reduce the final false negative risk substantially below 4% to about 0.5%, as shown later.
AFEBRILE (NON-FEVER) DISTRIBUTION AND MEASUREMENT ERROR DISTRIBUTION WITH GUARD BAND THRESHOLD FEVER CRITERIA
Estimating measurement error
A simple MSA, as taught in a standard Six Sigma course, can provide a suitable estimate of the measurement error to calculate the guard band threshold.10 Actual measurement variation could be quite different from that claimed by the instrument manufacturer and must be estimated in practice.11
Begin by selecting a random sample of 10 presumably afebrile subjects. When using 10 randomly selected subjects, there is an 88.7% chance that at least one individual is from the lower quartile of the normal temperature distribution, at least one is from the upper quartile and at least one is from the middle 50%, thus spanning the normal temperature range. Subjects’ temperatures should stabilize in the ambient environment, and there should be a procedure, such as a taped floor area, for fixing the subject’s position relative to the measurement instrument.
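The 88.7% coverage figure follows from an inclusion-exclusion count over the three bands (lower quartile, middle 50%, upper quartile). The sketch below assumes subjects are sampled independently; the function name is illustrative only.

```python
def coverage_probability(n_subjects, p_low=0.25, p_mid=0.50, p_high=0.25):
    """P(at least one subject lands in each of the three bands)."""
    n = n_subjects
    # Probability that a given band is missed entirely.
    miss_one = (1 - p_low) ** n + (1 - p_mid) ** n + (1 - p_high) ** n
    # Probability that two bands are both missed (everyone in the remaining band).
    miss_two = p_low ** n + p_mid ** n + p_high ** n
    # Inclusion-exclusion; all three bands cannot be missed at once.
    return 1 - miss_one + miss_two


print(f"{coverage_probability(10):.1%}")  # prints 88.7%
```

Calling the function with other sample sizes shows how the chance of spanning the normal temperature range changes with the number of subjects.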
Next, take five repeated readings for each subject, with the subject stepping away and repositioning for each repeated measurement. To avoid the subject’s temperature changing appreciably, there shouldn’t be too much time between readings. However, you do want to incorporate positional variation that is experienced when subsequent subjects are tested. Sample data are shown in Table 1, along with a simple analysis to obtain the repeatability SD estimate of 0.332° F. Note that the row variance is the square of the usual SD of the five measurements in that row.
SAMPLE DATA TO ASSESS MEASUREMENT PROCESS STANDARD DEVIATION (°F)
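One common way to turn row variances into a repeatability SD is to average them and take the square root. The sketch below assumes that this pooled estimate is the "simple analysis" meant here and that every subject has the same number of repeats. The readings are hypothetical placeholders, not the article's Table 1 data.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical readings in °F (one row per subject, five repeats each).
# These are made-up values for illustration, NOT the data from Table 1.
readings = [
    [97.1, 97.4, 96.9, 97.3, 97.2],
    [98.0, 97.6, 98.3, 97.9, 98.1],
    [96.8, 97.2, 96.7, 97.0, 96.9],
    [97.7, 97.5, 98.0, 97.4, 97.8],
]


def repeatability_sd(rows):
    """Pooled within-subject SD, assuming equal repeats per subject."""
    # Row variance = square of the usual sample SD (n - 1 divisor) of that row.
    row_variances = [variance(row) for row in rows]
    # Average the variances (not the SDs), then take the square root.
    return sqrt(mean(row_variances))


print(f"Repeatability SD: {repeatability_sd(readings):.3f} °F")
```

Averaging variances rather than SDs keeps subject-to-subject temperature differences out of the estimate, because each row variance is computed around that subject's own mean.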
Considerable effort should be made to standardize the measurement process, document the standard measurement procedure, train operators and minimize as many sources of variation as possible. Factors affecting temporal measurement screening have been studied by many. Conclusions based on those studies include:12, 13
- Ambient temperature should be held constant.
- The procedure should be standardized, such as the position of the subject and the subject’s distance from the instrument.
- Instrument operators should be trained.
- The subject should not undergo strenuous activity before being measured or be using fever-reducing medication.
- The instrument should be calibrated.
- Other factors such as age, gender and time of day also can affect readings.
During an actual implementation, the effect of ambient temperature was checked at one location by measuring the same 10 subjects five times each at 78° F and at 92° F. The temperature readings increased by 1.26° F at the higher ambient temperature. Thus, adequate controls for ambient temperature are critical.
If operator skill can significantly influence the temperature reading, the study from Table 1 can be repeated for multiple operators within a short time frame so that the subjects’ temperatures do not change. Statistical software can compute repeatability and reproducibility measures for instrument and operator error contributions respectively.
Setting the guard band and calculating error rates
By lowering the guard band threshold, the false negative rate can be reduced, but at the expense of increasing the false positive rate. To be practical, any guard band for false negative risk must have an acceptable false positive risk to control the risk of excluding those who do not have a fever. Figure 3 shows tree diagrams for three possible testing strategies involving a single test, a single test with a confirmation test if the first test is positive, and a three-test strategy with the outcome determined by at least two out of three tests agreeing on the same result.
The reason for considering multiple tests is evident in Figure 4, in which the three-test strategy has lower false positive rates than the one- and two-test strategies. The false positive rates in Figure 4 were averaged over the entire afebrile distribution, which requires integration (see the sidebar “Derivation of Formulas for Three-Test Strategy” for formulas). The graph in Figure 4 was computed with a guard band threshold controlling the false negative risk to 0.5%. Because of its superior false positive performance, the three-test strategy is used in the rest of this article.
FALSE POSITIVE RATE AS A FUNCTION OF MEASUREMENT ERROR FOR VARIOUS TEST STRATEGIES
DERIVATION OF FORMULAS FOR THREE-TEST STRATEGY
Let T = guard band threshold, and let C = 99.95° F be the value of an oral fever of 100.4° F expressed on the temporal scale. Suppose the instrument resolution is $10^{-k}$ ($k \ge 1$), the last decimal place reported by the instrument. Also, denote the area to the left of z in a standard normal distribution, with mean 0 and standard deviation (SD) 1, by the distribution function $\Phi(z)$. Then, $\Phi^{-1}(p)$ returns the z value with area p to the left. Let

$$p(x) = P\left(x + \varepsilon \ge T - 0.5 \times 10^{-k}\right) = 1 - \Phi\left(\frac{T - 0.5 \times 10^{-k} - x}{\sigma}\right)$$

be the probability of a positive result for testing the true value x (expressed on the temporal temperature scale) with respect to the guard band threshold T, in which ε is the measurement error distributed normally with mean 0 and SD σ = measurement error SD.

From Figure 3, the overall probability of a false negative result for the three-test strategy when testing the value C is $1 - 3[p(C)]^2 + 2[p(C)]^3$. By setting this equal to β = desired false negative level, we can solve for the probability of a false negative result on a single test as $1 - p(C) = \frac{1}{2} - \cos\left(\frac{\cos^{-1}(2\beta - 1) + 4\pi}{3}\right)$. Finally, we solve p(C) for T as

$$T = \mathrm{Round}\left(C + \sigma\,\Phi^{-1}\left(\frac{1}{2} - \cos\left(\frac{\cos^{-1}(2\beta - 1) + 4\pi}{3}\right)\right),\ k\right)$$

in which Round(•, k) rounds to k decimal places. The way T is constructed, the overall false negative risk is β. The false positive risk requires the computation of the overall probability of a positive result averaged over the distribution of w = afebrile temperature readings, with normal density function f(w) having temporal mean 97.376° F and SD 0.756° F,1 given by the expression

$$\text{Average false positive rate} = \int [p(w)]^2\,\left[1 + 2\,(1 - p(w))\right] f(w)\,dw.$$

Example: σ = 0.332; β = 0.005; $1 - p(C) = \frac{1}{2} - \cos\left(\frac{\cos^{-1}(2(0.005) - 1) + 4\pi}{3}\right) = 0.0414$; $T = \mathrm{Round}(99.95 + 0.332\,\Phi^{-1}(0.0414),\ 3) = 99.374$° F; average false positive rate = 0.00563.
- Ivayla I. Geneva, Brian Cuzzo, Tasaduq Fazil and Waleed Javaid, “Normal Body Temperature: Systematic Review,” Open Forum Infectious Diseases, Vol. 6, No. 4, 2019, pp. 1-7.
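The sidebar calculation can be sketched in Python using only the standard library. This is an illustrative sketch, not the article's Excel tool: the function names are made up, the integral is approximated with a simple midpoint rule, and the half-resolution offset in the single-test probability is an assumption about how a displayed reading is compared with the threshold.

```python
from math import acos, cos, pi
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def single_test_false_negative(beta):
    """Single-test false negative probability q that solves 3q^2 - 2q^3 = beta."""
    return 0.5 - cos((acos(2 * beta - 1) + 4 * pi) / 3)


def guard_band_threshold(c, sigma, beta, k):
    """Threshold T, rounded to the instrument's k decimal places."""
    q = single_test_false_negative(beta)
    return round(c + sigma * STD_NORMAL.inv_cdf(q), k)


def positive_probability(x, threshold, sigma, k):
    """Chance that one reading of true value x is at or above the threshold."""
    # Assumption: readings are rounded to k decimals before comparison,
    # so the effective cutoff sits half a resolution step below T.
    cutoff = threshold - 0.5 * 10 ** (-k)
    return 1 - STD_NORMAL.cdf((cutoff - x) / sigma)


def average_false_positive(threshold, sigma, k, mu=97.376, tau=0.756, steps=4000):
    """Midpoint-rule integral of p^2 [1 + 2(1 - p)] against the afebrile density."""
    afebrile = NormalDist(mu, tau)
    low, high = mu - 8 * tau, mu + 8 * tau
    width = (high - low) / steps
    total = 0.0
    for i in range(steps):
        w = low + (i + 0.5) * width
        p = positive_probability(w, threshold, sigma, k)
        total += p * p * (1 + 2 * (1 - p)) * afebrile.pdf(w)
    return total * width


T = guard_band_threshold(c=99.95, sigma=0.332, beta=0.005, k=3)
fp = average_false_positive(T, sigma=0.332, k=3)
print(T)             # prints 99.374
print(round(fp, 5))  # compare with the sidebar's average false positive rate
```

Changing `sigma` or `beta` in the two calls gives the threshold and average false positive rate for other measurement error SDs and target false negative risks.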
As an example of how the three-test strategy controls false negative risk, the situation depicted in Figure 2 shows the probability of a false negative result to be 4.124% for a measurement error SD of 0.332. Here is where a simple probability calculation comes to the rescue. With p, the probability of a positive result in Figure 3, the probability of a false negative conclusion from the three-test strategy is $(1-p)^2 + 2p(1-p)^2$. Inserting 0.04124 for 1-p means the risk of a false negative decision is $(0.04124)^2 + 2(0.95876)(0.04124)^2 = 0.00496$.
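The same result can be checked by brute force, listing every outcome of three readings and adding up the ones in which at least two are negative. The sketch assumes the three readings are independent with the same single-test false negative probability; the function name is illustrative.

```python
from itertools import product


def majority_negative_probability(q):
    """P(at least two of three independent readings are negative)."""
    total = 0.0
    # Each outcome is a tuple of three flags; True means a negative reading.
    for outcome in product((True, False), repeat=3):
        if sum(outcome) >= 2:
            prob = 1.0
            for negative in outcome:
                prob *= q if negative else (1 - q)
            total += prob
    return total


q = 0.04124  # single-test false negative probability from Figure 2
p = 1 - q
closed_form = q ** 2 + 2 * p * q ** 2
print(round(majority_negative_probability(q), 5))  # prints 0.00496
print(round(closed_form, 5))                       # prints 0.00496
```

The enumeration and the closed form agree because a third reading only matters when the first two disagree.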
To calculate the guard band threshold for controlling the false negative risk to 0.5%, 1% or 2%, refer to Figure 5. Locate the measurement error SD on the horizontal axis in Figure 5 and use the desired false negative risk line to read off the threshold on the vertical axis.
GUARD BAND THRESHOLD FOR A GIVEN FALSE NEGATIVE RISK AS A FUNCTION OF MEASUREMENT ERROR
Having selected the guard band limit to control the false negative risk to 0.5%, 1% or 2%, Figure 6 can be used to understand the false positive risk for a given measurement error SD. A false positive that mistakenly denies entrance to an individual with a true body temperature below the fever criteria can cause anxiety and frustration. Thus, a low false positive risk also is important. Observe from Figure 6 that if the measurement error SD is below 0.4° F, then the false positive rate is below 1%.
FALSE POSITIVE RATE AS A FUNCTION OF MEASUREMENT ERROR FOR VARIOUS FALSE NEGATIVE GUARD BANDS
Consider a situation in which a non-contact temporal temperature instrument has been evaluated with an MSA to have a measurement error SD of 0.332° F. Take the criteria for a fever as an oral temperature of 100.4° F, set a target false negative risk at 0.5% and employ a three-test strategy.
The approximate threshold value can be found in Figure 5, but the exact value is calculated as 99.374° F using the formulas in the sidebar “Derivation of Formulas for Three-Test Strategy.” This threshold provides a false negative risk of 0.496% and a false positive risk of 0.563%. In medical test terminology, the sensitivity and specificity of the test are 1 - 0.00496 = 0.99504 (99.504%) and 1 - 0.00563 = 0.99437 (99.437%), respectively.
The use of temporal screening has obvious advantages in speed and the non-contact nature of testing, but it can result in false negative and false positive errors. If the concepts of bias and measurement error are not considered, high false negative rates of more than 90% can result. Incorporating the concepts of bias and measurement error, here are five recommended steps for temporal temperature testing:
- Standardize the measurement process and minimize as many sources of variation as possible (for example, control ambient temperature and standardize positioning of subjects).
- Conduct an MSA by repeatedly measuring 10 people’s temperature five times each. Calculate the measurement error SD, as in Table 1.
- From the graph in Figure 5, use the measurement error SD to determine the guard band threshold to control the false negative risk to, say, 0.5%.
- Examine Figure 6 to read off the false positive risk as a function of the measurement error SD.
- Implement the three-test strategy from Figure 3 and exclude any individual from entry who has at least two out of three temperature readings at or above the guard band threshold.
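The last step, the two-out-of-three decision rule, can be sketched as a small function. This is a hypothetical illustration: `take_reading` stands in for whatever returns one instrument reading in °F, and stopping as soon as two readings agree is an assumption about how the rule would be run in practice.

```python
def deny_entry(take_reading, threshold_f):
    """Apply the two-out-of-three rule; return True if entry should be denied."""
    positives = 0
    negatives = 0
    # At most three readings are needed; stop once two of them agree.
    while positives < 2 and negatives < 2:
        if take_reading() >= threshold_f:   # at or above threshold counts as positive
            positives += 1
        else:
            negatives += 1
    return positives >= 2


# Usage with made-up readings and the article's example threshold of 99.374° F.
sample = iter([99.5, 99.1, 99.4])
print(deny_entry(lambda: next(sample), threshold_f=99.374))  # prints True
```

In this usage example the first and third readings are at or above the threshold, so the individual is excluded even though the second reading is below it.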
The methods here can be extended easily to handle different test instruments (such as oral or tympanic instruments for confirmation testing) at each stage of a multiple-test strategy, along with incorporating instrument resolution.
Download the Excel computational tool, “Temp Screening Threshold Calculation” (the second "Download" button at the top of this webpage) to easily compute all performance measures and threshold limit values discussed in this article.
The simple ideas from MSA presented here can be useful for improving decision making for temperature screening against COVID-19 and could save lives in the process. QP
The author thanks Girish Madavan, Athmanathan Nadarajah and Ku Elianie Syarmy Ku Kilmy for the data used in this article and the QP reviewers for their helpful comments.
- “Manufacturing Workers and Employers: Interim Guidance from CDC and the Occupational Safety and Health Administration (OSHA),” Centers for Disease Control and Prevention (CDC), May 2020, https://tinyurl.com/cdc-mfg-guide.
- “Implementation of Mitigation Strategies for Communities With Local COVID-19 Transmission,” CDC, 2020, https://www.cdc.gov/coronavirus/2019-ncov/community/community-mitigation.html.
- “Guidance for Cruise Ships: How to Report Onboard Death or Illness,” CDC, March 2017, https://tinyurl.com/cdc-cruise-guidance.
- “Implementation of Mitigation Strategies for Communities With Local COVID-19 Transmission,” see reference 2.
- Adrian D. Haimovich, R. Andrew Taylor, Harlan M. Krumholz and Arjun K. Venkatesh, “Performance of Temporal Artery Temperature Measurement in Ruling Out Fever: Implications for COVID-19 Screening,” Journal of General Internal Medicine, Vol. 35, No. 11, 2020, pp. 3,398-3,400.
- Christian Backer Mogensen, Lena Wittenhoff, Gitte Fruerhøj and Stephen Hansen, “Forehead or Ear Temperature Measurement Cannot Replace Rectal Measurements, Except for Screening Purposes,” BMC Pediatrics, Vol. 18, No. 1, 2018, p. 15.
- An V. Nguyen, Nicole J. Cohen, Harvey Lipman, Clive M. Brown, Noelle-Angelique Molinari, William L. Jackson, Hannah L. Kirking, Paige Szymanowski, Todd W. Wilson, Bisan A. Salhi, Rebecca R. Roberts, David W. Stryker and Daniel B. Fishbein, “Comparison of 3 Infrared Thermal Detection Systems and Self-Report for Mass Fever Screening,” Emerging Infectious Diseases, Vol. 16, No. 11, 2010, pp. 1,710-1,717.
- Ivayla I. Geneva, Brian Cuzzo, Tasaduq Fazil and Waleed Javaid, “Normal Body Temperature: Systematic Review,” Open Forum Infectious Diseases, Vol. 6, No. 4, 2019, pp. 1-7.
- Automotive Industry Action Group (AIAG), Measurement Systems Analysis (MSA), fourth edition, AIAG, 2010.
- J. Aw, “Letters to the Editor: The Non-Contact Handheld Cutaneous Infra-Red Thermometer for Fever Screening During the COVID-19 Global Emergency,” Journal of Hospital Infection, Vol. 104, No. 4, 2020, p. 451.
- Karen Weintraub, “Are Human Body Temperatures Cooling Down?” Scientific American, Jan. 17, 2020, www.scientificamerican.com/article/are-human-body-temperatures-cooling-down.
- Daniel J. Niven, Jonathan E. Gaudet, Kevin B. Laupland, Kelly J. Mrklas, Derek J. Roberts and Henry Thomas Stelfox, “Accuracy of Peripheral Thermometers for Estimating Temperature: A Systematic Review and Meta-Analysis,” Annals of Internal Medicine, Vol. 163, No. 10, 2015, pp. 768–777.
Paul Mullenix is the director of global statistics, Six Sigma and continuous improvement at Entegris Inc. in Billerica, MA. He received a doctorate in statistics from the University of Florida in Gainesville. Mullenix is a senior member of ASQ and a published author.