The Receiver Operating Characteristic (ROC) Curve
The ROC curve is a plot of sensitivity vs. false positive rate (1-specificity) across the range of possible cutoff values of a diagnostic test. It graphically represents the trade-off between sensitivity and specificity in tests that produce results on a numerical scale. It therefore gives a graphical representation of a test's accuracy and allows such tests to be compared. It may be used to generate decision thresholds or “cut off” values, and to generate confidence intervals for sensitivity, specificity and likelihood ratios.
This came up in Primary Question 14 from the second paper of 2012 and Fellowship Question 24 from the second paper of 2009. The college answer to Question 14 is highly informative with regard to what is expected, and even offers a textbook reference (Myles and Gin pages 98 to 99). That textbook reference is actually useful, which is also rare. One may learn enough about the ROC from the one-page summary in Myles and Gin, and possibly even pass the SAQs. Beyond the textbook, there are papers like Zweig et al (1993) and Søreide (2009) which offer a more detailed analysis. Beyond even that lie such excellent resources as "The magnificent ROC" from www.anaesthetist.com, which covers the topic so comprehensively as to be almost dangerous for the time-poor exam candidate.
Receiver operating characteristic curve (ROC curve)
In point form:
- The ROC curve is a plot of sensitivity vs. false positive rate (1-specificity)
- Sensitivity is on the y-axis, from 0% to 100%
- The ROC is for tests which produce results on a numerical scale, rather than binary results (positive vs. negative)
- The ROC curve can be used to determine the cutoff point at which the sensitivity and specificity are optimal.
- All possible combinations of sensitivity and specificity that can be achieved by changing the test's cutoff value can be summarised using a single parameter, the area under the ROC curve (AUC).
- The higher the AUC, the more accurate the test
- An AUC of 1.0 means the test is 100% accurate (i.e. the curve is square)
- An AUC of 0.5 (50%) means the ROC curve is a straight diagonal line, which represents the "ideal bad test", one which is only ever accurate by pure chance.
- When comparing two tests, the more accurate test is the one with an ROC curve further to the top left corner of the graph, with a higher AUC.
- If sensitivity and specificity are weighted equally, the best cutoff point for a test (which separates positive from negative values) is the point on the ROC curve which is closest to the top left corner of the graph.
- The cutoff values can be selected according to whether one wants more sensitivity or more specificity.
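The points above can be made concrete with a short sketch. It assumes that a higher score means disease is more likely and that a result is called positive when score >= cutoff; the scores and labels are made-up illustrative data, and `roc_points`, `auc_trapezoid` and `closest_to_corner` are hypothetical helper names, not a library API. The "closest to the top left corner" rule used here is one common criterion for the best cutoff; others exist (for example, maximising the Youden index, sensitivity + specificity - 1).

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (cutoff, false positive rate, sensitivity)


def roc_points(scores: List[float], labels: List[int]) -> List[Point]:
    """Return (cutoff, FPR, sensitivity) for every candidate cutoff.

    labels: 1 = diseased, 0 = healthy. A result is positive when score >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Start above every score (nobody positive, point (0, 0)) and lower the
    # cutoff through each observed score until everybody is positive (1, 1).
    cutoffs = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for c in cutoffs:
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        points.append((c, fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: List[Point]) -> float:
    """Area under the ROC curve, summing trapezoids between consecutive points."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def closest_to_corner(points: List[Point]) -> float:
    """Cutoff whose ROC point is nearest the top left corner (FPR 0, sensitivity 1)."""
    return min(points, key=lambda p: p[1] ** 2 + (1 - p[2]) ** 2)[0]


# Made-up results: 1 = diseased, 0 = healthy.
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
pts = roc_points(scores, labels)
print(auc_trapezoid(pts))      # 0.8125
print(closest_to_corner(pts))  # 0.6
```

Lowering the cutoff can only add positives, so the points run from (0, 0) to (1, 1) with the false positive rate never decreasing, which is what lets the trapezoids be summed in order.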
Advantages of the ROC curves:
- A simple graphical representation of the diagnostic accuracy of a test: the closer the apex of the curve toward the upper left corner, the greater the discriminatory ability of the test.
- Allows a simple graphical comparison between diagnostic tests
- Allows a simple method of determining the optimal cutoff values, based on what the practitioner thinks is a clinically appropriate (and diagnostically valuable) trade-off between sensitivity and false positive rate.
- Also allows a more complex (and more exact) measure of the accuracy of a test, which is the AUC
- The AUC in turn can be used as a simple numeric rating of diagnostic test accuracy, which simplifies comparison between diagnostic tests.
- The AUC can be estimated non-parametrically, which means it does not assume that the test results are normally distributed in the population
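The non-parametric nature of the AUC can be illustrated with a minimal sketch. The empirical AUC equals the probability that a randomly chosen diseased patient has a higher result than a randomly chosen healthy one (the Mann-Whitney U statistic divided by the number of diseased-healthy pairs), so only the ordering of results matters. The scores below are made up, and `auc_mann_whitney` is a hypothetical helper name.

```python
import math
from itertools import product
from typing import Sequence


def auc_mann_whitney(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (diseased, healthy) pairs in which the diseased patient scores
    higher; ties count as half. This equals the trapezoidal area under the
    empirical ROC curve."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up results for 4 diseased and 4 healthy patients.
diseased = [0.35, 0.6, 0.8, 0.9]
healthy = [0.1, 0.3, 0.4, 0.7]
print(auc_mann_whitney(diseased, healthy))  # 0.8125

# A monotonic rescaling (here the natural logarithm) changes the shape of the
# distributions but not the ordering of results, so the AUC is unchanged.
log_d = [math.log(x) for x in diseased]
log_h = [math.log(x) for x in healthy]
print(auc_mann_whitney(log_d, log_h))  # 0.8125
```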
Disadvantages of the ROC curves:
- Actual decision thresholds are usually not displayed in the plot
- As the sample size decreases, the plot becomes more jagged
- Calculation is cumbersome without specialised software
- Friendly flexible software is not widely available
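Two of these disadvantages (missing thresholds and jaggedness with small samples) can be seen in a small sketch. With n diseased and m healthy patients, each change in cutoff moves the empirical curve up in steps of 1/n or right in steps of 1/m, so a small sample gives a coarse staircase. Printing the cutoff beside each point shows the thresholds that a plot usually omits. The data are hypothetical and `roc_steps` is an illustrative helper name.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple

Row = Tuple[Optional[float], Fraction, Fraction]  # (cutoff, FPR, sensitivity)


def roc_steps(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> List[Row]:
    """(cutoff, FPR, sensitivity) as exact fractions; positive means score >= cutoff."""
    cutoffs = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    rows: List[Row] = [(None, Fraction(0), Fraction(0))]  # cutoff above every result
    for c in cutoffs:
        tp = sum(s >= c for s in pos_scores)
        fp = sum(s >= c for s in neg_scores)
        rows.append((c, Fraction(fp, len(neg_scores)), Fraction(tp, len(pos_scores))))
    return rows


# Hypothetical small sample: 2 diseased and 3 healthy patients, so the curve
# moves up in steps of 1/2 and right in steps of 1/3.
for cutoff, fpr, sens in roc_steps([5.0, 8.0], [2.0, 4.0, 6.0]):
    print(cutoff, fpr, sens)
```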
Anatomy of the ROC curve
"Receiver operating characteristics" was originally a term used to describe the ability of a radar technician to discriminate between radar blips potentially representing Japanese aircraft, friendlies or random noise. In other words, it was a personal Characteristic which described how good they were at Operating the radar Receiver. The operator's performance was also affected by the gain of the receiver: turning up the gain increased the amount of noise. Lee B. Lusted (1984) describes the origin of the medical uses of this technique in his editorial for Medical Decision Making (MDM). The first instance of such use occurred in the late 1950s, when a newly developed Pap smear cell analyser was calibrated to operate at the optimal balance of false positives and false negatives.
The ROC curve is a plot of sensitivity vs. 1-specificity (the false positive rate). Each point on it is the pair of sensitivity and (1-specificity) values calculated at one possible cutoff, so the curve traces every combination the test can achieve.
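For reference, the plotted quantities have the standard definitions below, written in terms of true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP) at a given cutoff $c$ (notation introduced here for illustration). The positive likelihood ratio at that cutoff is the slope of a straight line from the origin to the corresponding point on the curve.

```latex
\begin{aligned}
\text{sensitivity}(c) &= \frac{TP}{TP + FN}, \qquad
\text{specificity}(c) = \frac{TN}{TN + FP},\\[4pt]
\text{FPR}(c) &= 1 - \text{specificity}(c) = \frac{FP}{FP + TN},\\[4pt]
\text{ROC point at } c &= \bigl(\text{FPR}(c),\ \text{sensitivity}(c)\bigr),\\[4pt]
LR^{+}(c) &= \frac{\text{sensitivity}(c)}{1 - \text{specificity}(c)}, \qquad
LR^{-}(c) = \frac{1 - \text{sensitivity}(c)}{\text{specificity}(c)}.
\end{aligned}
```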
Let us consider a test for a disease.
Consider two populations: patients who have the disease (red) and patients who do not (blue). The test misidentifies some patients, so there are a few false positives and false negatives.
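A single cutoff applied to two overlapping populations can be sketched as follows. The test results are hypothetical (arbitrary units), a result >= cutoff is assumed to be called positive, and `confusion_counts` is an illustrative helper name. Sliding the cutoff changes these four counts, and each setting gives one point on the ROC curve.

```python
from typing import Sequence, Tuple


def confusion_counts(diseased: Sequence[float], healthy: Sequence[float],
                     cutoff: float) -> Tuple[int, int, int, int]:
    """Count (TP, FN, FP, TN) when a result >= cutoff is called positive."""
    tp = sum(x >= cutoff for x in diseased)
    fn = len(diseased) - tp   # diseased patients the test misses
    fp = sum(x >= cutoff for x in healthy)
    tn = len(healthy) - fp    # healthy patients correctly called negative
    return tp, fn, fp, tn


# Hypothetical results for two overlapping populations.
diseased = [4.2, 5.1, 5.8, 6.3, 7.0]
healthy = [2.1, 3.0, 3.6, 4.5, 4.9]
tp, fn, fp, tn = confusion_counts(diseased, healthy, cutoff=4.4)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(tp, fn, fp, tn)  # 4 1 2 3
```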
Now consider what happens as the diseased and non-diseased populations overlap more and more.
The more the two distributions overlap, the closer the ROC curve gets to the diagonal line of uselessness. In a hypothetical perfect population where there is absolutely no overlap, the curve would pass through the top left corner of the graph, i.e. it would be perfectly square.
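The effect of overlap can be quantified under one simplifying assumption: both populations have normally distributed results with the same standard deviation, and their means differ by d standard deviations (the "binormal" model, chosen here for illustration). The difference between a random diseased and a random healthy result is then normal with mean d and variance 2, so AUC = Φ(d/√2). `binormal_auc` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def binormal_auc(separation: float) -> float:
    """AUC when diseased and healthy results are normal with equal SD and means
    `separation` SDs apart: P(diseased result > healthy result) = Phi(d / sqrt(2))."""
    return NormalDist().cdf(separation / sqrt(2))


# Shrinking the separation increases the overlap between the two populations.
for d in [4.0, 2.0, 1.0, 0.5, 0.0]:
    print(f"separation {d:.1f} SD -> AUC {binormal_auc(d):.3f}")
```

As the separation shrinks the AUC falls toward 0.5, reaching exactly 0.5 when the distributions coincide. Normal distributions always overlap a little, so in this model the AUC approaches 1 but never quite reaches it.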