Your friend has tested positive for a very rare and deadly cancer. The doctor claims that the test is very accurate, and suggests that your friend should get her affairs in order. You do some research and learn that the test has 99.9% specificity (which corresponds to a false positive rate of 0.1%) and 100% sensitivity (false negative rate of 0%). Further, you learn that this cancer affects only 1 in 10,000 people and your friend has no known risk factors. What is the probability that your friend has this cancer?
The diagnosis sounds grim, right? However, the situation is far more encouraging than it sounds at first glance. There are two key factors here. The first is the rarity of the disease. The second is the accuracy of the test, and making sure we apply that information correctly.
This is a subtle but extremely important distinction. You want to know the probability of having the disease, given a positive test… not the probability of a positive test, given that you do (or don’t) have the disease.
Before walking through the analysis, here is the bottom line up front: Given the information provided, it turns out that your friend has about a 9% chance of having this cancer.
Let’s look more closely at the data. It is helpful to consider the sample space, or the set of all possible outcomes, to help us think through the analysis clearly. We’ll build a truth table to keep track of the numbers.
Let’s assume we have a population of 1,000,000 people. The number of people who actually have this cancer will be 100 (since this cancer affects 1 in 10,000). That means the remaining 999,900 people in our population do not have this cancer.
Of the 100 people who do have it, 100% will test positive (100% sensitivity), so we have 100 true positives. Of the 999,900 people who do NOT have it, 99.9% will test negative (99.9% specificity) and 0.1% will test positive. Rounding to whole people (0.1% of 999,900 is 999.9), we have about 998,900 true negatives and 1,000 false positives.
We can prune the sample space to include only those who test positive. That includes the 100 true positives and the 1,000 false positives, for a total of 1,100. What percentage of the people who test positive actually have the cancer? That's the true positives divided by the total positives: 100 / 1,100, or about 9.1%.
Thus, the probability that someone has this cancer, given that they tested positive and have no additional risk factors, is 100/1,100, or about 9.1%!
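The counting argument above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `confusion_counts` and the dictionary keys are made up for this example, and the counts are expected values rather than whole people, which is why the false positives come out as 999.9 instead of the rounded 1,000 used in the text.

```python
def confusion_counts(population, prevalence, sensitivity, specificity):
    """Expected number of people in each cell of the truth table.

    Hypothetical helper for illustration; the cells are expected
    values, so they are not rounded to whole people.
    """
    sick = population * prevalence          # people who have the cancer
    healthy = population - sick             # people who do not
    true_pos = sick * sensitivity           # sick and test positive
    false_neg = sick - true_pos             # sick but test negative
    true_neg = healthy * specificity        # healthy and test negative
    false_pos = healthy - true_neg          # healthy but test positive
    return {"TP": true_pos, "FN": false_neg, "TN": true_neg, "FP": false_pos}


# Numbers from the example: 1,000,000 people, 1 in 10,000 prevalence,
# 100% sensitivity, 99.9% specificity.
counts = confusion_counts(1_000_000, 1 / 10_000, 1.0, 0.999)

# Keep only the people who test positive, then ask what share is truly sick.
ppv = counts["TP"] / (counts["TP"] + counts["FP"])
print(f"{ppv:.1%}")  # prints 9.1%
```

Dividing the true positives by all positives is exactly the "prune the sample space" step: everyone who tested negative has been dropped from the denominator.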
Another way to approach this problem is to use Bayes' theorem, which is designed for conditional probability problems like this one.
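Bayes' theorem can be written as follows:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Using the nomenclature from above we have:

$$P(\text{condition positive} \mid \text{test positive}) = \frac{P(\text{test positive} \mid \text{condition positive})\,P(\text{condition positive})}{P(\text{test positive})}$$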
The sensitivity of this test is 100%. The probability that someone in the population is condition-positive is 1 in 10,000. We can get the probability that someone in the population will test positive from the truth table above. We have 1,100 testing positive out of 1,000,000.
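Hence,

$$P(\text{condition positive} \mid \text{test positive}) = \frac{1.0 \times 0.0001}{0.0011} \approx 0.091$$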
Again, we find that the probability that someone has this cancer, given that they tested positive, is about 9.1%.
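The same calculation can be written as a small Python function. This is a minimal sketch, not a general diagnostic tool: the name `posterior` and its parameter names are chosen here for illustration. Instead of reading the probability of a positive test off the truth table, it computes it from the false positive rate and the prevalence, which amounts to the same 1,100 in 1,000,000 before rounding.

```python
def posterior(prior, sensitivity, specificity):
    """P(condition positive | test positive) via Bayes' theorem.

    Hypothetical helper for illustration. `prior` is the prevalence,
    i.e. P(condition positive) before any test result is known.
    """
    false_positive_rate = 1.0 - specificity
    # P(test positive) = true positives + false positives, as fractions
    # of the whole population.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive


p = posterior(prior=1 / 10_000, sensitivity=1.0, specificity=0.999)
print(f"{p:.1%}")  # prints 9.1%
```

Calling `posterior` with a different `prior` is a quick way to see how strongly the answer depends on how rare the disease is.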