Fun With Math

Cancer Diagnosis - True or False Alarm?

Ray Bamford

Your friend has tested positive for a very rare and deadly cancer. The doctor claims that the test is very accurate, and suggested that your friend should get her affairs in order. You do some research and learn that the test has 99.9% specificity (which corresponds to a false positive rate of 0.1%) and 100% sensitivity (false negative rate of 0%). Further, you learn that this cancer affects only 1 in 10,000 people and your friend has no known risk factors. What is the probability that your friend has this cancer?

The diagnosis sounds grim, right? However, the situation is far more encouraging than it sounds at first glance. Two factors are key here. The first is the rarity of the disease. The second is the accuracy of the test, and making sure we apply that information correctly.

  • The specificity defines the probability that a person will test negative if they DO NOT have the disease (aka the “true negative” rate).
  • The sensitivity defines the probability that a person will test positive if they DO have the disease (aka the “true positive” rate).
  • However, you want to know the probability that a person has the disease if they tested positive.
This is a subtle but extremely important distinction. You want to know the probability of having the disease, given a positive test… not the probability of a positive test, given that you do (or don’t) have the disease.

Before walking through the analysis, here is the bottom line up front: Given the information provided, it turns out that your friend has about a 9% chance of having this cancer.

Approach 1: Evaluate the Sample Space

Let’s look more closely at the data. It is helpful to consider the sample space, or the set of all possible outcomes, to help us think through the analysis clearly. We’ll build a truth table to keep track of the numbers.

Let’s assume we have a population of 1,000,000 people. The number of people who actually have this cancer will be 100 (since this cancer affects 1 in 10,000). That means the remaining 999,900 people in our population do not have this cancer.  

Of the 100 people who do have it, 100% will test positive (100% sensitivity), so we have 100 true positives. Of the 999,900 people who do NOT have it, 99.9% will test negative (99.9% specificity) and 0.1% will test positive. Rounding to whole people, that gives 998,900 true negatives and 1,000 false positives.

We can prune the sample space to include only those who test positive. That includes the 100 true positives and the 1,000 false positives, for a total of 1,100. What percentage of the people who test positive actually have the cancer? That’s the true positives divided by the total positives: 100 / 1,100, or about 9.1%.

Thus, the probability that someone with no additional risk factors has this cancer, given that they tested positive, is 100 / 1,100, or about 9.1%!
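
The counting argument above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original analysis: the function name count_outcomes and its parameter names are assumptions made for this example, and exact fractions are used so the unrounded counts (998,900.1 and 999.9) are visible alongside the rounded figures used in the text.

```python
from fractions import Fraction


def count_outcomes(population, prevalence, sensitivity, specificity):
    """Return expected counts (true_pos, false_neg, true_neg, false_pos).

    Hypothetical helper for illustration; the names are not from the article.
    """
    sick = population * prevalence
    healthy = population - sick
    true_pos = sick * sensitivity        # sick people who test positive
    false_neg = sick - true_pos          # sick people who test negative
    true_neg = healthy * specificity     # healthy people who test negative
    false_pos = healthy - true_neg       # healthy people who test positive
    return true_pos, false_neg, true_neg, false_pos


# Figures from the article: 1,000,000 people, 1-in-10,000 prevalence,
# 100% sensitivity, 99.9% specificity.
tp, fn, tn, fp = count_outcomes(
    population=1_000_000,
    prevalence=Fraction(1, 10_000),
    sensitivity=Fraction(1),
    specificity=Fraction(999, 1000),
)

# Keep only the people who tested positive, then ask what share are truly sick.
share_sick = tp / (tp + fp)

print(f"true positives:  {float(tp):,.1f}")
print(f"false negatives: {float(fn):,.1f}")
print(f"true negatives:  {float(tn):,.1f}")
print(f"false positives: {float(fp):,.1f}")
print(f"P(cancer | positive test) = {float(share_sick):.1%}")
```

Without rounding, the false positives come to 999.9 rather than 1,000, which is why the text’s 1,100 total is a rounded figure; the resulting probability still rounds to 9.1%.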

Approach 2: Conditional Probabilities and Bayes’ Theorem

Another way to approach this problem is to use Bayes’ Theorem, which is designed for conditional probability problems like this one.

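Bayes’ Theorem can be written as follows:

P(A | B) = P(B | A) × P(A) / P(B)

Using the nomenclature from above we have:

P(cancer | positive test) = P(positive test | cancer) × P(cancer) / P(positive test)

The sensitivity of this test is 100%, so P(positive test | cancer) = 1. The probability that someone in the population has the cancer is 1 in 10,000, so P(cancer) = 0.0001. We can get the probability that someone in the population will test positive from the counts in Approach 1: 1,100 test positive out of 1,000,000, so P(positive test) = 0.0011.

Hence,

P(cancer | positive test) = (1 × 0.0001) / 0.0011 ≈ 0.091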

Again, we find that the probability that someone has this cancer, given that they tested positive, is about 9.1%.
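
The same calculation can be sketched in Python. The article reads the probability of a positive test off the counts from Approach 1; this sketch instead computes it with the law of total probability (positives among the sick plus positives among the healthy), which is the same quantity reached by a different route. The function name posterior and its parameter names are assumptions made for this example.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem.

    Hypothetical helper for illustration; the names are not from the article.
    """
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1.0 - specificity  # false positive rate

    # Law of total probability: overall chance that a random person tests positive.
    p_pos = p_pos_given_disease * prior + p_pos_given_healthy * (1.0 - prior)

    # Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
    return p_pos_given_disease * prior / p_pos


# Figures from the article: prevalence 1 in 10,000, sensitivity 100%, specificity 99.9%.
p = posterior(prior=1 / 10_000, sensitivity=1.0, specificity=0.999)
print(f"P(cancer | positive test) = {p:.1%}")
```

Writing the denominator this way means the function needs only the three numbers quoted for the test and the disease, with no population size.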

Takeaways

  • To assess a diagnostic test, it is important to know the accuracy of the test; specifically, the test’s sensitivity and specificity (or, alternatively, the false negative and false positive rates).
  • The example above illustrates that we also need to consider the prevalence of the disease in the population. If the disease is rare, even a test with a low false positive rate may not be a strong indicator that someone has the disease. If the patient has risk factors, we need to know the prevalence of the disease for someone with those risk factors.
  • Finally, we need to think clearly about the conditional probabilities. In this case, it’s not the specificity or false positive rate we care about, but rather, the probability that someone has the disease, given that they tested positive.
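
To make the prevalence takeaway concrete, here is a small Python sketch that holds the test fixed and varies only the prevalence. Only the first prevalence (1 in 10,000) comes from the example above; the other values are hypothetical stand-ins for higher-risk groups, chosen purely for illustration, and the function name is likewise an assumption.

```python
def positive_predictive_value(prevalence, sensitivity=1.0, specificity=0.999):
    """P(disease | positive test) for a given prevalence.

    Defaults are the test figures from the article; the function itself is a
    hypothetical helper for illustration.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# 1 in 10,000 is the article's figure; the remaining prevalences are
# hypothetical values for groups with additional risk factors.
for one_in in (10_000, 1_000, 100, 10):
    ppv = positive_predictive_value(1 / one_in)
    print(f"prevalence 1 in {one_in:>6,}: P(disease | positive test) = {ppv:.1%}")
```

With the same test, the chance that a positive result is real climbs from about 9% at 1 in 10,000 to about 50% at 1 in 1,000, which is why the prevalence for the patient’s own risk group matters so much.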