This vignette provides R code related to part of Chapter 4 of the STAT0002 notes, namely Bayes’ theorem.

A screening test

A classic application of Bayes’ theorem arises in screening for a disease or condition. A screening test does not determine whether or not the person has the disease. It is used to identify individuals who have a relatively high probability of having the disease and who may therefore benefit from a more definitive diagnostic test.

Notation

Consider a person selected at random from a population to take the screening test. Let

  • $D$ be the event that they have the disease;
  • $+$ be the event that they test positive for the disease;
  • $-$ be the event that they test negative for the disease.

Properties of the test

The mathematical properties of the test are governed by the following probabilities:

  • the sensitivity or true positive rate, the probability $P(+ \mid D)$ that a person who has the disease tests positive;
  • the specificity or true negative rate, the probability $P(- \mid \text{not}D)$ that a person who does not have the disease tests negative.

Use of Bayes’ theorem

What matters to a person who takes the test is their probability of having the disease given the result of their test. We use Bayes’ theorem to calculate the relevant probabilities. These probabilities depend on the pre-test, or prior, probability $P(D)$ that the person has the disease. In the current context, where the person is selected at random from a population, $P(D)$ is the proportion of the population who have the disease.

An example: type 2 diabetes

A screening test for type 2 diabetes (hereafter referred to simply as diabetes) is based on blood glucose levels after a 12-hour period of fasting. A person tests positive for diabetes if their fasting blood glucose level is greater than 6.5 mmol/L. Among people with untreated diabetes the probability $P(+ \mid D)$, the sensitivity of the test, is $0.933$. Among people who do not have diabetes the probability $P(+ \mid \text{not}D) = 0.020$ is much smaller, so that the specificity $P(- \mid \text{not}D)$ of the test is $1 - 0.020 = 0.98$.

We suppose that in a population of interest, perhaps people over 50 years of age, $P(D) = 0.03$, that is, $3\%$ of this population have type 2 diabetes.

If a person tests positive then what is their probability of having diabetes?

Bayes’ theorem gives

$$P(D \mid +) = \frac{P(+ \mid D) P(D)}{P(+)} = \frac{P(+ \mid D) P(D)}{P(+ \mid D) P(D) + P(+ \mid \text{not}D) P(\text{not}D)},$$

where we have used the law of total probability in the denominator. Substituting the values of the probabilities, noting that $P(\text{not}D) = 1 - 0.03 = 0.97$, gives

> 0.933 * 0.03 / (0.933 * 0.03 + 0.020 * 0.97)
[1] 0.5906309
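
The same calculation can be written with named quantities, which makes the role of the law of total probability explicit. This is only a sketch: the variable names (`prior`, `sens`, `false_pos`, `p_pos`, `ppv`) are chosen here for illustration and do not come from the notes or from the stat0002 package.

```r
# Values taken from the diabetes example above
prior <- 0.03       # P(D), the prior probability of disease
sens <- 0.933       # P(+ | D), the sensitivity
false_pos <- 0.020  # P(+ | notD), the false positive rate

# Law of total probability: P(+) = P(+ | D) P(D) + P(+ | notD) P(notD)
p_pos <- sens * prior + false_pos * (1 - prior)

# Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+)
ppv <- sens * prior / p_pos

# Print both quantities to 5 decimal places
round(c(p_pos = p_pos, ppv = ppv), 5)
```

Storing $P(+)$ separately is a small design choice: it is the denominator in Bayes’ theorem and is also of interest in its own right, as the overall proportion of the population who would test positive.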

If a person tests negative then what is their probability of not having diabetes?

Bayes’ theorem gives

$$P(\text{not}D \mid -) = \frac{P(- \mid \text{not}D) P(\text{not}D)}{P(-)} = \frac{P(- \mid \text{not}D) P(\text{not}D)}{P(- \mid \text{not}D) P(\text{not}D) + P(- \mid D) P(D)},$$

where $P(- \mid D) = 1 - P(+ \mid D) = 1 - 0.933 = 0.067$.

> 0.98 * 0.97 / (0.98 * 0.97 + 0.067 * 0.03)
[1] 0.99789
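
One way to check both answers is to think in terms of expected counts rather than probabilities. The sketch below assumes a hypothetical population of 100,000 people (this size is an assumption made purely for illustration; any size gives the same ratios) and tabulates the expected number of people in each combination of test result and disease status, using the probabilities from the example.

```r
n <- 100000                    # hypothetical population size (an assumption)
n_d <- n * 0.03                # expected number with diabetes, n * P(D)
n_not_d <- n - n_d             # expected number without diabetes

true_pos <- n_d * 0.933        # have diabetes and test positive
false_neg <- n_d - true_pos    # have diabetes but test negative
false_pos <- n_not_d * 0.020   # no diabetes but test positive
true_neg <- n_not_d - false_pos  # no diabetes and test negative

# Expected counts: rows are test results, columns are disease status
counts <- matrix(c(true_pos, false_neg, false_pos, true_neg), nrow = 2,
                 dimnames = list(test = c("+", "-"),
                                 status = c("D", "notD")))
counts

# P(D | +): proportion of positive tests that are true positives
true_pos / (true_pos + false_pos)
# P(notD | -): proportion of negative tests that are true negatives
true_neg / (true_neg + false_neg)
```

The two ratios at the end are the row-wise proportions of the table, and they should agree with the values obtained from Bayes’ theorem above.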

The screening_test function

The stat0002 R package has a function screening_test that performs these calculations.

> library(stat0002)
> screening_test(prior = 0.03, sensitivity = 0.933, specificity = 0.98)
Prevalence, sensitivity, specificity:
       P(D)     P(+ | D)  P(- | notD)  
      0.030        0.933        0.980  
P(positive test), positive and negative predictive values:
       P(+)     P(D | +)  P(notD | -)  
    0.04739      0.59063      0.99789
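
As noted earlier, these probabilities depend on the prior probability $P(D)$. The following base R sketch illustrates this by applying the two Bayes’ theorem formulas to several values of the prior at once. It does not use `screening_test` and is not a description of how that function is implemented. Only the prior 0.03 comes from the example; the other values are arbitrary choices for illustration.

```r
sens <- 0.933   # P(+ | D), from the diabetes example
spec <- 0.98    # P(- | notD), from the diabetes example

# Illustrative priors: only 0.03 is taken from the example above
prior <- c(0.01, 0.03, 0.10, 0.30)

# P(D | +) for each prior, vectorised over prior
ppv <- sens * prior / (sens * prior + (1 - spec) * (1 - prior))
# P(notD | -) for each prior
npv <- spec * (1 - prior) / (spec * (1 - prior) + (1 - sens) * prior)

# One row per prior, rounded for display
round(data.frame(prior, ppv, npv), 4)
```

With the sensitivity and specificity held fixed, $P(D \mid +)$ increases with the prior and $P(\text{not}D \mid -)$ decreases, which is why the same test can be more or less informative in different populations.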