Bayes' theorem

Bayes' theorem is a result in probability theory, named after the Reverend Thomas Bayes, who proved a special case of it in the 18th century. It is used in statistical inference to update estimates of the probability that different hypotheses are true, based on observations and on knowledge of how likely those observations are under each hypothesis. Its discrete version may appear to go little beyond an identity that is sometimes taken to be the definition of conditional probability, but there is also a continuous version. A frequent error is to think that reliance on Bayes' theorem is the essence of Bayesianism; its essence is actually the degree-of-belief interpretation of probability, as contrasted with the various "frequency" interpretations.

Bayes' theorem in probability theory

In probability theory, Bayes' theorem is a statement about conditional probabilities that allows one to reverse the order of conditioning. If A and B are two events, Bayes' theorem allows us to calculate the probability of A given B if we know the probability of B given A and the probabilities of each event alone. It is a simple consequence of the definition of conditional probability, and while it says nothing 'new', it is a particularly useful theorem. Bayes' theorem states

P(A|B) = P(B|A) P(A) / P(B).

This follows by combining the definitions of the conditional probabilities P(A|B) = P(AB)/P(B) and P(B|A) = P(AB)/P(A). Bayes' theorem is often embellished using the law of total probability:

P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|A^C) P(A^C)],

where A^C is the complementary event of A. More generally, where {Ai} forms a partition of the event space,

P(Ai|B) = P(B|Ai) P(Ai) / [P(B|A1) P(A1) + P(B|A2) P(A2) + ...],

for any Ai in the partition.
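
In code, the partition form can be written as follows. This is a minimal Python sketch; the function name bayes and the numeric values are invented for illustration only.

  # Sketch of the partition form of Bayes' theorem (illustrative values).
  def bayes(priors, likelihoods, i):
      """Return P(Ai|B), given priors[j] = P(Aj) and likelihoods[j] = P(B|Aj)."""
      # Denominator: law of total probability, P(B) = P(B|A1) P(A1) + ...
      p_b = sum(p * l for p, l in zip(priors, likelihoods))
      return likelihoods[i] * priors[i] / p_b

  print(bayes([0.5, 0.3, 0.2], [0.1, 0.4, 0.8], i=2))  # 0.16/0.33, about 0.485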

An example: False positives

False positives are a problem in any kind of test: no test is perfect, and sometimes the test will incorrectly report a positive result. For example, if a test for a particular disease is performed on a patient, then there is a chance (usually small) that the test will return a positive result even if the patient does not have the disease. The problem lies, however, not just in the chance of a false positive prior to testing, but in determining the chance that a positive result is in fact a false positive. As we will demonstrate using Bayes' theorem, if a condition is rare, then the majority of positive results may be false positives, even if the test for that condition is (otherwise) reasonably accurate.

Suppose that a test for a particular disease has a very high success rate:

  • if a tested patient has the disease, the test accurately reports this, a 'positive', 99% of the time (or, with probability 0.99), and
  • if a tested patient does not have the disease, the test accurately reports that, a 'negative', 95% of the time (i.e. with probability 0.95).

Suppose also, however, that only 0.1% of the population have that disease (i.e. with probability 0.001). We now have all the information required to use Bayes' theorem to calculate the probability that a positive result is in fact a false positive.

Let A be the event that the patient has the disease, and B be the event that the test returns a positive result. Then, using the second form of Bayes' theorem (above), the probability of a true positive is

P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|A^C) P(A^C)]
       = (0.99 × 0.001) / (0.99 × 0.001 + 0.05 × 0.999)
       ≈ 0.019,

and hence the probability of a false positive is about 1 − 0.019 = 0.981.

Despite the apparent high accuracy of the test, the incidence of the disease is so low (one in a thousand) that the vast majority of patients who test positive (about 98 in a hundred) do not have the disease. (Nonetheless, this is about 20 times the proportion before we knew the outcome of the test! The test is not useless, and re-testing may improve the reliability of the result.) In this case, Bayes' theorem helps show that the accuracy of tests for rare conditions must be very high in order to produce reliable results from a single test, due to the possibility of false positives.
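
The arithmetic above can be checked with a few lines of Python; this is a sketch that simply restates the example's numbers.

  p_disease = 0.001            # P(A): prevalence of the disease
  p_pos_given_disease = 0.99   # P(B|A): test sensitivity
  p_neg_given_healthy = 0.95   # P(B^C|A^C): test specificity

  p_pos_given_healthy = 1 - p_neg_given_healthy       # P(B|A^C) = 0.05

  # Law of total probability: P(B) = 0.05094.
  p_pos = (p_pos_given_disease * p_disease
           + p_pos_given_healthy * (1 - p_disease))

  # Bayes' theorem: P(A|B).
  p_true_positive = p_pos_given_disease * p_disease / p_pos
  print(p_true_positive)       # about 0.019
  print(1 - p_true_positive)   # about 0.981, the false positive probability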

Bayes' theorem in Bayesian inference

Justification for Bayes' theorem

We will start with the simplest case of only two hypotheses, H1 and H2. Suppose that we know that precisely one of the two hypotheses must be true, and suppose furthermore that we know their "prior" probabilities P(H1) and P(H2) = 1 − P(H1). Now some "data" D is observed, and we know the conditional probabilities of D given H1 and H2, written as P(D | H1) and P(D | H2). We want to compute the "posterior" probabilities of H1 and H2, given the observation of D. Bayes' theorem states that these probabilities can be computed as

P(H1 | D) = c P(D | H1) P(H1)
P(H2 | D) = c P(D | H2) P(H2),

where the constant c has to be chosen so that the sum of the two probabilities is 1, i.e.

c = 1 / [P(D | H1) P(H1) + P(D | H2) P(H2)].

This theorem is a simple consequence of the definition of conditional probabilities.

A worked example

To illustrate, suppose there are two bowls full of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Somebody randomly picks a bowl, and then randomly picks a cookie. The cookie turns out to be a plain one. How likely is it that he picked it out of bowl #1?

Intuitively, it seems clear that the answer should be more than 50%, since there are more plain cookies in bowl #1. The precise answer is given by Bayes' theorem. H1 corresponds to bowl #1, and H2 to bowl #2. Since the bowl was picked randomly, we know P(H1) = P(H2) = 50%. The "data" D consists of the observation of a plain cookie. From the contents of the bowls, we know that P(D | H1) = 30/40 = 75% and P(D | H2) = 20/40 = 50%. Bayes' formula then yields

P(H1 | D) = P(H1) P(D | H1) / [P(H1) P(D | H1) + P(H2) P(D | H2)]
          = (0.5 × 0.75) / (0.5 × 0.75 + 0.5 × 0.5)
          = 0.6.

Initially, we estimated that he would pick bowl #1 with 50% probability, but after observing the plain cookie, we adjust our estimate to 60%.
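
The same computation takes only a few lines of Python (a sketch restating the example's numbers):

  prior_h1 = prior_h2 = 0.5      # each bowl is equally likely to be picked
  p_d_h1 = 30 / 40               # P(D|H1): share of plain cookies in bowl #1
  p_d_h2 = 20 / 40               # P(D|H2): share of plain cookies in bowl #2

  # Normalizing constant c, then the posterior for bowl #1.
  c = 1 / (prior_h1 * p_d_h1 + prior_h2 * p_d_h2)
  print(c * prior_h1 * p_d_h1)   # 0.6 = P(H1|D)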

The theorem is also true if we have more than just two hypotheses, say H1, H2, H3, ..., of which precisely one is true. Suppose we know the prior probability distribution

(P(H1), P(H2), P(H3), ...)

as well as the likelihood function

(P(D|H1), P(D|H2), P(D|H3), ...)

Then the posterior probability distribution

(P(H1|D), P(H2|D), P(H3|D), ...)

can be found by multiplying the prior probability distribution by the likelihood function and then normalizing, so that we have

(P(H1|D), P(H2|D), P(H3|D), ...)
= c × (P(H1) P(D|H1), P(H2) P(D|H2), P(H3) P(D|H3), ...).

Here again the constant c must be so chosen as to make the sum of the posterior probabilities equal to 1.
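
The multiply-then-normalize recipe is equally short in Python; in this sketch the three priors and likelihoods are invented for illustration.

  priors = [0.2, 0.3, 0.5]            # (P(H1), P(H2), P(H3))
  likelihoods = [0.9, 0.5, 0.1]       # (P(D|H1), P(D|H2), P(D|H3))

  unnormalized = [p * l for p, l in zip(priors, likelihoods)]
  c = 1 / sum(unnormalized)           # chosen so the posteriors sum to 1
  posteriors = [c * w for w in unnormalized]
  print(posteriors, sum(posteriors))  # about [0.474, 0.395, 0.132]; sum is 1.0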

The continuous case of Bayes' theorem also says the posterior distribution results from multiplying the prior by the likelihood and then normalizing. The prior and posterior distributions are usually identified with their probability density functions.

For example, suppose the proportion of voters who will vote "yes" is an unknown number p between 0 and 1. A sample of n voters is drawn randomly from the population, and it is observed that x of those n voters will vote "yes". The likelihood function is then

L(p) = [constant] p^x (1 − p)^(n−x).

Multiplying that by the prior probability density function of p and then normalizing gives the posterior probability distribution of p, and thus updates probabilities in light of the new data given by the opinion poll. Thus if the prior probability distribution of p is uniform on the interval [0,1], then the posterior probability distribution would have a density of the form

f(p|x) = [constant] p^x (1 − p)^(n−x)

and this "constant" would be different from the one that appears in the likelihood function.

Bayesianism

Bayesianism is the philosophical tenet that the rules of mathematical probability apply not only when probabilities are relative frequencies assigned to random events, but also when they are degrees of belief assigned to uncertain propositions. Updating these degrees of belief in light of new evidence almost invariably involves application of Bayes' theorem.
