The birthday paradox states that if there are 23 people in a room then there is a slightly more than 50:50 chance that at least two of them will have the same birthday. For 60 or more people, the probability is greater than 99%. This is not a paradox in the sense of it leading to a logical contradiction; it is a paradox in the sense that it is a mathematical truth that contradicts common intuition. Most people estimate that the chance is much lower than 50:50
Estimating the probability
Calculating this probability (and related ones) is the birthday problem. The theory behind it was described in the American Mathematical Monthly in 1938 in Zoe Emily Schnabel's The estimation of the total fish population of a lake, under the name of capture-recapture statistics.
The key to understanding the birthday paradox is to realize that there are many possible pairs of people whose birthdays could match. Specifically, among 23 people, there are 23 × 22/2 = 253 pairs, each of which being a potential candidate for a match. To emphasize the point, consider a different scenario: if you enter a room with 22 people, the chance that somebody there has the same birthday as you is not 50:50, but much lower. This is because now there are only 22 possible pairs that could yield a match.
To compute the approximate probability that in a room of n people, at least two have the same birthday, we disregard distribution variations (leap years, twins, cyclical variations), and assume that 365 possible birthdays are equally likely. The trick is to first calculate the probability that the n birthdays are different. This probability is given by
because the second person cannot have the same birthday as the first (364/365), the third cannot have the same birthday as the first two (363/365), etc. Using factorial notation, this expression can also be written as
Now, 1 − p is the probability that at least two persons have the same birthday. For n = 23, one obtains a probability of about 0.507.
Note that neither of the two people is chosen in advance: by way of contrast, the probability that someone in a room of n other people has the same birthday as a particular person (for example, you) is given by
which for n = 22 gives only about 0.059, or slightly better than 1 chance in 17. For a greater than 50:50 chance that one person in a roomful of people has the same birthday as you, you would need n to be at least 253 .
The birthday paradox in its more generic sense applies to hash functions: the number of N-bit hashes that can be generated before probably getting a collision is not 2N, but rather only 2N/2. This is exploited by birthday attacks on cryptographic systems (see cryptographic hash function).
A mathematical, as opposed to numerical, view of the birthday paradox
In his autobiography, Paul Halmos wrote:
- "A good way to attack the problem is to pose it in reverse: what's the largest number of people for which the probability is less than 1/2 that they all have different birthdays? .... the problem amounts to this: find the smallest n for which
- The indicated product is dominated by
- The asserted domination comes from the celebrated relation between the geometric and arithmetic means; the next inequality comes from the definition of the definite integral, and the last one from 1 − x < e−x. .... The reasoning is based on important tools that all students of mathematics should have ready access to. The birthday problem used to be a splendid illustration of the advantages of pure thought over mechanical manipulation; the inequalities can be obtained in a minute or two, whereas the multiplications would take much longer, and be much more subject to error, whether the instrument is a pencil or an old-fashioned desk computer. What calculators do not yield is understanding, or mathematical facility, or a solid basis for more advanced, generalized theories."
An (non-fatal) error in Halmos' argument (whose general idea is right)
In the argument of Halmos quoted above, the inequality
is incorrect, as one may readily check. But the argument can be fixed. In the product
the first factor is equal to 1, and may therefore be discarded. Then we have
The first inequality above is a case of the inequality of arithmetic and geometric means. The second inequality comes from 1 − x < e−x.
The value of the last expression is less than 1/2 if and only if
where "loge" means the natural logarithm. This falls just barely short of 506, and as luck would have it, 506 is one of the values of n2 − n ; it attains that value when n = 23.
Empirical test
The birthday paradox can be simulated empirically using computer code.
The following command can be used with the Ruby programming language.
ruby -e 'puts (1..30).collect {rand(365)+1}.uniq.length'
where "30" is the number of people in the group and "365" is the number of days in a year. If the output is the same as the number of people in the group, there were no repeating birthdays. If it is not the same, then there were repeating birthdays.
The following code is in Perl. It returns numbers found to be repeated on series generated. It is not unusual to get repetition for the 23 sample random values generated.
perl -e 'for (1..22) {$h{int(rand(365)+1)}++}; for (keys %h) {print $_, "\n" if $h{$_} > 1}'