The Birthday Problem

In flipping through a book on American history, you come to notice that two of our 44 presidents, Warren G. Harding and James K. Polk, were born on November 2. An acquaintance with British history would reveal that, of 52 prime ministers, two pairs share the same birthday. And similarly for Australia -- of its 27 prime ministers, the 1st and the 24th were born on the same day. Something might be said for the peculiar birthday clustering of heads of state, if this wasn't also true for hockey teams (head coaches of the Montreal Canadiens, for one) or Academy Award-winning directors. These are relatively small groups of people -- 44, 52, 27 -- with at least a few members that share 1 out of 365 different birthdays. What's going on here?

Consider the following question: How large must a group of people be for there to be a 50% or greater chance that at least two of them have the same birthday? The answer, as we will see, is 23. With 57 people, we are almost guaranteed (99% probability) to find a match. These numbers are sufficiently low to challenge intuition; for this reason, this problem is also known as the Birthday paradox, although it is not a paradox in the logical sense. And it only seems paradoxical at first blush -- we can make sense of it if we consider it with care. First, let's see why this result evades our naive expectation. It has to do with the fact that we are likely unconsciously mulling a different problem: How large must a group of people be for there to be a 50% or greater chance of at least two of them to share a given birthday? Say, your birthday?

We can answer this question without much trouble. We will call the probability of finding a match \({\color{black}{p}}\), but we will consider instead the probability that there is not a match: \({\color{black}{\bar{p} = 1 - p}}\). Assuming that all birthdays in the year are equiprobable, we find \begin{equation} {\color{black}{\bar{p} = \left(\frac{364}{365}\right)^n}} \end{equation} for the probability that none of a group of \({\color{black}{n}}\) people share your birthday. With \({\color{black}{\bar{p} = p = 0.5}}\), there must be \({\color{black}{n = 253}}\) people. Now this is about large enough to seem plausible. As we noted though, this is a different problem. We are not interested in matching a specific birthday, but instead matching any birthday. This too isn't difficult, with 1
We can also find \({\color{black}{p}}\) by summing the individual probabilities that exactly 2 people, exactly 3 people, and so on, share birthdays. For \({\color{black}{k=2}}\) people, we have $$ {\color{black}{p(k=2) = \frac{1}{365^2}\frac{1}{364^{n-2}}\cdot 364\cdot 363\cdots (364-n-3) = \frac{1}{365^2}\frac{1}{364^{n-2}}\frac{364!}{(364 - n -2)!}}} $$ Summing all cases, we have $$ {\color{black}{p = \sum_{k=2}^\infty \frac{1}{365^k}\frac{1}{364^{n-k}}\frac{364!}{(364-n-k)!}}} $$ which is admittedly less wieldly than Eq. (\ref{bday}). It is nice, though, that the above infinite series sums to the closed form expression Eq. (\ref{bday}).
$$ {\color{black}{\bar{p} = \left(\frac{365}{365}\right)\left(\frac{364}{365}\right)\left(\frac{363}{365}\right)\cdots \left(\frac{365-n+1}{365}\right) = \frac{365!}{(365-n)!}\frac{1}{365^n}}} \label{bbday} $$ giving the probability that no two out of \({\color{black}{n}}\) people share a birthday. Note that this probability is zero when \({\color{black}{n > 365}}\), since then there must be a match (this eventuality is sometimes referred to as the pigeonhole principle). Then, $$ {\color{black}{p = 1 - \bar{p} = 1 - {365 \choose n}\frac{n!}{365^n}}}. \label{bday} $$ The quantity \({\color{black}{{365 \choose n}n!}}\) gives the number of ways \({\color{black}{n}}\) distinct objects can be selected from a set of 365, without replacement, when the order is relevant. Solving Eq. (\ref{bday}) with \({\color{black}{p = 0.5}}\) gives \({\color{black}{n=23}}\) people. We have succeeded in demonstrating the truth of the Birthday Problem, but how are we to interpret it?

Consider that the number of different pairs that can be formed from a group of 23 people is \({\color{black}{{23 \choose 2} = 253}}\). We've just seen this number -- it's the number of people that we need to sample in order to have a 50% chance of finding one that shares our birthday. And so it's not the number of people that matter -- it's the number of comparisons. In the first case that we analyzed, we were looking to match a given birthday. We found that 253 comparisons would be needed to furnish a 50% chance of a match, and since we are singling out one birthday, we must compare one person with 253 others. The next case, that addressed in the Birthday Problem, does not fix the birthday. We still need 253 comparisons to find a match, but since any birthday will do, these 253 comparisons are distributed among the sample group of 23 people. Each person is compared against 22 others, and there are 23 such sets of comparisons. After dividing by two to account for double-counting (it's the same whether we compare Bob to Alice or Alice to Bob), we find \({\color{black}{23\times 22/2 = {23 \choose 2} = 253}}\).

Fig. 1 (left) Topology of the first case considered, in which a specific birthday (center) is compared against 253 others (points around the circumference). (right) Topology of the case considered in the Birthday Problem, in which the 253 comparisons are distributed among 23 people (points around the circumference). In both cases 253 comparisons are made. Images created by M. Kresch using gnuplot; the script is available here.

[1] The implications of the Birthday Problem to the security of hash functions is discussed here.

Download as PDF