Two Wrongs

The Hypothesis of the Fair Coin

Alice reaches into her wallet and picks out a coin. She tells us she is about to flip it, and asks us,

What probability do you assign to an outcome of heads?

What do we answer?


50 %, of course.


But then she asks us, “Why?”


A colleague of mine once said something along the lines of:

  • A statistically naïve person will, after seeing substantially more heads than tails, believe tails is due and thus more likely to come up next.
  • A statistical novice will recall that the above is the gambler’s fallacy and stick to believing that heads and tails are equally likely.
  • A statistical journeyman will entertain the idea that maybe it’s time to update their belief in the coin being biased, and suggest that heads is more likely to come up next.

This does summarise the main points of the article, but I believe much of the value comes from the discussion. Let’s find out why 50 % is the right probability for the coin flip!

Variants of the fairness hypothesis

The question of “Why” is hard. It is tempting to say something like,

Well, due to the symmetry of the coin on any throw it has equal probability of landing on both sides, and with two sides that becomes a 50 % probability that it lands on either side.

We might also say something like,

Well, there are two possible outcomes, and the process that yields them is inherently unbiased and noisy, so both outcomes are equally likely on every throw, and therefore any of the two sides has a 50 % probability of being up.

Or even

I have been told that coins give an outcome of heads with 50 % probability on each flip and I have not experienced much in my life that contradicts it, so I think that will be the case for your coin also.

These are long ways of saying what a mathematician or scientist might phrase as, “I hypothesise that the coin is fair.”1 (Fair here means something like equiprobable and independent outcomes.)

Let’s look at what it really means to hypothesise that the coin is fair.

The long string of heads

Alice then flips the coin and it lands heads. Then she flips it again, and it lands heads again. And again. And yet one more time. When she stops, the coin has landed heads five out of five flips. She asks you,

What probability do you assign to an outcome of heads?

Five heads in a row is unlikely, but it does happen. You might feel a little silly, but you confidently say 50 %.

Alice flips 10 more times, and in all of them the outcome is heads. She looks at you, and the question comes again:

What probability do you assign to an outcome of heads?

This puts you in a predicament. At this point, you have seen 15 flips, all of which came out heads. Something feels wrong, but none of the evidence you have seen actually invalidates the hypothesis that the coin is fair2 (even a fair coin gets long runs of heads sometimes – if it didn’t, that would indeed be reason to doubt its fairness), so you are forced to follow your hypothesis and say 50 % again. In fact, saying anything else at this point would be falling for the reverse gambler’s fallacy.

She flips the coin another 35 times, and your fears come true: all heads again.

What probability do you assign to an outcome of heads?

You might sense what the problem is here: once we’ve assumed the coin is fair, all results are equally likely. A string of 15 heads is just as likely as thtthttttththtt, or any other specific sequence. Since all results are equally likely, all results are completely uninformative. By adopting too strict a hypothesis, we’ve locked ourselves out of learning from our experience.
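To make this concrete, here is a quick sketch in Python (using the two sequences from the text): under the fairness hypothesis, any specific sequence of 15 flips has probability (1/2)^15, no matter how remarkable it looks.

```python
# Under the fair-coin hypothesis each flip is an independent 50/50 event,
# so any specific sequence of n flips has probability (1/2)^n.
def sequence_probability(sequence):
    return 0.5 ** len(sequence)

all_heads = "hhhhhhhhhhhhhhh"  # the 15 heads Alice produced
mixed = "thtthttttththtt"      # an unremarkable-looking sequence

assert sequence_probability(all_heads) == sequence_probability(mixed)
print(sequence_probability(all_heads))  # 1/32768, about 0.00003
```

Since the hypothesis assigns the same probability to every outcome, no outcome can shift our beliefs.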

The highly specific bet

In a parallel universe, Bob shows us his coin and asks us for the probability of heads. We reason the same way, i.e. in this universe too we hypothesise that the coin is fair. Bob flips the coin 1000 times and in 561 of them it turns out heads.

Then he says he will flip the coin 4 billion times, and asks us to bet on the frequency of heads. He proposes odds such that we are rewarded for betting on an interval which we believe will contain the frequency with 90 % probability. Since we assumed the coin was fair, we are forced – by the law of large numbers3 (in this situation we should probably invoke the stronger central limit theorem and bet on an even narrower range) – to bet on the range 49.9975–50.0025 %.
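For the curious, that range can be reproduced in a few lines of Python. I am assuming a Chebyshev-style bound here – which is what the weak law of large numbers gives us, and which matches the stated numbers – rather than the normal approximation:

```python
from math import sqrt

n = 4_000_000_000               # flips Bob proposes
p = 0.5                         # bias under the fairness hypothesis
sigma = sqrt(p * (1 - p) / n)   # standard deviation of the observed frequency

# Chebyshev's inequality: P(|frequency - p| >= k*sigma) <= 1/k^2.
# For 90 % coverage we need 1/k^2 = 0.10, i.e. k = sqrt(10).
k = sqrt(10)
lo, hi = p - k * sigma, p + k * sigma
print(f"{100 * lo:.4f}-{100 * hi:.4f} %")  # 49.9975-50.0025 %

# The central limit theorem would replace k with the normal
# quantile 1.645, giving the narrower range the footnote mentions.
```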

But look at those 1000 trial flips, which had a heads frequency of 56.1 %. Should we not widen our range at least a tiny bit, to account for the fact that the coin might actually be a little biased? Nope. Since we have hypothesised a fair coin, the 1000 trial flips cannot do anything to change our beliefs. The law of large numbers is very firm.

How can we know when we are wrong about our hypothesis, and when we were simply unlucky in the trial flips? We can’t, once that hypothesis is in place. We cannot use data that has a valid interpretation under an assumption to discredit that assumption.

Solution: Assume no specific bias

Let’s rewind to when we had just seen Alice’s coin, and assigned a 50 % probability to it landing heads. She asked us why. Here’s the real reason:

The bias of that coin is unknown to me. Since I don’t know in which direction – if any – the coin is biased, I will have to assign equal probabilities to bias in any direction, and that means for this first toss I believe I will see heads with a 50 % probability.

We don’t hypothesise a specific coin – we hypothesise an infinite number of possible coins, with biases ranging from 0 % to 100 % and everything in between. We may attribute higher probabilities to biases near 50 %, but we do so symmetrically – hence believing in 50 % for the first toss.

Under this looser collection of hypotheses, we are free to update our beliefs in the various levels of bias just as we would under normal Bayesian updating.
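As a concrete sketch of what such updating could look like, here is a Beta–Bernoulli model in Python. The symmetric Beta(10, 10) prior is my own illustrative choice, not something from the argument above – any symmetric prior gives a 50 % predictive probability for the first toss:

```python
# Symmetric Beta(a, a) prior over the coin's bias. Its mean is 0.5,
# so the predictive probability of heads on the first flip is 50 %.
a = b = 10  # prior pseudo-counts; 10 is an arbitrary illustrative choice

def predictive_heads(a, b):
    """Probability of heads on the next flip under a Beta(a, b) belief."""
    return a / (a + b)

print(predictive_heads(a, b))  # 0.5 before any flips

# Conjugate update: each observed head adds 1 to a, each tail adds 1 to b.
heads, tails = 15, 0           # Alice's first 15 flips
a, b = a + heads, b + tails
print(predictive_heads(a, b))  # 25/35, about 0.71 -- belief has shifted
```

Under the strict fairness hypothesis there is no such update step: the predictive probability stays pinned at 50 % no matter what we observe.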

Fairness may be reasonable under heterogeneity

I claimed before that it might sometimes be right to keep the fairness hypothesis even after seeing 15 heads in a row. This is under heterogeneity, and de Finetti4 (Theory of Probability; de Finetti; Wiley; 1974) describes the situation colourfully, as always:

By way of contrast, we would have less reason for such suspicions and doubts [of the coin being biased] if, from time to time, or even at each toss, the coin were changed. This would be even more true if coins of different denomination were used and the person doing the tossing were replaced, and more so again if the successive events considered were completely different in kind.

For example, whether we get an even or odd number with a die, or with two dice, or in drawing a number at bingo, or for the number plate of the first car passing by, or in the decimal place of the maximum temperature recorded today, and so on; whether or not the second number is greater than the first when we consider number plates of cars passing by, ages of passers-by, telephone numbers of those who call us up, and so on.

Under these circumstances, it seems very unlikely that a “suspicious” outcome [such as 15 “heads” in a row], whatever it was, would lead one to expect similar strange behaviour from future events, which lack any similarity of connection with those that have gone before.