What Is Probability?

Objective probability is like phlogiston: it might seem real, it may even offer some explanatory power, but in the end it’s a model with limited applicability.

Probability is a subjective judgment. In this admission lies a theory of probability that is more objective than any idea of probability that sets out with the goal of being objective.

Critically, the probability of an event is not:

  • the long-run frequency with which similar events occur;
  • the ratio of favourable cases to the number of possible cases;
  • a physical property of the object or situation involved.

Although these ideas can be helpful to determine the probability of an outcome, they do not define the probability in question.

This article is a short summary of the first handful of chapters of de Finetti’s classic treatise on the fundamentals of probability1 Theory of Probability; de Finetti; Wiley; 1974.. The book itself contains much more interesting detail, proofs, edge cases, and arguments. Read it if this is your thing! The quotes in this article are from that book, but I have taken the liberty of editing them for conciseness. I have strived very hard to retain the spirit2 Including the appropriate level of whimsy. of the message.

Events need to be well defined

For any of this discussion to be meaningful, we need to constrain ourselves to well defined events. By event, we mean something very general: something that may or may not have happened, or that will or won’t happen; a statement that is either true or false. Here are examples of events:

  • The flipped coin lands heads-up.
  • A banana is nutritious.
  • Bob wins the lottery.

These events are not well defined. Here are some questions we may have to answer before we can judge whether the events are true or false:

  • Which flipped coin?
  • What does nutritious mean? And which banana are we talking about?
  • Is this referring to Bob winning the lottery next week, or the next time he plays, or any time in the next year? What if he plays with a group that share their winnings, does that count as Bob winning even if it technically wasn’t his numbers?3 This question is from de Finetti. I found it really insightful in terms of the weird edge cases you can encounter.

Note that events can also be combinations of other events, like “two dice are showing a sum less than six”, which is really a combination of two die throws4 The two throws are usually independent, but if there’s any doubt about it, that also needs to be specified for the event to be well defined.. “Emily Brontë was born in 1954” is also an event that may or may not have occurred in the past5 It did not. Brontë was born well before the 1900s.

This might all sound obvious and mundane, but it’s really important and people suck at it. Here are some events people on the web have submitted when asked for predictions for 2023. These are not well defined, making them meaningless for the discussion to follow. I will point out one ambiguity in each, but I’m sure you can find more.

ai will keep wowing us, but in 2023 actual change will be surprisingly little.

Besides this being a statement about how people will feel (“wow” and “surprised”) without specifying which people will feel that way, it uses a classic weasel: “actual” change. Whatever change occurs, the person submitting this prediction can claim it isn’t an actual change.

Tech layoffs in the Bay Area intensify. When hiring begins again a large number of the new hires are either remote or are in other geographical areas where labour costs are lower.

By varying what “large number” means, we can always say that a large number of new hires are in geographical areas cheaper than the Bay Area – this is particularly easy because the Bay Area is notoriously expensive.

Tesla struggles to compete against old school manufacturers that have a grip on quality and make advances on the ev tech side.

There are so many potential meanings of the phrase “struggle to compete” that it’s always going to be true of anyone in any situation. (Also note the “old school manufacturers that have a grip on quality” weasel – if Tesla competes well against some old school manufacturers, the person submitting this can say that those particular manufacturers “don’t have a grip on quality” and thus be right again!)

Operational definitions make for well defined events

One way out of the problem with under-specified events is to adopt operational definitions. An operational definition is a series of steps we can execute, and the last step results in the value we are looking for.

For example, the event “Next week’s lecture will be attended by over 130 people” isn’t well specified.6 Thanks, Deming, for this example. Some questions you might ask:

  • Do you count staff present, or just audience?
  • If one audience member has brought their toddler, who has no interest in the lecture itself, does that count?
  • How about the janitor who happened to be slowly sweeping the floors of the lecture hall during the lecture because they had an interest in the subject?
  • What about the student who was supposed to attend but had no interest and left halfway through?

How you count something affects what number you get. There is no true number of people attending the lecture, there are just various procedures for achieving a count, and the selection is subjective!

There are many ways to operationalise the count of lecture attendees:

  • Stand in the door of the lecture hall and ask everyone entering whether their purpose for being there is attending the lecture.
  • Take a photo from the front of the hall at the 40 minute mark, and count the number of heads visible in it.
  • Feel each seat two minutes after the lecture has ended and check how many of them are still warm.7 But then you need a protocol for how to handle the case when the seat is still occupied!

If the event in question concerns a fact for which at least one interpretation is well known and widely accepted, it helps to operationalise it with a procedure that involves consulting reliable sources of news or statistics, e.g. Bloomberg, English Wikipedia, industry statistics organisations, the various Census Bureaus of low-corruption countries, and so on.

Logic lacks nuance around uncertainty

Now that we know something about what we are going to talk about (well defined events), we can finally start talking about it.

Logic is the language of the definitive. Logic is objective. In logical reasoning, events are either certain or impossible. If we agree on an implication, e.g. “When it rains, the lawn is wet”, then once we observe rain, the lawn is certainly wet. Conversely, if we know the lawn is dry, rain is impossible.

If we are ever in doubt about a logical statement, we can look at the real-world outcome of the event, and from this determine whether the logical statement about it was correct. There is nothing subjective about this process: we apply the operational definition, and the result tells us whether the logical statement was true or false. It doesn’t matter who does this, because everyone would get the same result.

A lot of statistically untrained people are stuck in this framework of logic, where things are certain and causes follow a logical progression. They only know the language of logic, so they express predictions about the future (which are by nature uncertain) in terms of logical reasoning (which deals specifically with things that are certain).

It is common to try to “guess”, among the possible alternatives, the one that will occur. This is an attempt often made by experts who are inclined to precast the future in the forge of their fantasies. Everyone will no doubt have noticed how often the “foresights of experts” turn out to be completely different from the facts, sometimes spectacularly so. In the main, this is precisely because they are intended as guesses which “deduce”, more or less logically, a long chain of consequences – still considered necessarily plausible – from the assumed plausibility of an initial hypothesis.

Here also one might note that the hypothetical reconstructions of historical events made by scholars and novelists, based on scanty data, are also guesses in the above sense. Ineptitude or laziness prevents us from seeing how many other possibilities there are, besides the first one we happened to think of.

In effect, the aforementioned experts have a perspective of the world in which events are uniquely and certainly caused by earlier events in a logical chain. With this perspective, once you know one thing for certain, you can extrapolate as far into the future as you want, and arrive at definitive guesses about what will happen.

This is the same sort of flaw as saying, “When a passenger rides the train they have a ticket.” That’s a plausible consequence, but not a certain one, because sometimes they do not. All we can do, while remaining in the realm of objective logic, is to say, “When a passenger rides the train it is possible they have a ticket.”

When logic is insufficient to tell us whether something is certain or impossible, all it allows us to do is say that it is possible. Everything we are uncertain about goes into the “possible” bucket. This feels a bit unsatisfactory. After all, it’s also possible a person on the street has a ticket, but it seems somehow… more possible for the passenger on the train.

The thing we are not content with, is the agnostic and undifferentiated attitude towards all those things which, not being known to us with certainty, are simply “possible”. In logic, there are no degrees of possibility: it is possible (equally possible) that it snows on a winter or summer day; that a great champion or novice wins the competition; that every student, whether well-prepared or not, will pass an examination; that next Christmas you will find yourself at any place in the world.

However, we do not content ourselves with this, and, in fact, it is not our real attitude. Faced with uncertainty, we feel a more or less strong propensity to expect that certain alternatives rather than others will turn out to be true.

This feeling is what we tap into when we measure varying degrees of possibility. That leads to probability: expressing the level of confidence we have in things that are possible, yet uncertain to us.

Probability is subjective and information dependent

Probability is based on the feeling that some things are more likely than others. As with any feeling, it is necessarily subjective. If I feel that there’s a 20 % chance one of our common acquaintances needs to go to the hospital in the next six months, and you feel there’s a 70 % chance, there’s no objective measure by which we can rule your evaluation correct, and mine incorrect, or vice versa. Both our evaluations are compatible with the event, and with the non-event, so whatever happens, it does not constitute proof that either of us was wrong. In fact, we can never gather such proof in the general case.8 The only time we can judge the evaluation wrong is if one of us says 0 % and then the hospital visit does happen, or 100 % and it does not. But then we are practically sneaking back into the realm of logic again, where objective rights and wrongs exist.

What would lead us to arrive at such different probability evaluations of the same event? There are two things, the first of which is more obvious: we simply have different relevant information about the event.

Let us assume we have to make a drawing from an urn containing 100 balls. We do not know the respective numbers of white and red balls, but let’s suppose that we attribute equal probabilities to symmetric compositions, and equal probability to each of the 100 balls: the probability of drawing a white ball is therefore 50 %. Someone might say, however, that the true probability is not 50 %, but b/100, where b denotes the (unknown) number of white balls: the true probability is thus unknown, unless one knows how many white balls there are.

Another person might observe, on the other hand, that 1000 drawings have been made from that urn and, happening to know that a white ball has been drawn B times, one could say the true probability is B/1000. A third party might add that both pieces of information are necessary, as the second one could lead one to deviate slightly from attributing equal probabilities to all balls.

A fourth person might say that he would consider the knowledge of the position of each ball in the urn at the time of the drawing as constituting complete information (in order to take into account the habits of the individual doing the drawing; their preference for picking high or low in the urn): alternatively, if there is an automatic device for mixing them up and extracting one, the knowledge of the exact initial positions which would allow one to obtain the result by calculation (emulating Laplace’s demon.)

Only in this case would one arrive, at last, at the true, special partition, which is the one in which the theory of probability is no longer of any use because we have reached a state of certainty. The probability, “true but unknown” of drawing a white ball is either 100 % or 0 %.

Of the multiple evaluations of the probability of drawing a white ball, none of them is wrong; they are just made in light of different information. There are many other examples of this, like the probability that France wins the finals of sportsball, or the probability that Alice has an electric car. As you find out more, you can make different probability evaluations – none of them more true than the others.

This is also evident in that we can make probability evaluations about events that have already been determined, conclusively, as long as we don’t know what the conclusion is.

There is a prejudice that uncertainty and probability can only refer to future matters, since these are not “determined” – in some metaphysical sense attributed to the facts themselves instead of to the ignorance of the person judging them. In this connection, it is useful to recall the following observation of E. Borel: “One can bet on Heads or Tails while the coin, already tossed, is in the air, and its movement is completely determined; and one can also bet after the coin has fallen, with the sole proviso that one has not seen on which side it has come to rest.”

Probability, in other words, is about your uncertainty about a fact, and your personal evaluation around that which you do not know.

When hearing about the subjective nature of probability, it is tempting to object that probability is not subjective at all: it’s only that each person has a unique background, with all their small differences in experience and knowledge informing their understanding of the event in question, and if we could remove the effect of “a person’s life up until this point”, probability would no longer be subjective. While I can’t argue against that9 After all, it’s an implication where the first part says “if we do the impossible, then …”, it’s practically irrelevant.

Evaluate probability by equivalent certain gain

To evaluate the probability of an event, start by imagining that you would find out the outcome tomorrow, and that if the event occurred, you would be given $100.10 It is important that the monetary amount is small, lest risk management comes into play, but not so small that you no longer care about it. If $100 does not hit that sweet spot for you, use an amount that does. It also helps if the outcome of the event will become known in the near future (if it will not, pretend it will), so we can ignore interest rates. We could technically account for both of these confounders, but it’s easier to avoid them. Now, you are offered a choice: either take the uncertain gain of $100 if the event occurs, or receive a guaranteed amount of money right now, regardless of whether or not the event occurs. What amount of guaranteed money right now would feel equivalent to the uncertain $100 to you? That amount, as a fraction of the $100, is the probability you hold of the event.

This is easier to grasp with an example. Say you are going for a short walk. If you happen to encounter an acquaintance, you’d get $100. How much money would you ask for right now to forego that potential $100 gain if you meet an acquaintance? I’m about to leave for an errand where there’s quite a high likelihood I will meet someone I know, so I would maybe find $30 to be equal to that uncertain $100, meaning the probability I assign to meeting someone I know is 30 %.

For difficult cases you might be tempted to say “I don’t know the probability of that!” As we can guess from the above, that answer is invalid.

Probability expresses, for each individual, their own choice in their given state of ignorance. To imagine a greater degree of ignorance that would justify the refusal to answer would be rather like thinking that in a statistical survey it makes sense to indicate, in addition to those whose sex is unknown, those for whom one does not even know “whether the sex is unknown or not.”

There is some amount of money we would consider equivalent, given what we currently know, even if our lack of knowledge leaves us not feeling very confident about it. That feeling of unease is due to ambiguity, not the absence of probability.

Ambiguity is about which information is relevant

When evaluating the probability of an event, one of the fundamental techniques is mentally categorising events into reference classes of similar events. If I want to evaluate the probability of my train being late, I might reach into my memory of earlier train rides, and try to estimate how many of them have been late. But which train rides are relevant for this judgment?

I rode the train in a different city recently – should I count those? The train was late a lot one winter eight years ago – should I count that? In fact, should I count any harsh winter at all, or should I only count rides when the weather was similar to now? The train operator changed recently. I usually travel at a different time of day. The population of the city has grown. There are festivities prohibiting road access. Cars are more expensive to own these days. The ticket price for the train recently increased.

There’s no right or wrong answer for which factors to ignore, and which to pay attention to.11 Though if you have some data, statistical process control is one way to find out what has had a larger effect in the past, and what did not. Whether or not you think the past is a useful guide to the next event is – you guessed it – a subjective opinion. This matters for finding out which prior experience counts as relevant, and which to ignore. This is the ambiguity of any probability evaluation, and it makes all probability evaluations subjective.

A common misconception by people who try to define an objective notion of probability is to say that two events are really two instances of the same event. They are implicitly lumping together this and another event in the same category, based on personal ideas of what is relevant, and then passing it off as objective!

Two distinct events are always different, by virtue of an infinite number of circumstances (otherwise how would it be possible to distinguish them?) They are equally probable (for an individual) if – and only if – that individual judges the differences to be irrelevant in the sense that they do not influence that person’s judgment.

People simply disagree on which prior experiences are relevant, and there’s no objective way to find out.

Combined probabilities need to be coherent

We have already seen that probability is subjective, and we are free to assign any probability to any event in isolation. However, once we assign probabilities to multiple events, or combinations of events, it becomes important that the probabilities we assign are coherent, which means that they are not self-contradictory.12 It’s not strictly forbidden to assign incoherent probabilities, and sometimes it’s useful, but one should only do it intentionally for a specific reason, not out of ignorance.

It is easy to say, “In my opinion, the probability of E is, roughly speaking, twice what others think it is.” However, if you say this, I might ask, “What then do you consider the probabilities of A, B, C to be?” If you remain secure in a coherent view, you will have an opinion that others may consider eccentric, but will not otherwise find defective.

However, it will more often happen that as soon as you face the problem squarely, in all its complexity and interconnections, you come to find yourself in disagreement not only with the others, but also with yourself, by virtue of your eccentric initial evaluation.

The operational definition says that if you offer to bet based on an incoherent probability evaluation, then an adversary can construct a Dutch book against you, meaning they can place bets with you in such a way that in total, whatever events do occur, they are guaranteed to profit, and you are guaranteed to lose.

Mathematically13 The inequality here is based on the betting-type operational definition of probabilities. To better understand this (if you care to – it’s a bit supplementary to the main points), first check the appendix for more on that definition., the probabilities \(P(E_1)\), \(P(E_2)\), and so on are incoherent when we can find a set of coefficients \(c_i\) such that14 Please note that the constants \(c_i\) can be negative as well as positive! This reverses the direction of the transaction from revenue to cost and vice versa., for all admissible combinations of \(E_i\),

\[c_1 E_1 + c_2 E_2 + \ldots + c_n E_n > c_1 P(E_1) + c_2 P(E_2) + \ldots + c_n P(E_n).\]

Fortunately, we don’t have to solve this inequality to assign probabilities. It is sufficient that we follow the rules of probability, which we will come to shortly.
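To make the Dutch book concrete, here is a minimal sketch in Python. The events and numbers are invented for illustration: suppose someone assigns \(P(A) = 0.5\), \(P(B) = 0.5\) and \(P(A \lor B) = 0.7\) to two mutually exclusive events. Coherence would demand \(P(A \lor B) = P(A) + P(B) = 1.0\), so the evaluation is incoherent, and an adversary can choose stakes (the coefficients \(c_i\) above) that guarantee our loss.

```python
# A minimal sketch, not from the book: under the betting-style definition,
# for each event E we accept a gain of stake * (E - P(E)), with the
# adversary free to choose each stake (possibly negative).

p = {"A": 0.5, "B": 0.5, "A_or_B": 0.7}   # incoherent: should be 0.5 + 0.5 = 1.0

# Stakes chosen by the adversary.
stakes = {"A": 1.0, "B": 1.0, "A_or_B": -1.0}

# A and B are mutually exclusive, so only three outcomes are admissible,
# and A_or_B is determined by A and B.
outcomes = [
    {"A": 1, "B": 0, "A_or_B": 1},
    {"A": 0, "B": 1, "A_or_B": 1},
    {"A": 0, "B": 0, "A_or_B": 0},
]

for outcome in outcomes:
    gain = sum(stakes[e] * (outcome[e] - p[e]) for e in p)
    print(outcome, f"our gain: {gain:+.2f}")   # -0.30 in every case
```

Whatever actually happens, we lose 0.30: the adversary has locked in a profit purely from our incoherence.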

Coherence sounds constraining, but it can actually help us evaluate probabilities, by serving as sort of a rubber duck we can reason about our evaluations with. For example, among the coffee drinkers I’ve observed, I would say roughly 20 % drink their coffee black, meaning no sugar and no milk. About 80 % take no sugar, but may or may not have milk. Additionally, almost nobody I know takes sugar but not milk.15 At this point, the requirement of coherence has already determined the probabilities going into all other columns; we could fill in the rest of the table automatically. But we’ll ignore that for the moment to make the point more clear.

            No milk    Milk  Marginal
No sugar       0.20              0.80
Sugar          0.03
Marginal

Additionally, I think about 10 % of coffee drinkers take both sugar and milk.

            No milk    Milk  Marginal
No sugar       0.20              0.80
Sugar          0.03    0.10
Marginal

If we now fill in the remaining probabilities based on coherence (i.e. do no further evaluation on our own, just trust the rules of probability), we discover I have made an incoherent evaluation! The probabilities add up to only 93 %, when they ought to add up to 100 %.

            No milk    Milk  Marginal
No sugar       0.20    0.60      0.80
Sugar          0.03    0.10      0.13
Marginal       0.23    0.70      0.93

Discovering this, we start probing: what erroneous evaluations have I made? One of the probabilities that feels off to me is the marginal probability of 70 % milk. That would imply that at a table of 6 of my coffee drinking friends, only four would ask for milk. I think this is too low, and it should probably be 80 %. How did it end up being so low? Looking at the table, I suspect the milk-and-no-sugar evaluation is too low. If I update that to be more realistic, we run into new trouble: the probabilities now sum to 103 %.

            No milk    Milk  Marginal
No sugar       0.20    0.70      0.90
Sugar          0.03    0.10      0.13
Marginal       0.23    0.80      1.03

In this case, I think the suspect is the 20 % of black coffee drinkers – on closer analysis, it should probably be 15 %. So we fix that, discover the total becomes 2 % too low, adjust the last number that feels a little wrong, and then the evaluations are coherent.16 Actually, I still see one flaw in these evaluations: According to the table, milk is negatively correlated with sugar (we can see that because P(milk) = 82 %, but P(milk|sugar) = 77 %), but I actually think it’s positively correlated: if I learn that someone takes sugar in their coffee, that would increase my belief they also take milk, not decrease it. So faced with this table, I should go back and adjust so the conditional probabilities also agree with my beliefs. But I will stop this example here because I think you get the point by now.

            No milk    Milk  Marginal
No sugar       0.15    0.72      0.87
Sugar          0.03    0.10      0.13
Marginal       0.18    0.82      1.00

The process is that we start by gut-feeling some probabilities, then we follow the rules of coherence back and forth, gradually refining our evaluations and making the error smaller with each iteration. This is a very powerful technique, especially when it comes to more difficult evaluations.
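For concreteness, here is the first round of that bookkeeping as a small Python sketch, using the numbers from the tables above:

```python
# Initial gut-feeling evaluations.
p_black      = 0.20   # no sugar, no milk
p_no_sugar   = 0.80   # marginal: no sugar (with or without milk)
p_sugar_only = 0.03   # sugar, no milk
p_sugar_milk = 0.10   # sugar and milk

# Coherence determines the remaining cells.
p_milk_only = p_no_sugar - p_black          # no sugar, milk: 0.60
p_sugar     = p_sugar_only + p_sugar_milk   # marginal: sugar: 0.13

total = p_no_sugar + p_sugar
print(f"milk but no sugar: {p_milk_only:.2f}")   # 0.60
print(f"total probability: {total:.2f}")         # 0.93, not 1.00: incoherent
```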

Stochastic independence presupposes logical independence

Before we learn more about the rules of probability, we need a brief note on dependence. There are two types of dependence that matter for this discussion: logical and stochastic.

  • Logical dependence is when one event is certain to occur if another one does, or if one event makes another impossible. We see this often when it comes to partitions, i.e. when the space of outcomes is divided into mutually exclusive alternatives.17 For example, the combined event heads or tails in a coin flip is made up of two events, which are a partition. The opposite, logical independence, means that all combinations of events are valid; no event prevents any other from occurring, and no event guarantees that another will.18 The events in the sugar or milk example are logically independent, because sugar neither logically precludes nor guarantees milk, or vice versa. However, we can also turn that example into a partition by extending it to four events: (e1) nothing, (e2) sugar, (e3) milk, (e4) sugar and milk.
  • Stochastic dependence is a little complicated. To be technically correct, we would have to say that it arises when your probability evaluation of one event changes under the condition that you will (in the future) learn the outcome of another event.19 The sugar or milk event is an example of this. With my evaluations above, learning that someone takes sugar changes how likely I think it is that they take milk. But this needn’t be the case for all people – some people may think that liking sugar and liking milk in their coffee are two completely unrelated preferences. Stochastic dependence is often expressed as conditional probabilities, defined as \(P(A \mid B) = P(A \land B)/P(B)\); see the worked example after this list.

    One way to operationalise a conditional probability is as a bet about something uncertain, but where the bet is called off (all transactions undone) if something else does not happen.
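As a small worked example of that definition, take the final coffee-table evaluations from earlier, \(P(\text{milk} \land \text{sugar}) = 0.10\) and \(P(\text{sugar}) = 0.13\):

\[P(\text{milk} \mid \text{sugar}) = \frac{P(\text{milk} \land \text{sugar})}{P(\text{sugar})} = \frac{0.10}{0.13} \approx 0.77,\]

which is the 77 % figure quoted in the footnote about correlation above.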

Logical independence is a precondition for stochastic independence, so if I say independent in this article, I mean stochastically independent (which implies it is also logically independent).

Whereas logical dependence is objective, stochastic dependence is subjective – the level of stochastic dependence is determined by an individual’s probability evaluations around the events in question. That being said, it can turn out to be a costly mistake to assume independence when the dependence structure is not well known. Unfortunately, it’s often in the high-risk situations that the dependence structure matters most – and is the least known.20 In a market crisis, all correlations go to one.

Rules of Probability

To maintain coherence, there are some rules that must be followed. As you know already, the probability of any event must lie between 0 and 1, inclusive.

Probability of A or B

If we are looking for the probability of one or both of two events \(A\) and \(B\) occurring, that is given by

\[P(A \lor B) = P(A) + P(B) - P(A \land B).\]

If A and B are mutually exclusive (i.e. only one of the two can happen, not both), then \(P(A \land B) = 0\), meaning we can just add the probabilities of the two events to get the probability of either one happening.

If A and B are mutually exclusive, and also form a partition, then

\[P(A \lor B) = P(A) + P(B) = 1,\]

which also means that \(P(A) = 1-P(B)\) and vice versa.

This rule for coherent probabilities of one or more of a number of events is based on the inclusion–exclusion principle. For three events, it says that

\[\begin{array}{rcl} P(A \lor B \lor C) & = & P(A) + P(B) + P(C) \\ & - & P(A \land B) - P(A \land C) - P(B \land C) \\ & + & P(A \land B \land C). \end{array}\]

If you’re interested in the generalisation for more events, look it up!
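Going back to the two-event case, here is a quick numerical check using the final coffee-table evaluations from earlier, where \(P(\text{sugar}) = 0.13\), \(P(\text{milk}) = 0.82\) and \(P(\text{sugar} \land \text{milk}) = 0.10\):

\[P(\text{sugar} \lor \text{milk}) = 0.13 + 0.82 - 0.10 = 0.85,\]

which agrees with reading the table directly: everyone except the 15 % who drink their coffee black.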

Probability of A and B

The probability of both A and B is given by

\[P(A \land B) = P(A \mid B) P(B).\]

If A and B are independent, then \(P(A \mid B) = P(A)\), which simplifies the above so that the probability of the combined event is the product of the probabilities of the individual events.

There is an important remark to be made here. When events are not independent, we need an evaluation of either the joint probability \(P(A \land B)\) or the conditional probability \(P(A \mid B)\) in order to complete our evaluation of all probabilities involved. One of these two (joint probability or conditional probability) is required and cannot be inferred from the individual event probabilities.21 When we have one of them, the other can easily be computed.
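The coffee table from earlier illustrates this remark. If milk and sugar were independent, the joint probability would have to be the product of the marginals,

\[P(\text{milk} \land \text{sugar}) = P(\text{milk}) \, P(\text{sugar}) = 0.82 \times 0.13 \approx 0.107,\]

but the table has 0.10, so under those evaluations milk and sugar are (slightly, negatively) dependent, which is exactly why the joint or conditional probability had to be evaluated separately.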

Extension of the conversation

Related to this is the rule known as extension of the conversation. If all events \(B_i\) form a partition, then we can compute

\[P(A) = P(A \land B_1) + P(A \land B_2) + \ldots + P(A \land B_n)\]

or, equivalently

\[P(A) = P(A \mid B_1) P(B_1) + P(A \mid B_2) P(B_2) + \ldots + P(A \mid B_n) P(B_n).\]

The reason I bring it up is that this can often massively simplify probability evaluations: find a partition where the conditional probabilities are easy, then extend the conversation to include it.

Sometimes people misunderstand how to extend the conversation. If I ask about the probability that France wins their quarter finals in a sportsgames championship, someone might counter, “The group stages are not over, so we don’t yet know who France will play against. The probability of them winning depends on who they play, so I can’t yet assign a probability to that event!”22 An even more extreme version is asking about the probability that France wins the second semi final before the group stages are over – at that point we don’t even know if France will make it to the semi finals!

The idea of considering \(P(E)\) reconstructed from a partition via \(P(E \mid H_1)\), \(P(E \mid H_2)\), etc. often leads to a temptation that one should be warned against: this is the temptation of saying that we are faced with an “unknown probability”, which is either \(P(E \mid H_1)\), \(P(E \mid H_2)\), and so on, but we do not know which is the “true” value until we know which of the hypotheses \(H_i\) is the true one. None of the possible hypotheses has any special status entitling it to be regarded as more or less “true”. Any one of them could be “true” if one had the information corresponding to it; in the same way as the one corresponding to one’s present information is true at the moment.

The phrase “unknown probabilities” is already intrinsically improper, but what is worse is that the improper terminology leads to a basic confusion of the issues involved. This is the confusion that consists in thinking that the evaluation of a probability can only take place in a certain “ideal state” of information, in some privileged state; in thinking that, when our information is different (as it will be, in general), more or less complete, or different in kind, we should abandon any probabilistic argument.

The right response for a coherent evaluation in the example of France is to consider all the possible opponents, and weigh the probability that France beats each opponent by the probability that they face that opponent. We may not be able to tell with certainty whom they will meet, but we can tell what the probability is that they meet each of them.
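As a sketch of what that looks like in practice, here is a small Python calculation. The opponents and all the numbers are hypothetical, made up purely for illustration:

```python
# Extension of the conversation: P(win) = sum over opponents of
# P(meet opponent) * P(beat opponent). The "meet" events form a partition.
opponents = {
    # opponent: (P(France meets them), P(France beats them))
    "Brazil":  (0.5, 0.40),
    "Germany": (0.3, 0.55),
    "Croatia": (0.2, 0.65),
}

# Coherence check: the partition probabilities must sum to one.
assert abs(sum(p_meet for p_meet, _ in opponents.values()) - 1.0) < 1e-9

p_win = sum(p_meet * p_beat for p_meet, p_beat in opponents.values())
print(f"P(France wins the quarter final) = {p_win:.3f}")   # 0.495
```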

Frequency can be used as a subjective guide

We will end this article on one last point around subjectivity. If we have a pile of 12 playing cards, and we flip them over one by one, there’s some probability \(P(R_i)\) that the ith card we flip over is red. This probability varies over time, because it depends on the number of red cards in the pile that is not yet flipped over.

The total count of red cards, after all the flippage is over, will be

\[Y = R_1 + R_2 + \ldots + R_{12}\]

where we use the convention that \(R_i\) if true evaluates to one, otherwise to zero.23 Programmers are usually comfortable with this convention, mathematicians and laypeople need it pointed out. The expected number of red cards is, due to the linearity of the expectation operator,

\[E(Y) = E(R_1 + R_2 + \ldots + R_{12}) = E(R_1) + E(R_2) + \ldots + E(R_{12}).\]

The expectation of a single event (like \(E(R_7)\) for example) is actually the same as the probability of that event24 De Finetti puts a lot of emphasis on this point, even to the extent of using the symbol P also for expectation, or prevision, as he calls it. (Expectation is a bit of a misnomer, since we cannot truly expect the expectation other than in the long run.). This means we can say that

\[E(Y) = P(R_1) + P(R_2) + \ldots + P(R_{12}).\]

Once we know the final count of red cards – say, 3 – this must be equal to E(Y) in this specific situation. That means that

\[P(R_1) + P(R_2) + \ldots + P(R_{12}) = 3\]

or, alternatively, that the mean of all \(P(R_i)\) is \(3/12\). So whatever probabilities we assign to the individual flips in that scenario, they must average to 25 %.

This discussion probably seems a little worthless. We only found out the constraint on the probabilities after we had run the experiment.25 Whether this constraint also applies to a future card pile is a subjective judgment. And this is true. If we had flipped a (presumably fair) coin 12 times, and gotten three heads, then the average probability of heads in that sequence of flips would have been 25 %. But that doesn’t help us at all, because the past results of the coin don’t influence the future ones.26 Don’t they? Well, it’s complicated. In no case do the past coin flips influence future flips, because coin flips are independent by definition. Conditional on the hypothesis that the coin has a specific bias (e.g. is fair), the past flips don’t even influence our evaluation of future flips. But if we don’t know the bias of the coin, then past flips can influence our evaluation of future flips, by providing a hint as to the bias.

But sometimes we do know this constraint ahead of time! For example, parliaments usually have a limited number of seats, so elections need to select, say, 349 members out of all politicians. Stack rankings are sometimes performed where the bottom 20 % are considered failures. Maybe in a raffle there are 20 tickets in total and three will be selected winners. In these cases, to be coherent, our probability evaluations need to sum to whatever the fixed number of successes is. This is the one case in which frequency serves as an objective guide to probability evaluation – when we know a summary of the result ahead of time, just not which individual events were successes.
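Here is a small sketch of the raffle case (the skewed evaluation below is made up for illustration). The coherence constraint is simply that the individual probabilities sum to the known number of winners:

```python
# 20 raffle tickets, exactly 3 of which will be drawn as winners.
n_tickets, n_winners = 20, 3

# The symmetric evaluation: every ticket judged equally likely to win.
uniform = [n_winners / n_tickets] * n_tickets          # 0.15 each

# A non-symmetric evaluation is also coherent, as long as the
# probabilities still sum to the known number of winners.
skewed = [0.30] * 5 + [0.10] * 15                      # 1.5 + 1.5 = 3

for name, ps in [("uniform", uniform), ("skewed", skewed)]:
    print(name, round(sum(ps), 10))                    # both print 3.0
```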

Appendix A: Defining probability with adversarial betting

The “equivalent certain gain” method we used to evaluate probability is lacking in one respect: there’s nothing preventing us from being dishonest and – even though we think the event only has a 20 % chance of occurring – asking for a certain $90 to replace the uncertain $100. There’s little to lose from asking for a lot.

To avoid this, the alternative way of looking at essentially the same thing is as a bet. Again, adopting the convention that \(E = 1\) if \(E\) occurs, and \(E = 0\) otherwise, we can invent a situation in which, whatever probability \(P(E)\) we come up with, we are forced to accept a bet with gain \(s(E - P(E))\).

Here’s how to read that: an adversary gets to select the stake \(s\). The stake can be positive or negative.

  • Positive stake: We are forced to pay (lose) \(s P(E)\) to enter the bet. If the event occurs, we will then receive \(s\); if it does not occur, we receive nothing.
  • Negative stake: The direction of every transaction is reversed: we receive \(|s| P(E)\) from our adversary for entering the bet. If the event occurs, we have to pay \(|s|\); if it does not occur, we pay nothing.

The only thing you control in this scenario is \(P(E)\). You have to pick your probability such that you are okay with your opponent selecting either a positive or a negative stake. This is the same principle as the game-theoretic idea of “I cut the cake and you get to choose which part you want first” – it ensures you select a \(P(E)\) that truly makes you alright with both sides of the bet.
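To see why this keeps us honest, here is a minimal sketch (the 0.2 belief is an arbitrary illustration). Our expected gain, judged by our own belief \(p\), is \(s(p - P(E))\), and the adversary chooses \(s\) after seeing our announced \(P(E)\):

```python
# Our true degree of belief in E, versus various announced values of P(E).
belief = 0.2

def expected_gain(announced, stake, belief=belief):
    # E equals 1 with probability `belief`, so E[stake * (E - announced)]
    # is stake * (belief - announced).
    return stake * (belief - announced)

for announced in (0.1, 0.2, 0.9):
    # The adversary tries a positive and a negative stake and keeps
    # whichever is worse for us.
    worst = min(expected_gain(announced, +1.0), expected_gain(announced, -1.0))
    print(f"announced {announced:.1f}: worst-case expected gain {worst:+.2f}")

# Only announcing our true belief (0.2) avoids a negative expected gain.
```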