How does one test a hypothesis?
Under the frequentism–objectivism school of probability, we would first determine a probability1 Specifically, of committing a type I error, but that’s beside the point here., compare that probability to a threshold, and use the outcome to judge whether the hypothesis is definitively true. Once we have shown the hypothesis is definitively true, we can proceed with any decision by simply assuming the hypothesis is true.
The Bayesian–subjectivism school also tells us to determine a probability2 Often known as a likelihood in this context., but then instead of declaring definitive truth, the resulting probability should be used as the strength of our belief in the hypothesis. In the decisions that follow, we calculate the expected value of the alternatives based on that probability.
Critically, this means the Bayesian–subjectivist can make decisions as if the same hypothesis – based on the same evidence – is both true and false, depending on the reward structure of the decision ahead of them. In competition for limited resources, the Bayesian–subjectivist is going to crush the frequentist–objectivist.
Say we have two variants of our product, A and B. We don’t want to maintain both variants according to individual user preferences, so we perform a randomised, controlled experiment on new users to find out which is best overall for our future user base.
If we discontinue one of the variants for no good reason, we will lose $18,000 per month due to upsetting our customers. If we find out one variant is better and we discontinue the other one, our customers would still be upset, but not as much, and it would be outweighed by the reduced maintenance burden, coming in at a profit of $12,000 per month in total.3 As you can tell, this example will be a little contrived, to avoid unnecessary details cluttering the point.
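To make the reward structure concrete, we can work out the break-even probability at which discontinuing a variant becomes worthwhile. This is a quick sketch using the dollar figures from the example above:

```python
# Monthly payoffs from the example: discontinuing the worse variant
# nets a $12,000 profit; discontinuing a variant for no good reason
# costs $18,000.
PROFIT_IF_RIGHT = 12_000
LOSS_IF_WRONG = 18_000

# Discontinuing is worth it when
#     p * PROFIT_IF_RIGHT - (1 - p) * LOSS_IF_WRONG > 0
# and solving for p gives the break-even probability:
break_even = LOSS_IF_WRONG / (PROFIT_IF_RIGHT + LOSS_IF_WRONG)

print(f"Break-even probability: {break_even:.0%}")  # 60%
```

In other words, as soon as we believe one variant is better with more than 60 % probability, dropping the other is the profitable move – a number we will return to below.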
Classical hypothesis testing involves a frequentist–objectivist procedure:
- Before running the experiment, we set a significance level at which we would like to perform the test.4 This is commonly something like 5 %, often for no good reason at all. Let’s say we are happy with results at a 10 % significance level for the future of our product.
- We run the experiment and discover that variant A was +37 better than B (on whatever dimension we were measuring).
- The difference of +37 happens just by random chance (sampling error) about 23 % of the time if we ran experiments similar to ours many times.5 We usually don’t run the same experiment many times – instead we base this probability on the assumption that the sampling distribution is approximately Gaussian, or Student t, or some other theoretical distribution. Sometimes there’s good reason to believe this, sometimes there’s not.
- Since 23 % is greater than the 10 % significance level we picked before the experiment, we conclude that A is not better than B6 Or, if we’re more careful, that we have failed to prove that A is better than B..7 Had the false positive risk been lower than the significance level, we would have concluded that A is definitely better than B. The idea behind this is that randomisation ought to ensure that any contributing factors not under study are equally divided between the groups, and the only thing separating the groups is the thing under study.
- Having learned this, we treat A and B as definitely equal in future decisions, including this one: we continue to maintain both variants despite the cost of maintenance.
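The steps above amount to a simple threshold rule, which we might sketch like this (the p-value of 0.23 and the 10 % level are the ones from the example; the function name is my own):

```python
def frequentist_decision(p_value: float, alpha: float) -> str:
    """Dichotomise: either the effect is declared real, or it is not."""
    if p_value < alpha:
        return "discontinue B"  # A is declared definitively better
    return "maintain both"      # A and B are treated as definitely equal

# Our example: p = 0.23 tested against a 10 % significance level.
print(frequentist_decision(0.23, 0.10))  # maintain both
```

Notice that the probability is consumed entirely by the comparison against the threshold – whatever happens afterwards, the 23 % plays no further part in the decision.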
Note the unfortunate situation we are forced into by frequentism–objectivism. Since the probability we get is the risk of a type I error, we cannot use it meaningfully in decision problems. Hence we are forced to make decisions pretending our hypothesis is either definitively true or definitively false. And we have to make this judgment based on fairly limited information (the type I error risk) to boot.8 In practice, even frequentists maintain an internal judgment of the likelihood of the hypothesis, informally weighing together the results of similar studies. But that’s not how the strict process of hypothesis testing proceeds under the frequentist–objectivist framework, because that happens outside of the framework.
For someone with a more subjective view on probability, the process is a little different.
- We run the experiment and discover that variant A is +37 better than B (on whatever dimension we are measuring).
- This difference leads to us thinking there’s a 77 % chance variant A is actually better than B.9 You might think there’s some relationship between this number and the 23 % the frequentist got. There’s not. It’s pure coincidence. Do not mistake the risk of type I errors for the likelihood that the hypothesis is false!
- Now we know that if we discontinue maintenance of B, there’s a 77 % chance that we make a $12,000 profit, and a (100 - 77) = 23 % chance that we make an $18,000 loss. This is a plain expected value calculation: 0.77×12,000 - 0.23×18,000 = $5,100 profit.10 Note the tricky wording here: we are not guaranteed a profit. In fact, there’s a fair chance (about one in four) we’ll be wrong and make a loss on this decision. But if we continue to use this framework for future decisions, we’ll maximise our profit in the long run. Thus, we discontinue maintenance for variant B.
Note the fuzziness of these steps! We never have to claim that the effect is definitively real, because we have the probability we assign to it being real. In decisions, we can then account for both possible outcomes, weighted by their appropriate probabilities.
Logic is the framework we use to reason with definitive truths and falsehoods. Probability is the corresponding framework for dealing with the uncertain.
With Bayesian–subjectivism we are free to make decisions within this framework for uncertainty. In contrast, under the frequentism–objectivism school, we cannot make decisions within the framework of probability. We have to step out of this framework to make any decisions. We have to dichotomise uncertainties into either true or false – in a way that doesn’t account for important information pertaining to the decision at hand.
This is the false dichotomy created by frequentism–objectivism.
Priors and stepping out of the framework
I have received some comments from readers claiming that we also have to step outside of the Bayesian–subjectivist framework to make decisions. These comments explain that the Bayesian hypothesis test is an application of Bayes’ rule, and the prior probability cannot be determined within the Bayesian–subjectivist framework – it needs to come from… somewhere else.
I disagree. There is nothing about the Bayesian–subjectivist hypothesis test that requires an application of Bayes’ rule. If it is applied, it may just as well be applied backwards, e.g. going from the posterior (based on the experiment) to what the prior must have been. This may seem like an unfamiliar view, but the basic idea is that a Bayesian–subjective probability doesn’t have to come from any procedure; it’s just there in the mind of the person doing the evaluation, and that’s fully within the framework itself. I have an upcoming article with more details on this.
Appendix A: Terminology Note
Parts of my statistical knowledge sometimes come from fairly old books11 Theory of Probability; de Finetti; Wiley; 1974.. At that time, there was a lively debate around whether probability should be thought of as subjective or objective.12 Other than in a few fairly limited circumstances (like throwing dice, or white winning a game of chess when both players play completely random moves), does there exist a “true” probability underlying each event, or is the probability of the event a function of the individual uncertainty one has around the event? In these debates, the people in favour of an objective interpretation of probability tended to also favour frequentism. In contrast, the people suggesting probability is subjective tended to approach problems Bayesically. This is why I’ve lumped together frequentism–objectivism and Bayesian–subjectivism.13 And my views on this subject are a separate article – but you may be able to suspect in which direction I lean based on this article.
I have had it pointed out to me that there are today both objective and subjective schools of Bayesianism. The way I use the phrases is independent of any such meaning – and I should read up on what the appropriate terminology is for these things!