Two Wrongs

The Misunderstood Kelly Criterion

Whenever I read discussions on the internet about the Kelly criterion (maybe I’m hanging out with the wrong people, since the Kelly criterion is not a viable discussion topic in my limited real-life social circles), I encounter three extremely common misconceptions:

  1. “People working with money don’t actually use the Kelly criterion.”
  2. “The Kelly criterion is a method to find the optimal size of binary bets.”
  3. “The Kelly criterion is only valid if you have logarithmic utility.”

All of these are false. To understand why, it helps to know what the Kelly criterion is.

To maximise compounding, maximise the geometric expectation

The Kelly criterion is concerned with economic choices. It tells us which course of action we should pick

  • if we want to maximise something in the long run,
  • provided that the thing grows geometrically (also popularly known as exponentially).

Both conditions are necessary, and together they are sufficient, for the Kelly criterion to be relevant.

With this, we can already squash misconception 3: the Kelly criterion does not require logarithmic utility, it only requires trying to long-term maximise something that grows geometrically.

Perhaps the most common example of something where these conditions hold is money. We want to maximise our money, often in the long run, and money grows geometrically, i.e. gains and losses are compounding. When we deal with money, the Kelly criterion tells us how to choose between alternative courses of action in a way that maximises our money in the long run.

I think this is most clear with an example.

An example of an economic choice

Let’s say we have pre-ordered and paid for a new car that we will get a year from now. (Supply chain issues, you know.)

Unfortunately, our current car has gathered a little more wear-and-tear than we are comfortable with; it would cost $5,000 to replace the worn out parts. If we don’t replace those parts, the mechanic tells us there’s a 10 % chance the car breaks before the year is out, and then we will have to rent a replacement for a total of $40,000.

We have an economic decision at hand, with two alternative courses of action: replace the worn out parts, or wait and see. Here’s how most people would reason through this decision:

  • If we choose to replace the worn out parts, that’s a guaranteed cost of $5,000.
  • If we wait and see, that’s a 10 % risk of paying $40,000, i.e. an expected cost of $4,000.

Using this reasoning, we would conclude that the most economical course of action is to wait and see.

This misses a critical insight about the problem.

The geometric growth of money means losses should count more

Remember that thing about wins and losses being compounding? There’s a common saying that goes something like

It takes as long to go from $10,000 to $50,000 as it takes to go from $2,000 to $10,000.

In other words, if we have $10,000 and stand the risk of losing $8,000, the future splits into two paths: in one, we have $50,000, and in the other, at the exact same time, we have $10,000 again. The only difference is whether we lost that $8,000.

Losing $8,000 from a starting point of $10,000 is equivalent to a $40,000 loss, once we account for compounding, or geometric growth! (Technically, for this to be true, we have to specify a timeframe. Maybe 20 years would be reasonable for this example.)
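Spelled out, and assuming money multiplies by a factor of five over the period (the factor implied by the saying above), the two paths differ by

\[5 \times 10{,}000 - 5 \times (10{,}000 - 8{,}000) = 5 \times 8{,}000 = 40{,}000.\]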

This is the core principle that drove Bernoulli to invent the Kelly criterion in the 1700s. (Exposition of a New Theory on the Measurement of Risk; Bernoulli; Papers of the Imperial Academy of Sciences in Petersburg, Vol. V; 1738. Available online.) We need to account for the compounding effects of loss when evaluating alternative courses of action.

The Kelly criterion leads to the opposite conclusion

Let’s go back to our car situation. How serious a loss is depends on how much money we have right now. Let’s say our current wealth is $50,000, and we want to know whether we should replace the worn out parts, or wait and see.

To account for the compounding nature of losses, we can use the geometric expectation of total wealth instead of the arithmetic expectation of gains/losses when evaluating the alternatives. (This is the mathematically correct thing to do, although I leave out the proof. If you want to dig into the gory details, see The Kelly Capital Growth Investment Criterion; MacLean, Thorp, & Ziemba; World Scientific Publishing; 2011. Yes, it’s that Thorp.)

  • If we replace the worn out part for $5,000, we are guaranteed to end up with a reduced wealth of $45,000.
  • If we wait and see, there are two things that could happen:

    • With 90 % probability, our wealth will be unchanged at $50,000.
    • With 10 % probability, our wealth will be reduced to $10,000.

    The geometric expectation of these two outcomes is

    \[50{,}000^{0.9} \times 10{,}000^{0.1} \approx 43{,}000.\]

Since a wealth of $45,000 is better than one of $43,000, in this situation, it’s actually more economical in the long run to replace the worn out part, even if that seemed more expensive with the naïve analysis.
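The comparison above is easy to reproduce. What follows is a minimal Python sketch using the numbers from the car example; the helper names `arithmetic_expectation` and `geometric_expectation` are made up for this illustration.

```python
import math

def arithmetic_expectation(outcomes):
    """Probability-weighted average of (probability, wealth) pairs."""
    return sum(p * x for p, x in outcomes)

def geometric_expectation(outcomes):
    """Probability-weighted geometric mean of (probability, wealth) pairs."""
    return math.exp(sum(p * math.log(x) for p, x in outcomes))

wealth = 50_000
repair = [(1.0, wealth - 5_000)]                # guaranteed outcome
wait = [(0.9, wealth), (0.1, wealth - 40_000)]  # car survives / car breaks

for name, option in [("repair", repair), ("wait and see", wait)]:
    print(name,
          round(arithmetic_expectation(option)),
          round(geometric_expectation(option)))
```

The arithmetic expectation ranks waiting first, while the geometric expectation ranks the repair first, which is the reversal described above.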

Thank you, Kelly criterion!

Summary of the basic Kelly criterion

Here’s a summary of what we’ve learned so far.

The Kelly criterion applies when

  • We want to maximise something in the long run, and
  • The thing is compounding in gains and losses, i.e. grows geometrically.

The Kelly criterion tells us to

Choose the course of action that maximises the geometric expectation of our total amount of the thing.

The reason for this is that

In the presence of compounding, big losses are more serious than they seem at face value, and we account for that using the geometric expectation of our total amount of the thing.

This is all there is to the Kelly criterion. The rest is commentary. But it’s useful commentary, so I suggest reading on.

Everybody working with money uses the Kelly criterion

Before we move on, though, let’s squash misconception 1 since we can now do so.

As we can see from the above, the Kelly criterion is more of a mathematical fact than a betting strategy. The Kelly criterion tells us which, out of a set of alternative courses of action, will lead to most money in the long run.

If someone working with money claims they do not use the Kelly criterion, there are effectively two possibilities:

  • They are not trying to maximise money in the long run. This is not as stupid as it sounds; there are many other things one might want to optimise for.
  • They are using the Kelly criterion under a different name. They may even have invented mental heuristics on their own that approximate the Kelly criterion.

So, to be slightly more nuanced – yes, there are people who work with money and don’t use the Kelly criterion – but they are not maximising the amount of money in the long run. Anyone who does that is using the Kelly criterion under some name, because the Kelly criterion is literally that thing that maximises money in the long run. There is no other thing that does that.

Exercises in shipping insurance

I’m going to lift an example straight out of Bernoulli’s paper on the Kelly criterion, because it really helped my intuition for this.

Caius has purchased goods in Amsterdam which could be sold for 10,000 rubles in Petersburg, and thus has loaded them on a ship bound there. In this age, about 5 % of shipments on that route are lost, but the cheapest insurance costs 800 rubles.

If we try to evaluate this the common way, we’ll find – as we almost always do with insurance – that the insurance is overpriced. (The exception is subsidised insurance: the US property market suffers from underpriced insurance against environmental disasters.) The fair price for the shipment insurance would be 500 rubles (5 % of 10,000 rubles), but that would leave the insurer no profit.

If we look at it instead from the perspective of the Kelly criterion, what must Caius’ wealth \(w\) be for him to abstain from insurance?

  • Alternative 1, with no insurance, leaves Caius with a geometric expectation of \((w + 10{,}000)^{0.95} \times w^{0.05}\).
  • Alternative 2, with insurance, leaves Caius with a guaranteed wealth of \(w + 10{,}000 - 800\).

If we set these two equal to each other, we will find the value for \(w\) at which it stops making sense to take insurance. This happens at \(w \approx 5{,}000\). If Caius has less than 5,000 rubles, aside from the cargo of 10,000, he should insure the shipment.

Now here comes the interesting bit. Let’s take the other side of this transaction. What wealth \(w\) must the insurer possess for it to be sensible for them to offer the insurance at 800 rubles? This happens when

\[(w + 800)^{0.95} \times (w + 800 - 10{,}000)^{0.05} = w.\]

Solving this equation yields \(w \approx 14{,}000\). The insurer must have more than 14,000 rubles to offer the insurance at 800 rubles.

If we imagine that Caius had 3,000 rubles, and the insurer 20,000 rubles – note what happens. When viewed through the Kelly lens, both insurer and insured profited from the exchange.

This is what makes insurance powerful. When entered sensibly, it’s a win-win transaction. But we need to know about the Kelly criterion to evaluate it properly.
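Both break-even wealths can be found numerically. Here is a minimal sketch using plain bisection on the difference in expected log-wealth; the function names and the search brackets are choices made for this illustration, not something taken from Bernoulli’s paper.

```python
import math

CARGO, PREMIUM, P_LOSS = 10_000, 800, 0.05

def caius_gain(w):
    """Expected log-wealth without insurance minus log-wealth with insurance."""
    uninsured = (1 - P_LOSS) * math.log(w + CARGO) + P_LOSS * math.log(w)
    insured = math.log(w + CARGO - PREMIUM)
    return uninsured - insured

def insurer_gain(w):
    """Expected log-wealth from selling the policy minus log-wealth from abstaining."""
    sold = ((1 - P_LOSS) * math.log(w + PREMIUM)
            + P_LOSS * math.log(w + PREMIUM - CARGO))
    return sold - math.log(w)

def find_root(f, lo, hi, tol=1e-6):
    """Bisection, assuming f(lo) < 0 < f(hi) with a single sign change in between."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# The insurer must be able to pay out, so its bracket starts just above 9,200.
caius_break_even = find_root(caius_gain, 1, 1_000_000)
insurer_break_even = find_root(insurer_gain, CARGO - PREMIUM + 1, 1_000_000)
print(round(caius_break_even), round(insurer_break_even))

# The scenario from the text: Caius with 3,000 rubles, the insurer with 20,000.
print(caius_gain(3_000) < 0, insurer_gain(20_000) > 0)
```

A negative `caius_gain` means Caius is better off insured, and a positive `insurer_gain` means the insurer is better off selling, so both signs at once is the win-win case.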

The Kelly criterion in everyday affairs

It might sound exhausting to compute geometric expectations anytime we’re in an economic decision. Fortunately, there’s a useful mental shortcut.

When the amounts involved are small compared to our wealth (we are dealing in “everyday affairs”, as Bernoulli put it), the regular arithmetic expectation of losses is a very accurate approximation to the Kelly criterion. In other words, if we were dealing with $5 for repairs and $40 for breakage, with our wealth still being $50,000 – then taking the rare breakdown is preferable to repairs.

The intuition behind this is that the arithmetic expectation is what we get if we are able to make the same bet over and over again, many times. For amounts that are close to our total wealth, we can’t afford to make the same bet over and over. For small amounts, we can, and we should.
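To get a feel for how good the shortcut is, we can compare the two expectations at both scales. This is a small sketch with a hypothetical helper `compare`; the 10 % breakdown probability and the $50,000 wealth are taken from the car example.

```python
import math

def compare(wealth, repair_cost, breakdown_cost, p_break=0.1):
    """Return (wealth if we repair, arithmetic and geometric expectation if we wait)."""
    repaired = wealth - repair_cost
    arithmetic = (1 - p_break) * wealth + p_break * (wealth - breakdown_cost)
    geometric = math.exp((1 - p_break) * math.log(wealth)
                         + p_break * math.log(wealth - breakdown_cost))
    return repaired, arithmetic, geometric

small = compare(50_000, 5, 40)          # everyday amounts
large = compare(50_000, 5_000, 40_000)  # amounts close to our total wealth

for label, (repaired, arithmetic, geometric) in [("small", small), ("large", large)]:
    print(label, repaired, round(arithmetic, 2), round(geometric, 2))
```

For the small amounts the two expectations differ by a fraction of a cent, so either one says to wait and see. For the large amounts they differ by thousands of dollars and disagree about the decision.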

From geometric expectation to log-wealth

Like any much loved child, the Kelly criterion has many names. Most of them involve the cryptic notation “E log X”. What this really means is that in our car scenario, instead of evaluating

\[50{,}000^{0.9} \times 10{,}000^{0.1}\]

to find the value of waiting and seeing, we can evaluate

\[0.9 \log{50{,}000} + 0.1 \log{10{,}000}.\]

In terms of trying to optimise growth, these two will give equivalent results. The latter is a plain arithmetic expectation, but not of the wealth – instead, of the logarithm of the wealth. If we denote the wealth by X, the expression stands for E log X.
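To see why the ranking is the same, take the logarithm of the geometric expectation. Writing \(x_i\) for the possible wealth outcomes and \(p_i\) for their probabilities (symbols introduced here only for illustration),

\[\log{\left(\prod_i x_i^{p_i}\right)} = \sum_i p_i \log{x_i} = \mathrm{E} \log X.\]

Since the logarithm is strictly increasing, whichever alternative has the larger geometric expectation also has the larger E log X.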

Thus, the Kelly criterion in a more compact form says something like

To maximise something which grows geometrically, choose the course of action that maximises the expected log-wealth of the thing.

The intuition here is that the logarithm actually stands for growth rate. So by maximising the expected logarithm, we’re picking the course of action that on average gives us the highest growth.

The Kelly criterion with discrete distributions

We have seen the Kelly criterion in a situation where we are considering whether to do something or not do it, like with the preventative repair. This is a nice situation in that one of the courses of action leads to a guaranteed amount of wealth.

Investing everything into a single project

Sometimes it’s not as easy. Maybe our department budget allows us to invest into just one research project, but we have two promising candidate projects lined up. In this scenario, imagine we need to invest everything into one of two projects. Here are the potential payoffs of the first project:

Probability             60 %    30 %    10 %
Return on investment    0.5     2       10

This is a risky project that will most likely end up just bleeding away half our budget. The middle outcome though, is doubling our investment. And if we’re very lucky, we can get ten times our investment back!

Then we have the payoffs of the second project:

Probability             30 %    40 %    30 %
Return on investment    0.8     1.1     2

This is a kind of boring project. It has the same chance as the previous one of doubling our investment, but it’s less likely to lose us much money. It will never do better than double our investment. Which project should we go for?

Let’s bring out the Kelly criterion.

  • Risky project, expected growth

    \[G_w = 0.6 \log{(0.5w)} + 0.3 \log{(2w)} + 0.1 \log{(10w)}.\]

  • Boring project, expected growth

    \[G_b = 0.3 \log{(0.8w)} + 0.4 \log{(1.1w)} + 0.3 \log{(2w)}.\]

Simplifying these expressions, we get

  • \(G_w = 0.02 + \log{w}\)
  • \(G_b = 0.18 + \log{w}\)

Since the investment \(w\) is the same in both cases, we can ignore that for the purposes of maximisation. Perhaps surprisingly, the boring project is a much better investment if we want to maximise wealth in the long run.

While this was more complicated in that we had to do the E log X dance for both alternative courses of action, we were still doing the same thing that we did before: figure out the growth rate of all alternatives, and pick the one for which it is highest.

We did this now with two projects, but we could just as easily do it with three, or five, or 100, or a practically infinite number.
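As a rough sketch of that procedure in Python: the function name `expected_log_growth` is invented for this example, and the common \(\log{w}\) term is dropped since it is the same for every alternative. The natural logarithm reproduces the figures above.

```python
import math

def expected_log_growth(outcomes):
    """E log of the return multiple for (probability, return) pairs."""
    return sum(p * math.log(r) for p, r in outcomes)

projects = {
    "risky": [(0.6, 0.5), (0.3, 2), (0.1, 10)],
    "boring": [(0.3, 0.8), (0.4, 1.1), (0.3, 2)],
}

growth = {name: expected_log_growth(o) for name, o in projects.items()}
best = max(growth, key=growth.get)

for name, g in growth.items():
    print(name, round(g, 2))
print("pick:", best)
```

Adding a third or a hundredth project is just another entry in the dictionary.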

Splitting investments between projects

Instead of throwing all of our budget into just one of the two projects, what would happen if we could split the budget and invest a little into both? It gets more complicated, because now there are no longer three possible outcomes, but nine pairs of outcomes.

On the top row are the outcomes of the risky project, and along the left side the outcomes of the boring one. In each cell is the probability of that pair of outcomes. We assume here the projects are independent – if they are not, adjust the joint probabilities accordingly.

Boring \ Risky    0.5     2       10
0.8               0.18    0.09    0.03
1.1               0.24    0.12    0.04
2                 0.18    0.09    0.03

The growth rate if we put half the budget into each would be

\[\begin{array}{lclclcl}
G_{0.5} & = & 0.18 \log{(w(0.5+0.8)/2)} & + & 0.09 \log{(w(2+0.8)/2)} & + & 0.03 \log{(w(10+0.8)/2)} \\
 & + & 0.24 \log{(w(0.5+1.1)/2)} & + & 0.12 \log{(w(2+1.1)/2)} & + & 0.04 \log{(w(10+1.1)/2)} \\
 & + & 0.18 \log{(w(0.5+2)/2)} & + & 0.09 \log{(w(2+2)/2)} & + & 0.03 \log{(w(10+2)/2)}
\end{array}\]

This is tedious to work out, but not any more difficult than what we’ve already done (though I have to admit at this point I pulled out the spreadsheet to reduce the tedium), and we find that

\[G_{0.5} \approx 0.23 + \log{w}.\]

Wait!

Isn’t that weird?

Carefully study the three alternative courses of action we have evaluated the growth rate of:

  • Invest fully into the boring project, \(G_0 \approx 0.18 + \log{w}\).
  • Invest half our budget into each project, \(G_{0.5} \approx 0.23 + \log{w}\).
  • Invest fully into the risky project, \(G_1 \approx 0.02 + \log{w}\).

By investing half in each research project, we are getting a higher return on our investment than if we shove all the money into one of the projects. How can half-assing two projects earn us more return than the maximum return of either project alone?

The intuition behind it is that putting only half of our budget on the risky project limits its impact if it ends up losing money, but we can still take some advantage of its upsides if we get lucky. In a world where both profits and losses compound, that’s a winning strategy.
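For anyone who would rather not reach for a spreadsheet, the three growth rates can be checked with a few lines of Python. This is only a sketch: `split_growth` is a name invented here, the projects are assumed independent as in the table above, and the \(\log{w}\) term is left out.

```python
import math
from itertools import product

risky = [(0.6, 0.5), (0.3, 2), (0.1, 10)]
boring = [(0.3, 0.8), (0.4, 1.1), (0.3, 2)]

def split_growth(f):
    """Expected log growth with fraction f in the risky project, the rest in the boring one.

    Independence is assumed, so each joint probability is a product of marginals.
    """
    return sum(
        pr * pb * math.log(f * r + (1 - f) * b)
        for (pr, r), (pb, b) in product(risky, boring)
    )

for f in (0.0, 0.5, 1.0):
    print(f, round(split_growth(f), 2))
```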

Finding the optimal split graphically

There has to be an optimal amount of our budget we should allocate to the risky project – what is that? (If you’re good at analysis, you can compute the derivative of \(G_f\), the growth rate as a function of the fraction \(f\) invested into the risky project, and then find its stationary point and yada yada. The calculus route isn’t that difficult but beyond the level of tedium I can bear now.) One way to do it is visually, by asking our computer to plot the growth rate \(G_f\) as a function of \(f\):

[Figure: growth rate \(G_f\) plotted against the fraction \(f\) invested into the risky project.]

Isn’t that neat? We should invest about 28 % of our budget into the risky project, and the rest into the boring one. That will earn us the greatest growth rate, which is about 1.4 times what we would get if we just dumped all our budget into the boring project.

Note that we are still computing the growth rate of alternative courses of action, except now we were looking at 100 different courses of action, counting up from 0 % to 100 %.
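The same search can be done without a plot by evaluating the growth rate on a grid of fractions and keeping the best one. The sketch below assumes independent projects and a 1 % grid; both are choices made for this illustration, and a finer grid or a proper optimiser would work just as well.

```python
import math
from itertools import product

risky = [(0.6, 0.5), (0.3, 2), (0.1, 10)]
boring = [(0.3, 0.8), (0.4, 1.1), (0.3, 2)]

def split_growth(f):
    """Expected log growth with fraction f in the risky project (independence assumed)."""
    return sum(
        pr * pb * math.log(f * r + (1 - f) * b)
        for (pr, r), (pb, b) in product(risky, boring)
    )

# One candidate course of action per percentage point from 0 % to 100 %.
fractions = [i / 100 for i in range(101)]
best_f = max(fractions, key=split_growth)

print("best fraction in risky project:", best_f)
print("growth relative to all-boring:",
      round(split_growth(best_f) / split_growth(0.0), 2))
```

The growth curve is quite flat around its peak, so the grid optimum lands close to the value read off the plot rather than exactly on it.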

Negative-return investments can be good for us

This is somewhat of a digression but a useful lesson nonetheless, so bear with me. Imagine we learned something about the risky project that caused us to re-evaluate the probabilities of its outcomes. Now, it stands at

Probability 70 % 20 % 10 %
Return on investment 0.5 2 10

It turns out it’s even more likely than before to bleed money. This project is strictly negative-return at this point, in the sense that going all in on it shrinks our wealth in the long run; the expected growth is

\[0.7 \log{(0.5w)} + 0.2 \log{(2w)} + 0.1 \log{(10w)} \approx -0.12 + \log{w}.\]

However, here’s that same plot against fraction invested:

[Figure: growth rate \(G_f\) plotted against the fraction \(f\) invested into the re-evaluated risky project.]

The optimal amount to invest in the negative-return project is not 0 % – it’s in the neighbourhood of 20 %. Imagine trying to convince someone to sink a fifth of their budget into something that’s negative-return. But that is indeed what we should do in this case, if we want to maximise the growth of our investments.
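One way to convince ourselves is to look at the slope of the growth rate at \(f = 0\): if it is positive, moving a little money into the project must help, however bad the project looks on its own. The sketch below checks this with the re-evaluated probabilities. The slope formula \(\mathrm{E}[(r - b)/b]\) comes from differentiating the expected logarithm and is not something stated in the text above; independence of the two projects is again assumed.

```python
import math
from itertools import product

risky = [(0.7, 0.5), (0.2, 2), (0.1, 10)]  # re-evaluated probabilities
boring = [(0.3, 0.8), (0.4, 1.1), (0.3, 2)]

def split_growth(f):
    """Expected log growth with fraction f in the risky project (independence assumed)."""
    return sum(
        pr * pb * math.log(f * r + (1 - f) * b)
        for (pr, r), (pb, b) in product(risky, boring)
    )

# Growth rate when we go all in on the risky project.
standalone = split_growth(1.0)

# Derivative of split_growth at f = 0, i.e. E[(r - b) / b].
slope_at_zero = sum(
    pr * pb * (r - b) / b for (pr, r), (pb, b) in product(risky, boring)
)

best_f = max((i / 100 for i in range(101)), key=split_growth)
print(round(standalone, 2), round(slope_at_zero, 2), best_f)
```

The standalone growth is negative while the slope at zero is positive, which is exactly the combination that makes a modest stake worthwhile.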

Asset allocation with the Kelly criterion is a multi-dimensional problem

In the previous section, we could plot the growth rate as a function of just one variable: how much we invested into one of two projects. Sometimes, we have multiple types of investment we can choose from. A common example is splitting up our savings investments into asset classes like

  • Equity (stocks);
  • Bonds; and
  • Commodities.

We can do this following the same principle: we declare \(f_1\) to be the fraction invested into equity, \(f_2\) the fraction invested into bonds, and then \(f_3 = 1 - f_1 - f_2\) is what remains that gets invested into commodities. How should we pick these fractions?

My 2D plotting skills are not great so I will spare you a visual example, but the idea is the same. We estimate a joint probability distribution of the ROI of each of the asset classes above, and then for each permissible combination of \(f_1\) and \(f_2\) we compute the growth rate, and then pick the combination with the highest growth rate.
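In place of a plot, here is a sketch of that search in Python. The joint distribution is entirely hypothetical: the four scenarios and their return multiples are made-up numbers chosen only to make the code run, not estimates of any real market. The 5 % grid and the no-borrowing, no-shorting restriction are likewise assumptions of this example.

```python
import math

# HYPOTHETICAL joint scenarios, made up for illustration only:
# (probability, equity multiple, bond multiple, commodity multiple)
scenarios = [
    (0.25, 1.30, 1.02, 0.95),
    (0.40, 1.08, 1.03, 1.05),
    (0.25, 0.90, 1.04, 1.15),
    (0.10, 0.60, 1.05, 0.85),
]

def growth(f1, f2):
    """Expected log growth with f1 in equity, f2 in bonds, the rest in commodities."""
    f3 = 1 - f1 - f2
    return sum(p * math.log(f1 * e + f2 * b + f3 * c) for p, e, b, c in scenarios)

# All combinations on a 5 % grid with f1 >= 0, f2 >= 0 and f1 + f2 <= 1.
n = 20
candidates = [(i / n, j / n) for i in range(n + 1) for j in range(n + 1 - i)]
best_f1, best_f2 = max(candidates, key=lambda f: growth(*f))

print("equity:", best_f1,
      "bonds:", best_f2,
      "commodities:", round(1 - best_f1 - best_f2, 2))
```

With more asset classes the grid grows quickly, and a numerical optimiser becomes the more practical tool, but the principle stays the same.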

And this, at last, means we can squash also misconception number 2. The Kelly criterion is not a method for sizing binary bets. The Kelly criterion tells us which alternative course of action is optimal when we try to maximise long-term wealth. It can do this in all sorts of multi-dimensional, continuous problems.

Summary

We have touched on some big calculations, but all along we’ve been doing the same thing:

To maximise something which grows geometrically, choose the course of action that maximises the expected log-wealth of the thing.

Appendix A: Mixed questions

These are points I don’t have time to cover properly, but seem important enough to include a note about.

What about maximising growth when it does not compound?

Maybe we’re at the racetrack and our parents have given us a single $10 and told us that whatever we win, we need to keep. We’re not allowed to bet with winnings. Does this mean the Kelly criterion is no longer relevant?

Yes, that’s correct. When reinvestment is out of the question, we can no longer count on geometric growth.

The optimal course of action in this situation is to find the one horse that is most underpriced in terms of odds, and sink the entire $10 into that. Most likely we’ll lose it all, but on average in similar situations we’ll profit the most.

What does “in the long run” mean?

In this article I’ve tried to be careful to add the “in the long run” qualifier in many places. In the long run means the thing Keynes talked about when he quipped,

The long run is a misleading guide to current affairs. In the long run we are all dead.

The merits of the Kelly criterion that we have discussed are really only useful if you are planning on living literally forever, or managing something that does. The Kelly criterion is at its core one of those asymptotic facts that are really only true once you take the limit when time tends to infinity.

That said, there are some other positive aspects of the Kelly criterion that are true even in shorter timeframes, like the probability of doubling your wealth before it is halved, and other more short-sighted measures. But! I don’t remember the details here so I will have to write a separate article on that once I get back to re-reading those papers.

What does “fractional Kelly” mean?

Making investments that maximise growth rate can sometimes require a steely stomach. Involved in maximising growth is the potential for some harrowing losses. An alternative to this is to perform a fractional Kelly investment, where you keep some fraction of your money in reserve and don’t use it. This will get you a lower growth rate, but still optimal for the amount of money you do use for investments.

Another reason to do this is that you might be unsure about what the optimal investment really is according to the Kelly criterion (e.g. because your data is spotty, or the future is hard to predict). Then it makes sense to play it on the safe side and hold some money out of the market.
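To make the trade-off concrete, the sketch below reuses the two research projects from earlier and compares investing everything at the growth-optimal split with investing only half and holding the other half in reserve. The reserve is assumed to be cash that neither grows nor shrinks, and the 50 % reserve is an arbitrary choice for illustration; all function names are invented here.

```python
import math
from itertools import product

risky = [(0.6, 0.5), (0.3, 2), (0.1, 10)]
boring = [(0.3, 0.8), (0.4, 1.1), (0.3, 2)]

def outcomes(f, invested=1.0):
    """(probability, wealth multiple) pairs when `invested` of our money goes into
    the f / (1 - f) split and the rest is held as cash (assumed to return exactly 1)."""
    return [
        (pr * pb, (1 - invested) + invested * (f * r + (1 - f) * b))
        for (pr, r), (pb, b) in product(risky, boring)
    ]

def growth(f, invested=1.0):
    """Expected log growth of total wealth."""
    return sum(p * math.log(x) for p, x in outcomes(f, invested))

def worst_case(f, invested=1.0):
    """Smallest wealth multiple among all outcomes."""
    return min(x for _, x in outcomes(f, invested))

# Growth-optimal split when everything is invested, found on a 1 % grid.
kelly_f = max((i / 100 for i in range(101)), key=growth)

for invested in (1.0, 0.5):
    print(invested,
          round(growth(kelly_f, invested), 3),
          round(worst_case(kelly_f, invested), 3))
```

Holding half in reserve gives up some growth but makes the worst outcome noticeably milder, which is the steely-stomach trade described above.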

What constitutes our “wealth”?

To maximise growth, we need to know what our current wealth is, because that puts any prospective losses and gains in important context. But what is our wealth, really? In increasing order, it might be,

  1. Our disposable cash;
  2. All cash in our bank accounts;
  3. All liquid assets including savings and investments;
  4. Also counting the next 12 months of salary;
  5. Add to that the value of our home;
  6. All of our future salaries; and so on.

Bernoulli argued for using a more extreme version of this, i.e. if two people have the same assets on their balance sheet, but one is of a higher-paid profession, that person should count their wealth as greater.

I take a more restrictive stance. I don’t count the value of my home because it’s not liquid enough. (And even if it was, my subjective liquidity judgment of it is low because I don’t like the idea of moving over a bet that turned out bad.) I also don’t count future earnings because I don’t want to bother discounting for time and predicting interest rates in my calculations.

What’s left is basically point 3 above. If you’re a hardliner like Bernoulli you will look at my Kelly bets and think of them as fractional Kelly bets, because I’m not strictly considering “all” my wealth. I’m fine with that. I like fractional Kelly bets, because they mean I can retain a comfortable cushion of cash even if things go very bad.
