# Huffman Codes β How Do They Work?

Writing tests 17 %
Meetings 13 %
Debugging 10 %
Code review 8 %
Toolmaking 7 %
Lunch 6 %
Slack 5 %
Standup 5 %
Programming 3 %
Documenting 2 %

# Corporate espionage through post-it notes

Tomorrow, you (a corporate spy!) will be in the office across the street and I want to signal to you what I’ve been doing throughout the day. I will do so by putting up post-it notes on a window. I have three colours of post-its: π₯, π¨ and π©.1 If a lot of people have problem with their fonts not rendering these unicode characters as coloured squares, let me know.

We have just three colours, so we cannot assign one colour to each task β that would only let us communicate the three most common tasks. What if we use two colours? Still not quite there, since that only lets us communicate $$3^2 = 9$$ different tasks. NaΓ―vely, we might think we need three colours per task, resulting in the following codebook2 This is practically counting in ternary, except using coloured post-its instead of the digits 0,1,2.:

Now if you look over at my window and you see this:

you know that I first spent quite some time in meetings (2Γ π© π© π₯), then worked on tests (π© π© π¨), reviewed code (π© π¨ π¨) and finally did some reading (π© π© π©). But that was a long sequence of post-its! If my boss walks past, he will think it looks very suspicious and tear them down. Using this codebook, communicating $$n$$ tasks will always take $$3n$$ post-it notes.

How can we approach creating a codebook that results in the shortest coded message on average, to avoid rousing suspicion?

# Huffman codes are short on average

The key idea is that for a rare task, like programming or documentation, we can afford to use a long codeword. In fact, if it lets us use a short codeword for something common like reading, we should be fine with needing more than three post-it notes to encode something rare like documentation.

The Huffman code is an algorithm we can follow that generates an optimal codebook. Here’s how to do it.

First, list all tasks in order of frequency. Check β we already did.

Writing tests 17 %
Meetings 13 %
Debugging 10 %
Code review 8 %
Toolmaking 7 %
Lunch 6 %
Slack 5 %
Standup 5 %
Programming 3 %
Documenting 2 %

Assign post-it colours to the three least common tasks (to keep things simple, always assign colours in the same order):

Writing tests 17 %
Meetings 13 %
Debugging 10 %
Code review 8 %
Toolmaking 7 %
Lunch 6 %
Slack 5 %
Programming 3 % π¨
Documenting 2 % π₯

Now, and this is going to be a little tricky to follow, but mentally group these three together into a supertask, that has frequency 5Β +Β 3Β +Β 2Β =Β 10Β %. Move this supertask up to its correct place in the frequency table.

Writing tests 17 %
Meetings 13 %
(1) Programming 3 % π¨
(1) Documenting 2 % π₯
Debugging 10 %
Code review 8 %
Toolmaking 7 %
Lunch 6 %
Slack 5 %

Assign again colours to the three least common tasks.

Writing tests 17 %
Meetings 13 %
(1) Programming 3 % π¨
(1) Documenting 2 % π₯
Debugging 10 %
Code review 8 %
Lunch 6 % π¨
Slack 5 % π₯

Mentally group them together the same way, and the group has a combined frequency of 7+6+5=18Β %, so it goes higher up in the table.

(2) Lunch 6 % π¨
(2) Slack 5 % π₯
Writing tests 17 %
Meetings 13 %
(1) Programming 3 % π¨
(1) Documenting 2 % π₯
Debugging 10 %
Code review 8 %

Assign post-its to the three least common β but treat the entire group (1) as a single entity this time. It means all three items in group (1) gets prefixed with one colour.

(2) Lunch 6 % π¨
(2) Slack 5 % π₯
Writing tests 17 %
Meetings 13 %
(1) Programming 3 % π© π¨
(1) Documenting 2 % π© π₯
Debugging 10 % π¨
Code review 8 % π₯

Then group these together β treating the group (1) as a unit again. Note that the entire group (1) has a probability of 10Β %, so the sum for group (3) is 10+10+8Β %, which launches it into the top of the table.

(3) (1) Programming 3 % π© π¨
(3) (1) Documenting 2 % π© π₯
(3) Debugging 10 % π¨
(3) Code review 8 % π₯
(2) Lunch 6 % π¨
(2) Slack 5 % π₯
Writing tests 17 %
Meetings 13 %

Repeat again.

(4) (2) Lunch 6 % π© π¨
(4) (2) Slack 5 % π© π₯
(4) Writing tests 17 % π¨
(4) Meetings 13 % π₯
(3) (1) Programming 3 % π© π¨
(3) (1) Documenting 2 % π© π₯
(3) Debugging 10 % π¨
(3) Code review 8 % π₯

At this point, we only have three groups remaining: (4), (3), and reading. We prefix with one colour each again.

(5) (4) Writing tests 17 % π© π¨
(5) (4) Meetings 13 % π© π₯
(5) (3) (1) Programming 3 % π¨ π© π¨
(5) (3) (1) Documenting 2 % π¨ π© π₯
(5) (3) Debugging 10 % π¨ π¨
(5) (3) Code review 8 % π¨ π₯

That’s it! This is our Huffman code. With this new, more efficient codebook, the message from before becomes

This is indeed the same message: meetings (2Γ π© π₯), tests (π© π¨), review (π¨ π₯) and finally reading (π₯). And we got the message across using just 9 post-it notes, compared to the 15 from before.

Here are two, perhaps obvious features of the Huffman code3 There is also one thing that confused me at first, but after brief thought I realised how satisfying it is. There is only one task that gets assigned a single post-it note, namely reading (π₯). Instinctively, I would think the code would get shorter if also the next most common task, writing tests, would get a single post-it note. But! If both reading and writing tests got a single-post-it codeword, then we would not be able to have three other two-post-it codewords (writing tests (π© π¨), meetings (π© π₯), debugging (π¨ π¨), and code review (π¨ π₯)), and with the distribution given, that would result in a worse average message length..

• Short codewords are reserved for common tasks. And indeed, it can assign longer-than-necessary codes to very rare tasks.
• Codewords are immediately decodable β once you reach the end of a codeword, you’ll know it. There is no ambiguity and you don’t need to read ahead to understand a previous codeword.

# Sometimes it doesn’t work out so nicely

The extremely sharp-eyed reader will have noticed that the percentages in the table don’t quite add up to 100Β %. This is because in the real data, I have another entry for “others” at 3 %.

Writing tests 17 %
Meetings 13 %
Debugging 10 %
Code review 8 %
Toolmaking 7 %
Lunch 6 %
Slack 5 %
Standup 5 %
Programming 3 %
Miscellaneous 3 %
Documenting 2 %

If we tried the above process with this table, we would find at the end that we have two groups left and three colours of post-its to hand out. The solution is to pad out the table with dummy tasks that happen 0Β % of the time. Here’s how it would start:

Writing tests 17 %
Meetings 13 %
Debugging 10 %
Code review 8 %
Toolmaking 7 %
Lunch 6 %
Slack 5 %
Standup 5 %
(1) Documenting 2 % π¨
(1) Dummy 0 % π₯
Programming 3 %

And then we continue, treating the group (1) as a unit:

Writing tests 17 %
Meetings 13 %
(2) (1) Miscellaneous 3 % π¨ π©
(2) (1) Documenting 2 % π¨ π¨
(2) (1) Dummy 0 % π¨ π₯
(2) Programming 3 % π₯
Debugging 10 %
Code review 8 %
Toolmaking 7 %
Lunch 6 %
Slack 5 %

And after a few more steps:

(5) (3) Lunch 6 % π© π¨
(5) (3) Slack 5 % π© π₯
(5) Writing tests 17 % π¨
(5) Meetings 13 % π₯
(4) (2) (1) Documenting 2 % π© π¨ π¨
(4) (2) (1) Dummy 0 % π© π¨ π₯
(4) (2) Programming 3 % π© π₯
(4) Debugging 10 % π¨
(4) Code review 8 % π₯

We are now ready for the last assignment:

(5) Writing tests 17 % π© π¨
(5) Meetings 13 % π© π₯
(4) (2) (1) Documenting 2 % π¨ π© π¨ π¨
(4) (2) (1) Dummy 0 % π¨ π© π¨ π₯
(4) (2) Programming 3 % π¨ π© π₯
(4) Debugging 10 % π¨ π¨
(4) Code review 8 % π¨ π₯

We still have that dummy task, which we can drop from the codebook. Now that we had 12 tasks to represent, the Huffman code introduced some four-post-it codewords in order to be able to keep short codewords for common tasks.

# Huffman also handles skewed distributions

Imagine now that we had a much more skewed distribution, something like what might have happened very early in my career:

Programming 31 %
Lunch 6 %
Standup 5 %
Debugging 4 %
Slack 3 %
Miscellaneous 3 %
Meetings 2 %
Toolmaking 2 %
Documenting 2 %
Code review 1 %
Writing tests 1 %

For this distribution, we would expect the Huffman code to generate short codewords for reading and programming, and longer codewords for the rest. Let’s find out what happens!

Programming 31 % π¨
(5) (4) (2) (1) Writing tests 1 % π₯ π© π© π₯ π¨
(5) (4) (2) (1) Dummy 0 % π₯ π© π© π₯ π₯
(5) (4) Standup 5 % π₯ π© π¨
(5) (4) Debugging 4 % π₯ π© π₯
(5) (3) Slack 3 % π₯ π¨ π©
(5) (3) Miscellaneous 3 % π₯ π¨ π¨
(5) (3) Meetings 2 % π₯ π¨ π₯
(5) Lunch 6 % π₯ π₯

Indeed! At this point, the value of reserving short codewords (single post-it notes) for the two most common tasks was high enough that the Huffman code even created some five-post-it codewords for the least common tasks.4 Lunch apparently took enough of my time even as a junior that it deserved a two-post-it codeword.

What does a typical day look like with this codebook? Maybe something like