Build vs. Buy

kqr, published 2021-07-21

I don’t remember where I read this, but it’s stuck with me because it was such a Columbus egg (an idea that seems obvious only once you’ve had it explained to you). It’s a rule for the build-versus-buy decision, that is, whether you should design your own component or use an existing third-party one. Don’t take “buy” literally: it could mean using an open source alternative. This is how to reason about that decision:

High cost, tightly integrated, and difficult to design? Build, don’t buy. Everything else? Buy.

That’s it.

Reasoning

If a component is high cost, tightly integrated with the rest of the system, and difficult to design, then we need to become experts on that component in order to properly evaluate suppliers and make a good purchasing decision. Alternatively, if a component is a major part of the value we’re providing, then we also need to become experts to safely buy it.
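The rule is small enough to write down as a predicate. What follows is only a sketch of how I read it: the `Component` type, its field names, and the example components are all made up for illustration, and each flag is a judgment call rather than something you can measure.

```python
from dataclasses import dataclass


@dataclass
class Component:
    # Hypothetical fields: each one is a judgment call, not a measurement.
    name: str
    high_cost: bool
    tightly_integrated: bool
    difficult_to_design: bool
    core_to_value: bool = False  # a major part of the value we provide


def should_build(c: Component) -> bool:
    """Return True for "build" and False for "buy"."""
    # All three properties must hold at once for the first condition.
    hard_to_evaluate = (
        c.high_cost and c.tightly_integrated and c.difficult_to_design
    )
    # Either condition means we have to become experts on the component.
    return hard_to_evaluate or c.core_to_value


# Made-up examples, not taken from any real system.
flight_control = Component("flight control computer", True, True, True)
landing_gear = Component("landing gear", True, False, False)
```

Here `should_build(flight_control)` is true and `should_build(landing_gear)` is false. Note that a component that is merely expensive, or merely hard to design, still lands on the buy side.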

How does one become an expert at anything? By doing it! In other words, even if we want to buy one of these difficult components, we need to design it ourselves first, and try it out in production, and maintain it for a bit, so that we understand everything that goes into it. But if we’ve already designed it and started to maintain it, we may just as well continue to use it and develop it.

The reason we want to buy as much as possible is that an organisation has a limited capacity for expertise, so we don’t want to have to become experts on things that don’t make up a competitive advantage. (I believe the popular Choose Boring Technology essay also talks about this, but calls it “a limited supply of innovation tokens”. I think “capacity for expertise” says more about what actually happens.)

Skunk Works

The Lockheed Skunk Works under Kelly Johnson were masters at optimising using this rule. Whenever they designed a new aircraft, they first focused on what the main selling point was going to be. For the F-117, they wanted a plane that was invisible on radar. Then they found a way to satisfy this requirement in the lab; in the case of the F-117, they built wooden prototypes, stuck them on sticks, and fired radar at them until they discovered a shape that made the plane disappear. (At one point they also had to make a stealth stick, because the stick they put the plane on shone brighter on the radar than the plane itself.) With the main selling point, there usually comes a primary challenge. The F-117 had a very weird shape, which made in-flight stability a concern. Skunk Works then found cheap ways to work on that, which for the F-117 meant building computers to stabilise the plane, and literally pushing wooden models of it off the roof of the lab to test. To recap:

Skunk Works invented something to fulfill the main selling point.
They also invented something to alleviate the primary challenge that comes out of satisfying the main selling point.

For everything else, they refused to invent anything new. Everything else was built with off-the-shelf parts. This let them meet tight deadlines and kept the development process itself very cheap, while reliability and repairability were mostly inherited from tried-and-true components.

Sometimes, however, that process leads to funny conclusions. The SR-71 (or rather, its CIA-owned predecessor, the A-12) had one main selling point: speed. It was going to outrun bullets and missiles, and that was really the main thing it was built for. The primary challenge that comes with speed is heat. And heat affects everything on the plane.

So with the SR-71, instead of buying most parts and building one or two critical parts, they ended up designing and manufacturing every single part in-house (except for the engines), with specialised tooling and everything. It’s a natural consequence of the build-versus-buy rule, but perhaps not an immediately obvious one, because rarely does that rule lead to building everything.

What About Simplicity?

I’ve received a few comments to the effect of, “But sometimes the third-party solution is really complicated, and all you need is a simple subset of its functionality.” That’s true!

In those cases you might be better off designing the simple thing yourself, instead of making the end result much more complicated. But you might also run into a Greenspun-type situation (Greenspun’s tenth rule: “Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.”), where over time you end up re-implementing many of the complications of the third-party solution, and you would have been better off integrating the third-party solution from the start.

An example close to home for me is how at a previous job we accidentally ended up re-implementing a significant chunk of Kubernetes – except ad hoc, informally specified, bug-ridden and slow. Of course, nobody set out to do that. It’s just that the sort of functionality you need once you start down that road happens to be the same for everyone, so we figured out the same things the Kubernetes team did. Just with less funding and experience.

But you don’t know that for sure when you start out! This type of judgment is the reason good software engineers get paid a lot of money – not for typing in code.