Hacker News user jiggawatts wrote in a little-read comment a few months ago that
Computer architectures encompass 13 orders of magnitude of performance! That’s roughly the difference between something like a trivial function call processing data in L1 cache to a remote call out to something in another region. People often make relatively “small” mistakes of 3 or 4 orders of magnitude, which is still crazy if you think about it, but that’s considered to be a minor sin relative to making the architecture diagram look pretty.
I think the point of the comment was that it’s easy to accidentally perform expensive operations in software systems, because a mistake of even a few orders of magnitude looks small next to the full thirteen.
I thought it was a neat observation, but I didn’t think more about it until I was reading Mastering Perl1 Mastering Perl; Foy; O’Reilly Media; 2014.. In that book, Foy briefly discusses a general approach to primitive profiling2 For every high-level operation in our code, we can increment a counter in a package-local dictionary, and then dump all operation–counter pairs to a file on program exit. I have used this technique before, and it’s kind of nice for quick performance analyses.. He goes on to say:
The most common place I do this is in database code. In the database code, I want to track which queries I make, usually so I can get an idea of which queries take a long time or which ones I most frequently use. From that, I can figure out what I should optimize.
This made me stop to think for a moment – sure, database code is a good candidate for this, but how do I recognise other good candidates? What do they have in common? Then I remembered that Hacker News comment, and it took on a new meaning for me.
On the scale from L1 cache to a remote call to another region, database code almost always sits at the high end. It usually means shuttling data over a (virtual or physical) link to a different machine, or at least calling out to a different process. Even when using SQLite, it tends to imply several file system operations to guarantee durability. Let’s say it’s a \(10^9\) operation in terms of orders of magnitude.
If we can delete a single unnecessary \(10^9\) operation, that pays for a lot of unnecessary \(10^5\)-order operations – ten thousand of them, in fact. If optimisation is the process of taking out unnecessary work3 Find inspiration in the classic lean wastes: overproduction, waiting, transportation, motion, overprocessing, etc., then we should try very hard to take out an unnecessary \(10^9\) operation before we even start considering \(10^5\) operations.
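A hypothetical illustration of deleting high-order operations: the same lookup done with one database call per row versus a single batched call. The schema and data here are invented; SQLite in-memory makes the round trips cheap, but against a networked database each per-row call would be one of those \(10^9\) operations:

```python
import sqlite3

# Invented example table with 100 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(100)])

ids = list(range(100))

# Wasteful: one database call per id -- 100 high-order operations.
names_slow = [
    conn.execute("SELECT name FROM users WHERE id = ?", (i,)).fetchone()[0]
    for i in ids
]

# Better: a single database call; the remaining work is cheap, in-process
# dictionary lookups, several orders of magnitude below a round trip.
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT id, name FROM users WHERE id IN ({placeholders})", ids
).fetchall()
by_id = dict(rows)
names_fast = [by_id[i] for i in ids]

assert names_slow == names_fast
```

Nothing about the result changes; we have only removed 99 of the 100 expensive operations, which is exactly the kind of win no tight-loop tweak can match.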
So yeah, that’s how you recognise good targets for custom high-level profiling. Look at the high order-of-magnitude stuff. Don’t worry about the rest for now.
This sounds so obvious, yet I don’t know how many times I’ve gone, “But I have a hunch there could be some optimisation possible in this tight loop…”
Yeah, sure. There might very well be. But dollar-for-dollar, I’d still probably get more out of trying to avoid the high-order-of-magnitude operations first, like network API calls, database calls, filesystem operations, message passing, parsing, etc.