Two Wrongs

Data Consistency Is Overrated

As I watch computer systems become ever more complicated by the day, I get an unnerving suspicion that data consistency is a temporary anomaly of our time: our current systems are just small enough to make it possible, and we are still naive enough to make it seem desirable.

Inconsistency Between Aggregates

At one point I was leisurely looking into object design from a domain-driven design perspective (Effective Aggregate Design, Part II: Making Aggregates Work Together; Vernon; 2011; available online). This bit was particularly thought-provoking:

Thus, if executing a command on one aggregate instance requires that additional business rules execute on one or more other aggregates, use eventual consistency. Accepting that all aggregate instances in a large-scale, high-traffic enterprise are never completely consistent helps us accept that eventual consistency also makes sense in the smaller scale where just a few instances are involved.

Ask the domain experts if they could tolerate some time delay between the modification of one instance and the others involved. Domain experts are sometimes far more comfortable with the idea of delayed consistency than are developers. They are aware of realistic delays that occur all the time in their business, whereas developers are usually indoctrinated with an atomic change mentality. Domain experts often remember the days prior to computer automation of their business operations, when various kinds of delays occurred all the time and consistency was never immediate. Thus, domain experts are often willing to allow for reasonable delays—a generous number of seconds, minutes, hours, or even days—before consistency occurs.

It pains me to imagine a system without data consistency (because let’s not kid ourselves here: “delayed consistency” or “eventual consistency” are just synonyms for “no consistency”; once you throw out atomic consistency guarantees, there are no guarantees left), but I also can’t help but agree with Vernon. Consistency has its strengths, for sure, but also plenty of weaknesses.
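For what it’s worth, here is a minimal sketch of what Vernon’s advice might look like in code. The Order and Inventory aggregates and the in-process queue are my own invented stand-ins for whatever aggregates and messaging infrastructure a real system would have; the point is only that a command commits one aggregate, and the business rule on the other aggregate runs later, when the event is handled.

```python
import queue
from dataclasses import dataclass

# Hypothetical aggregates: an Order and the Inventory it draws from.
# Only one aggregate is modified per transaction; the other catches up
# later by reacting to a domain event (eventual consistency).

@dataclass
class OrderPlaced:          # domain event crossing the aggregate boundary
    order_id: str
    sku: str
    quantity: int

@dataclass
class Order:
    order_id: str
    sku: str
    quantity: int
    status: str = "new"

    def place(self, events: "queue.Queue[OrderPlaced]") -> None:
        # Transaction 1: change this aggregate only, then publish an event.
        self.status = "placed"
        events.put(OrderPlaced(self.order_id, self.sku, self.quantity))

@dataclass
class Inventory:
    sku: str
    on_hand: int
    reserved: int = 0

    def reserve(self, quantity: int) -> None:
        # Transaction 2: runs some time after the order was placed.
        if quantity > self.on_hand - self.reserved:
            raise ValueError("insufficient stock")   # compensate, don't roll back
        self.reserved += quantity


events: "queue.Queue[OrderPlaced]" = queue.Queue()
order = Order("o-1", "widget", 3)
inventory = Inventory("widget", on_hand=10)

order.place(events)                     # commit #1: order is placed immediately

# ...any amount of time later, a worker drains the queue...
while not events.empty():
    event = events.get()
    inventory.reserve(event.quantity)   # commit #2: inventory catches up

print(order.status, inventory.reserved)   # placed 3
```

Between those two commits the system is visibly inconsistent: the order is placed but no stock is reserved for it. Vernon’s argument is that the business can often tolerate that window, and that the handler should compensate (say, by cancelling the order) instead of pretending the two changes were one atomic operation.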

Maybe gradual deterioration of data consistency is part of the natural state of things, as long as you also allow for some degeneracy, i.e. let things be represented in multiple different, complementary ways. This would give a roughly correct picture even in the face of some inconsistency.

Financial Reports Are (Slightly) Inconsistent

A few months later I was reading about data consistency within financial institutions (Red-Blooded Risk; Brown; Wiley; 2011), and encountered this illustrative example. (Brown’s point was not about object design, of course. He wanted to emphasise the importance of not trusting a number just because it appears in an official report. In order to trust a number, we must first ask what objective measurement in reality it has been verified against. Otherwise it’s likely to be pure bollocks.)

But small errors are common, and they can add up to large totals. I traded nongovernmental mortgage securities in the 1980s, and I remember no deal in which all the cash flows were properly allocated to all investors. […]

These errors compound in large, complicated systems. The only reason these numbers don’t become completely useless is that there is some reality checking. For example, a firm’s accounting system computes how much cash it is supposed to have. This number will not be correct, due to errors in the firm’s systems. […] Obviously, the totals will not match, except by extreme coincidence. In a big firm, the difference will typically be millions of dollars. But the firm will not allow the difference to expand indefinitely. It has people constantly trying to reconcile things.

Suppose the firm overall is missing $5 million of cash. Its reconciliation people will naturally focus on finding that cash—that is, correcting the mistakes in the firm that cause it to overstate its cash balances […] These selective corrections will make the discrepancy shrink, but on average they will not make either total more accurate. The cash-in-bank-accounts number the firm reports on its official financial statement will have evolved to be within tolerable agreement with its bank statements. It cannot be wildly inconsistent with reality, but it will not be in any sense accurate.

What’s true of cash in bank accounts is even more true of pretty much every large, complex data system. […] There are enough input and processing errors that the results would be meaningless except that the outputs are checked against some kind of reality. The output you see appears to be the result of aggregating inputs, but it’s not. It’s really a product of evolution, of selective error correction until the result is within bounds acceptable to the system owner.

I’m a big fan of writing things with atomic consistency, invariant validation at every step, and failing fast if things seem off. Yet here we learn that some of the most important parts of our economy are essentially a matter of trial and adjustment until the numbers sorta-kinda look right when you squint.
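To caricature the reconciliation process Brown describes, here is a toy sketch with entirely made-up numbers. The desks, the known issues, and the tolerance are all invented for illustration; the mechanism is simply that corrections are applied only until the internal total agrees with the bank statement closely enough, not until every record is right.

```python
# Caricature of reconciliation as selective error correction: made-up figures,
# not anyone's real process. Internal records overstate cash; we only hunt for
# errors until the total agrees with the bank statement "well enough".

bank_statement_total = 995_000_000            # the external reality check

internal_records = {                          # per-desk cash figures, some wrong
    "desk_a": 400_000_000,
    "desk_b": 350_000_000,
    "desk_c": 250_000_000,                    # overstated by a booking error
}
known_issues = [                              # candidate corrections, largest first
    ("desk_c", -3_000_000),
    ("desk_a", -1_500_000),
    ("desk_b", -400_000),                     # never reached: gap is already "ok"
]

TOLERANCE = 1_000_000                         # "close enough" for the report

for desk, correction in known_issues:
    gap = sum(internal_records.values()) - bank_statement_total
    if abs(gap) <= TOLERANCE:
        break                                 # stop looking once within bounds
    internal_records[desk] += correction

gap = sum(internal_records.values()) - bank_statement_total
print(f"reported cash: {sum(internal_records.values()):,}")
print(f"remaining discrepancy: {gap:,}")      # small, but neither zero nor accurate
```

The reported figure ends up within tolerance of the bank statement even though one known error was never corrected, which is exactly the evolved-rather-than-aggregated quality Brown points at.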

I suspect I need to get comfortable with this. It may not make sense to maintain consistency in large systems, and perhaps it becomes uneconomical at smaller scales than I would like.

Lower Your Pitchforks

I’m not saying we should throw consistency out the window. I don’t even know where I stand on consistency myself. But I do think it’s overrated, and we’ll learn more about this as time passes.