Two Wrongs

Quality in Software Development, Part 1: Culture & Communication


I started writing this article from a few loose ideas, but the content kept growing. Instead of trying to write everything in silence and dump a huge article four months from now (which is how I normally do things), I will try something new: releasing it in small increments.

I hope I will be able to do four parts:

  1. This instalment, about culture and communication.
  2. Incidents and how to prevent them.
  3. Methods to improve quality.
  4. Statistics in software engineering is surprisingly hard.

References

Take a couple of books by Edwards Deming [1: Out of the Crisis; Deming; MIT; 1982] [2: The New Economics; Deming; MIT; 1994]. Take the CAST Handbook [3: CAST Handbook; Leveson; 2019. Available online (PDF)]. Take Donella Meadows’ leverage points [4: Leverage Points: Places to Intervene in a System; Meadows; 2010. Available online]. Take Gil Tene talking about latency [5: How not to Measure Latency; Tene; 2015. Available online (video)]. Take Nassim Taleb’s work on fat-tailed distributions [6: The Fat Tails Statistical Project; Taleb; 2018. Available online]. Take the DoD’s human engineering criteria [7: MIL-STD-1472F: Design Criteria Standard: Human Engineering; Department of Defense; 1998].

Essentially, take a little bit of quality management, accident analysis, systems theory, reliability measurement, and the things that come along with those, and throw it all into a running jet engine. Collect whatever comes out the other side.

What you get is this article. Except I’m the jet engine, and what came out the other side are my arbitrary musings on the topic.

You might read this article and go, “well, duh.” You’d be right. These are not original thoughts, and the subject is not rocket surgery. Yet we keep forgetting what’s important, so a reminder from time to time is helpful.

Much of this has empirical support (see the references above), but at times it skirts close to philosophy and politics. If you are a hardcore skeptic and need everything to be backed up by solid evidence, there is still one thought I want you to walk away with:

Do code review. Not doing code review is plain stupid. [8: Best Kept Secrets of Peer Code Review; Cohen; Printing Systems; 2006] [9: Making Software: What Really Works, and Why We Believe It; Oram & Wilson; O’Reilly; 2011] [10: Code Complete; McConnell; Microsoft Press; 2004]

Quality Starts With Culture

The first step towards quality in software development is a healthy culture surrounding it. This applies whether you’re working alone, on an open-source project with arbitrary internet people as the main contributors, or with a team in a professional capacity. In fact, everything I write about here applies to any sort of software project, and I have tried to be inclusive in the way I write. (For example, I will try to write “organisation” rather than “company”.)

What is a healthy culture? The details vary from group to group, but there are some general aspects that fall into two broader categories:

  • Enjoyment and psychological safety; and
  • Scheduling and planning.

Psychological Safety

I’m not going to beat this horse because it looks fairly dead. Not too long ago, Google tried to find out (in a study known as Project Aristotle) what separates effective teams from less effective ones. They were planning to look at things such as team role composition, the experience of team members, and so on.

What they found was that the single best predictor of productivity and success in a team was something called psychological safety.

If you feel safe from judgment and resentment, you are more likely to enjoy what you do, you are more likely to cooperate with your peers, and you are willing to experiment more. [11: Well, duh.] You are even physically able to use more of your brain power to solve abstract problems. When we feel unsafe, the parts of our brain used for abstract thinking shut down in favour of more primitive urges. That is not a great state of mind for doing creative, problem-solving work.

This is why it’s so important to avoid including any assholes in your project; a single asshole can shut down the psychological safety of the entire team.

It’s also why it’s so important to be nice. To assume people are doing things with good intentions. To approach every situation with an open mind. To admit to yourself that most of the time, you are wrong. You may be less wrong than everyone else, but you’re still wrong. [12: Historically, humans haven’t been that great at being right.]

Planning

Another thing that ruins abstract, creative thought is stress, such as the stress of crunching to meet a deadline.

There’s a disagreement in the software development community regarding whether or not it is meaningful to estimate the time required to complete a task. As always, the truth is probably somewhere in between.

Critically, estimates are not deadlines. You should not have deadlines, especially not in a creative process.

You have to allow work to take the time it takes, because that is the time it will take. Work always takes exactly the time it takes. [13: Well, duh.] You cannot wish work into going faster. If it were that simple, then everybody would just wish for their day’s work to be done by 10 am, and then go home.

Of course, in the real world there may be any number of valid business reasons to, for example, make a release before a specific date. How do we do that?

If we can’t control how fast work is happening, how can we make sure the work is done by a specific date?

We instead vary the amount of work to be done. It’s that simple. And that is where estimates are valuable. How can you possibly know how much work there is left to do, if you don’t even know that it will take twice as long to reticulate splines as it will to insert the chaos generator?

(This is also a key insight to understanding estimates: they’re relative to each other. Never absolute. They are not man-hours, they are not hours, and they are definitely not wall-clock hours. [14: The difference between hours and wall-clock hours is that the former specifies a number of uninterrupted hours actively spent on a thing, while the latter specifies a fixed time in the future.] They are just a way to be able to ask the question: “We can either do this big task, or these two smaller ones. What is more important?”)

Another way to view it is this: Planning does not mean deciding which things should be done by when; planning means predicting which things will be done by when, and re-prioritising based on that prediction.
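To make “vary the amount of work” concrete, here is a minimal sketch in Python. It assumes a prioritised backlog with relative estimates and an observed throughput, meaning estimate units actually completed per week, measured from past work rather than wished for. The `Task` type, the `predict_done` function, and the example tasks are hypothetical illustrations, not a method taken from the sources above. The sketch predicts which items will be done by a given date. Re-prioritising then amounts to reordering the backlog and asking again.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    name: str
    estimate: float  # relative size only; not hours, not wall-clock time


def predict_done(backlog: list[Task], throughput: float, periods: int) -> list[str]:
    """Predict which tasks (in priority order) finish within `periods`.

    `throughput` is an assumed, empirically observed rate in estimate
    units per period. Work proceeds strictly in priority order, so we
    stop at the first task that no longer fits in the remaining capacity.
    """
    capacity = throughput * periods
    used = 0.0
    done: list[str] = []
    for task in backlog:
        if used + task.estimate > capacity:
            break  # this task (and everything after it) misses the date
        used += task.estimate
        done.append(task.name)
    return done


if __name__ == "__main__":
    # Hypothetical backlog: splines are twice the size of the chaos generator.
    backlog = [
        Task("reticulate splines", 4),
        Task("insert chaos generator", 2),
        Task("polish UI", 3),
    ]
    # Prediction for four weeks at an observed two units per week.
    print(predict_done(backlog, throughput=2, periods=4))
    # If the UI polish matters more, move it up and predict again.
    print(predict_done([backlog[2], backlog[0], backlog[1]], throughput=2, periods=4))
```

The sketch deliberately stops at the first task that does not fit instead of skipping ahead to smaller ones, on the assumption that priority order is respected; a team could reasonably choose otherwise.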

Communication Begets Productivity

There are three aspects of communication that I want to cover:

  • Being aware of the context or system,
  • Staying informed, and
  • Interruptions.

I think these are things we don’t talk about enough, perhaps because communication almost by definition becomes hard to measure when it’s dysfunctional.

Context

Software development always happens in a greater context, in a system. Not only are there (often) downstream users of your code, but there are also upstream vendors of the dependencies you have.

We frequently ignore this whole-system picture, simply because it is more convenient to treat the software development process as a black box. Requirements go in, and software comes out.

To produce quality at lower cost, this is insufficient. Context matters.

When producing software, we make tradeoffs and decisions at every step of the way. Software in a radiation therapy machine will require different tradeoffs than an aircraft control system, both of which require a different set of tradeoffs than a coffee maker. [15: Well, duh.]

These small variations cannot realistically be captured in a specifications document. Software developers simply must know what the software is used for, in order to make intelligent decisions and tradeoffs while they work.

This means everyone involved must understand

  • What they are doing,
  • Why they are doing it, and
  • For whom.

One way to visualise this is through a flow diagram. A flow diagram generally says more about how an organisation works than any sort of hierarchical pyramid of who reports to whom.

The flow diagram illustrates how resources flow through the system, where value is added along the way, and how the flow eventually terminates at the end users. Very importantly, it also shows how information back-propagates through the system: the paths along which user or QA feedback reaches the earlier stages. [16: If it does, of course. If it doesn’t, the flow diagram shows that you have serious problems in your process and that feedback isn’t making its way back to the earlier stages.]


This can be a very rough sketch, but it should allow you to answer questions like: Who depends on my work? Whose work am I dependent on? These questions matter not because the answers are very surprising, but because they lead to new questions:

  • Would skipping a middle man make things more efficient?
  • Do I trust this upstream vendor? (E.g. a library author.)
  • What is my agreement with this service provider? (E.g. an ISP.)

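Once the flow diagram is written down as a graph, the two questions above can be answered mechanically. The following Python sketch uses a made-up diagram: the stage names in `FLOWS` and `FEEDBACK` and all the function names are hypothetical. “Who depends on my work?” follows edges downstream, and “Whose work am I dependent on?” follows them upstream. Feedback paths are kept in a separate mapping so they don’t muddle the dependency answers. Keeping them separate also makes it possible to check whether end-user feedback actually reaches a given stage.

```python
from __future__ import annotations

from collections import deque

# Hypothetical flow diagram: each stage maps to the stages it hands work to.
FLOWS: dict[str, list[str]] = {
    "library author": ["backend team"],
    "ISP": ["backend team"],
    "backend team": ["frontend team", "QA"],
    "frontend team": ["QA"],
    "QA": ["end users"],
    "end users": [],
}

# Hypothetical feedback paths, kept separate from the forward flow.
FEEDBACK: dict[str, list[str]] = {
    "end users": ["QA"],
    "QA": ["backend team"],
}


def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search: every node reachable from `start`, excluding it."""
    seen: set[str] = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        queue.extend(graph.get(node, []))
    seen.discard(start)
    return seen


def reverse(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flip every edge so that traversal walks upstream instead of downstream."""
    rev: dict[str, list[str]] = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def who_depends_on(graph: dict[str, list[str]], stage: str) -> set[str]:
    return reachable(graph, stage)


def who_i_depend_on(graph: dict[str, list[str]], stage: str) -> set[str]:
    return reachable(reverse(graph), stage)


def feedback_reaches(feedback: dict[str, list[str]], stage: str) -> bool:
    return stage in reachable(feedback, "end users")


if __name__ == "__main__":
    print(sorted(who_depends_on(FLOWS, "backend team")))
    print(sorted(who_i_depend_on(FLOWS, "QA")))
    # In this made-up diagram, user feedback never reaches the frontend team.
    print(feedback_reaches(FEEDBACK, "frontend team"))
```

The last check is the situation described in the note on the flow diagram: a stage that feedback never reaches shows up immediately once the paths are explicit.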
Staying Informed

There are many techniques for ensuring you stay informed, but there’s only one cardinal rule that cannot be broken: never punish messengers of bad news. We are humans, and we naturally get disappointed when we get bad news. It’s easy to unintentionally punish someone delivering bad news, despite your best efforts not to.

Unfortunately, the result of shooting the proverbial messenger is fear. Management by fear is a great strategy if you want people to hide the most important information they possess. The most valuable information someone has is generally going to be the bad news they haven’t told you yet. [17: Well, duh.]

Your entire team and organisation should reward people for bringing you bad news. Always encourage sharing any information that grounds you in reality.

Everyone becomes less productive when people withhold and sit on information that is useful to others. However, keep in mind that when people do this, it is rarely out of spite, but because the system has failed them before.

Interruptions

Relating to the above about withholding information: Communicate!

An engineer is only as good as the system she works in. A system is only as good as the interactions between its parts. [18: Well, duh.] Good communication and collaboration, both within a team and with other teams [19: This is frequently forgotten. Your team must communicate with other teams just as well as you communicate internally within the team.], are crucial to achieving productivity and quality.

Engineers often sigh loudly about how interruptions are bad for productivity. This is true for personal productivity. On the other hand, the right interruptions can really boost the productivity of your organisation. If Alice is stuck on a problem Bob knows how to solve, and Charlie depends on Alice getting on with things, interrupting Bob to unblock Alice can be worth it many times over.

Personal productivity is easy to measure and organisational productivity is hard to measure, so it’s easy to err on the side of personal productivity at great cost to the rest of the organisation.

This is compounded by the way organisations try to rate or assign value to their contributors one-by-one, rather than for their team contribution. This means the system encourages prioritising personal productivity over organisational productivity. Someone who spends all their time getting other people unblocked might very well be the greatest facilitator of productivity in the organisation but will still be regarded as the one who never gets anything done. How sad is that?

Next: Incidents

Until I have written the next part, there is nothing more to see here.