Every now and then I come across a section of code that looks weird to me, and I have to spend an unnecessary amount of time to sort it out. This angers me a bit inside, every time. I don't like wasting my time on things my computer could do for me.
In the case that triggered this article, the weird code I found was something along the lines of:
```python
if profile and profile.verified_email:
    last_interaction = profile.email.interactions()
    # some more code accessing various fields and methods of profile
    if profile.verified_email and last_interaction:
        # what!?
        ...
```
In case it's not clear already, I'll explain my confusion.
One of the basic tools when writing "proofs" (either in a very rigorous sense or in the looser, intuitive sense) for computer programs is that inside an if construct, whatever condition led there can be assumed to be true, because there is no other way to get into that block of code!
So we can assume that inside the outer if block, the profile.verified_email field is true-ish (given that this is Python, where a lot of things count as true in a conditional).
Despite this, the profile.verified_email field gets re-tested inside the if block where it is supposed to be true. There are three possible explanations for this, in order of likelihood:
- The programmer who wrote this code was tired/lazy/distracted/drunk. Very likely. They just didn't realise they were re-checking a variable that is already true.
- profile.verified_email is not actually a plain field, but rather a property, which means it can have all kinds of side effects when you access it. Perhaps it queries a server and confirms verification each time it's accessed, and thus can change at any point?
- The code in between the two checks, the code that uses the profile object, might have a side effect that makes the email unverified even when it was verified previously.
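To make explanations two and three concrete, here is a minimal sketch. The classes ExpiringProfile and MutatingProfile and the helper recheck_matters are hypothetical, invented for illustration; they are not the code from the original project. In both cases the second check can disagree with the first, which is why the re-check can't simply be deleted without first reading the code.

```python
from typing import Any, Callable


class ExpiringProfile:
    """Explanation two (hypothetical): verified_email is a property."""

    def __init__(self) -> None:
        self._reads = 0

    @property
    def verified_email(self) -> bool:
        # This code runs on every access. Here verification "expires" after
        # the first read, standing in for e.g. a server query whose answer changes.
        self._reads += 1
        return self._reads == 1


class MutatingProfile:
    """Explanation three (hypothetical): a plain field that a method can change."""

    def __init__(self) -> None:
        self.verified_email = True

    def sync_with_server(self) -> None:
        # A side effect on the object itself: verification gets revoked.
        self.verified_email = False


def recheck_matters(profile: Any, in_between: Callable[[Any], None]) -> bool:
    """Return True if the second check disagreed with the first."""
    first = profile.verified_email
    in_between(profile)  # stands in for the code between the two checks
    second = profile.verified_email
    return first != second


print(recheck_matters(ExpiringProfile(), lambda p: None))                  # True
print(recheck_matters(MutatingProfile(), lambda p: p.sync_with_server()))  # True
print(recheck_matters(MutatingProfile(), lambda p: None))                  # False
```

The last call corresponds to explanation one: nothing in between touches the field, so the re-check is redundant.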
So while explanation one is the most likely one, the other two can't be dismissed. And unfortunately, they lead to different actions on my part. If explanation one is correct, then I should remove the extra check from the code, because it's confusing and useless. If explanation two or three is correct, I should leave it in.
Sooo… what do I have to do? Read lots of code. To begin with, I only have to read the implementation of profile.verified_email to see whether or not it's a property. If it's not, I can quickly rule out explanation two. Then I have to read the code and the implementation of the methods used in between the two checks, to rule out any side effects causing a verified email to become unverified.
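The first of those steps can at least be partly mechanised. A minimal sketch, assuming the attribute lives on an ordinary Python class: inspect.getattr_static from the standard library fetches an attribute without running any descriptor code, so we can ask whether something is a property without triggering it. The function is_property and the two example classes are hypothetical. This only addresses explanation two, and it only recognises the built-in property type, not other descriptors that could also run code on access.

```python
import inspect


def is_property(obj: object, name: str) -> bool:
    """Report whether `name` is a property on obj's class, without calling it.

    inspect.getattr_static bypasses the descriptor protocol, so the property's
    own code (and any side effects it has) is never executed by this check.
    Only the built-in property type is recognised here.
    """
    attr = inspect.getattr_static(type(obj), name, None)
    return isinstance(attr, property)


class PlainProfile:
    def __init__(self) -> None:
        self.verified_email = True  # ordinary instance attribute


class ComputedProfile:
    @property
    def verified_email(self) -> bool:
        return True  # imagine a server query here


print(is_property(PlainProfile(), "verified_email"))     # False
print(is_property(ComputedProfile(), "verified_email"))  # True
```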
You might not think of it that way, but reading all that code is a huge time waste. There is a much better way of doing it. There is a way of doing it that means I can instantly know whether or not it's possible for profile.verified_email to change between the two checks. If it's not possible (which was the more likely scenario), then I can immediately remove it from the second check. No code reading required.
What is this magic I speak of? Controlled side effects.
Quick recap: side effects are anything your code does that is not straight-up computing a value. So things like modifying object fields or global variables, reading a file, querying a server, generating and using random numbers… you get it.
Any time you get different results by reordering method calls or by calling a method twice, you're witnessing side effects. Any time you groan about having to set up state to unit test a method, you're witnessing side effects. Any time you change one part of your program and another starts misbehaving, you're witnessing side effects.
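The first two symptoms are easy to reproduce. In this sketch (all names are hypothetical), square is pure, while increment and double both modify the same module-level counter, so calling one of them twice, or swapping the order of the two, changes the result.

```python
counter = 0  # module-level state shared by the effectful functions


def square(x: int) -> int:
    """Pure: the result depends only on the argument."""
    return x * x


def increment() -> int:
    """Effectful: modifies the global counter and returns its new value."""
    global counter
    counter += 1
    return counter


def double() -> int:
    """Effectful: modifies the same global counter."""
    global counter
    counter *= 2
    return counter


# Calling a pure function twice gives the same answer both times.
assert square(3) == square(3)

# Calling an effectful function twice does not.
first = increment()
second = increment()
print(first, second)  # 1 2

# Reordering effectful calls changes the outcome.
counter = 0
increment()
double()
after_inc_then_double = counter  # (0 + 1) * 2 == 2

counter = 0
double()
increment()
after_double_then_inc = counter  # 0 * 2 + 1 == 1
```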
Any time you hear someone speak of isolation, separation of concerns, implementation hiding and low coupling, you're hearing them talk about minimising side effects. Side effects are the things that intertwine unrelated parts of your program.
Since side effects generally make it harder to write bug-free code, we tend to avoid them when we can. The problem is that Python (and indeed most languages) gives us no tools to figure out when side effects are happening.
Controlled Side Effects
Modern type systems allow you to annotate your values and methods with all kinds of things, which is great. Specifically, some type systems allow you to annotate your values and methods as "doing side effects". There are different kinds of side effects which have different annotations, but let's just speak broadly about the concept.
Not only do these type systems allow you to annotate things as "doing side effects"; they specifically disallow doing side effects unless you have annotated the thing as such.
So what we have then is a bunch of methods that are annotated as doing side effects – these are "dangerous" territory. You have to be careful around these because they might do things you don't expect. But we also have even more methods that are not annotated as doing side effects. These methods are called pure. If the methods you're using are all pure, you know you can use regular proof techniques when trying to figure out stuff about your program.
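Python has no effect system, so the closest we can get is a convention. The sketch below uses a hypothetical effects decorator that merely records which kinds of side effects a function claims to do. It is documentation only: unlike the type systems described above, nothing stops an unannotated function from doing side effects, so "unannotated" here means "claimed pure", not "proven pure".

```python
import random
from typing import Callable, FrozenSet, TypeVar

F = TypeVar("F", bound=Callable[..., object])


def effects(*kinds: str) -> Callable[[F], F]:
    """Hypothetical annotation: record the kinds of side effects a function does.

    Documentation only; Python enforces nothing in either direction.
    """
    def mark(func: F) -> F:
        func.__effects__ = frozenset(kinds)  # type: ignore[attr-defined]
        return func
    return mark


def declared_effects(func: Callable[..., object]) -> FrozenSet[str]:
    """Return the declared effects; an unannotated function yields an empty set."""
    return getattr(func, "__effects__", frozenset())


@effects("random")
def roll_die() -> int:
    return random.randint(1, 6)


def area(width: float, height: float) -> float:
    # No annotation: by this convention it claims to be pure.
    return width * height


print(declared_effects(roll_die))  # frozenset({'random'})
print(declared_effects(area))      # frozenset()
```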
In the example that opened this article, had all methods used between the checks of profile.verified_email been marked as not doing side effects (as well as the profile.verified_email field), I would have instantly known that profile.verified_email has to be true throughout the outer if block, and the previous programmer had just been tired/lazy/distracted/drunk.
No code reading required.
I like it when my computer does that kind of work for me so I don't have to. It's a shame very few languages give us those tools.
Sometimes when presented with "new" ideas (scare quotes because annotating side effects is a fairly old idea in the grand scheme of things – it's just not widely practised) on how to do things, people come up with all kinds of reasons to not do things the "new" way, because it's not the good old way they know. These are a few arguments I've heard against annotating things that do side effects:
"I've never had a problem with rogue side effects. You're just inventing problems for yourself."
Okay, congratulations. You are, and have worked with, much more disciplined programmers than me. Or maybe you enjoy reading unnecessary code more than I do.
"All this sounds unnecessary. Sure, I understand side effects can be troublesome, but it sounds like a lot of trouble to annotate everything too!"
Keep in mind we are already trying to avoid side effects, which means in reality far fewer methods than you think will need annotation. If all your methods need to be annotated as doing side effects, perhaps your design is not optimal in terms of maintainability?
"What if the methods in your example were annotated as doing side effects? Then you'd have to read their implementation anyway."
Remember when I said there are different annotations for different kinds of side effects? If a method is annotated as having random number generation as a side effect, I really don't care. That's not going to affect the profile.verified_email field at all. Only if the method is annotated as "has the side effect of changing fields in the object it belongs to" does the side effect annotation matter.
"But what if one of the methods in your example was annotated as modifying fields in the object it belongs to? Then you'd have to read the implementation anyway."
Correct. If that were the case, then yes. However, since always having to read the implementation is the default when you do not have controlled side effects, we're still making the situation better in some cases, just not that particular one.
And with annotations I'd still only have to read the methods that were annotated, not every method!
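Sticking with the same documentation-only convention (the effects decorator, the effect names "random" and "mutates_self", and the Profile methods are all hypothetical), picking out the methods that actually need reading becomes a simple filter over their annotations. Of the three methods called between the checks, only the one that may modify its own object is left.

```python
import random
from typing import Callable, Iterable, List


def effects(*kinds: str) -> Callable[[Callable], Callable]:
    """Hypothetical annotation recording side effect kinds (not enforced)."""
    def mark(func: Callable) -> Callable:
        func.__effects__ = frozenset(kinds)
        return func
    return mark


class Profile:
    """Hypothetical stand-in for the profile object in the opening example."""

    def __init__(self) -> None:
        self.verified_email = True

    def display_name(self) -> str:
        # Unannotated: claimed pure.
        return "someone"

    @effects("random")
    def pick_greeting(self) -> str:
        # Has a side effect, but not one that touches the object's fields.
        return random.choice(["Hi", "Hello"])

    @effects("mutates_self")
    def refresh(self) -> None:
        # May change verified_email, so this is the one worth reading.
        self.verified_email = False


def must_read(methods: Iterable[Callable]) -> List[str]:
    """Names of methods whose annotation says they may modify their object."""
    return [
        m.__name__
        for m in methods
        if "mutates_self" in getattr(m, "__effects__", frozenset())
    ]


called_between_checks = [Profile.display_name, Profile.pick_greeting, Profile.refresh]
print(must_read(called_between_checks))  # ['refresh']
```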
"I don't want the computer to tell me what to do. I am the programmer, I call the shots. If I say I want something, I don't want the computer to stop me."
Don't view it as the computer stopping you from doing something. View it as documenting the purpose of your method. Even if you don't need documentation right now, the next programmer might need it – that next programmer might be you a few months down the line. Attaching compiler-checked information to your methods is a fantastic way of documenting your code.
"I'd still feel restricted in my design decisions by having to keep pure methods pure. If I'm restricted I'll stay away from solutions that might be easier to implement, which is bad for the stakeholders."
This is a two-parter. One: Is it really bad for stakeholders? Sometimes the easiest solution isn't the best one in the long run.
Two: So annotate all your methods as doing side effects. By annotating all your methods by default you have exactly the same restrictions as in mainstream languages today, where every function can potentially do side effects. Then later in the design process, when you have locked down some methods that actually don't do side effects, you are free to remove that annotation and reap the benefits of having a known pure method.
"I don't care about code quality. I just want my code to be fast."
Good news! Pure functions open up a floodgate of optimisations that are not possible in the presence of side effects. Your compiler actively avoids a bunch of optimisations because it can't be sure they are safe in terms of side effects. Reordering method calls is one of those simple optimisations that just can't be done when you don't know which methods do side effects.
If you could tell your compiler which methods actually do side effects, it could put all the other pure methods in optimisation heaven where it can do almost anything it wants to them.
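CPython itself does not perform effect-aware optimisations like this, but the principle can be shown by hand with memoisation, using functools.lru_cache from the standard library as a stand-in for a compiler optimisation. Caching a pure function is always safe; caching one that reads mutable state can silently return stale results. The functions square and to_euros and the global exchange_rate are hypothetical.

```python
from functools import lru_cache

exchange_rate = 1.0  # mutable global state read by the impure function


@lru_cache(maxsize=None)
def square(x: int) -> int:
    # Pure: caching cannot change the program's behaviour.
    return x * x


@lru_cache(maxsize=None)
def to_euros(amount: float) -> float:
    # Impure: reads a global, so caching it is unsafe.
    return amount * exchange_rate


print(square(4), square(4))  # 16 16 (the second call is served from the cache)

first = to_euros(10.0)   # computed with exchange_rate == 1.0
exchange_rate = 0.5
second = to_euros(10.0)  # served from the cache: still the old answer
fresh = to_euros.__wrapped__(10.0)  # bypasses the cache: uses the new rate
```

Here second is a stale 10.0 while fresh is 5.0. The cache was only a valid optimisation for the function that was actually pure.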