We broke something this week. Well, I broke something, if you go by code authorship. It was supposed to be a bug fix, and we did everything by the book - pair-programmed the fix, did a code review, got our testers to do some QA, even went through the don't-really-believe-in-it-but-pick-your-battles internal change control process with all of its additional approvals. But it still broke.
In other words, half a dozen or so people looked at this change and went, "that looks like a reasonable thing to do". Yet despite all these checks and double-checks, it still all went wrong. How? Because we were working on a Sierra Adventure Codebase.
For those who don't get an immediate hit of nostalgia on seeing that screenshot, the Sierra adventure games were a series of games from the '80s and early '90s, usually called something Quest, that were notorious for featuring many deaths and dead ends.
Those deaths and dead ends were often hideously unfair. That cave above requires you to have a specific set of items, and the only way to find this out is to get stuck in the cave and reload. Which was pretty much how you played Sierra adventures - lots of saves, lots of deaths, lots of reloading and trying again with the knowledge of how you'd died previously.
If you've ever read Ron Gilbert's classic essay on the subject you'll know the conclusion: narratively, this is horrible. The characters rely on all this ex machina knowledge that they could not have inferred from their environment. If you tried to tell the Kings Quest story as a novel, it would be full of passages like, "a flash of divine intervention told King Graham he would only be able to cross the bridge eight times".
So how does this relate to us breaking our service?
The bug we were fixing was in a method which sends a report to an in-house error logger. It does this by building a generic dictionary of strings, serialising that to JSON, then sending it to a REST endpoint. Sometimes this method was trying to dereference a null value, throwing an exception instead. We decided that since this thing was only building a dictionary of values, we could coalesce any nulls to an empty string, and everything would work fine.
Sounds reasonable? That's what we thought - right up until the error logger stopped being able to process messages. Turns out that some of the fields in that dictionary are not only required, but have a limited set of valid values. If you fail to supply one of these valid values, you end up stomping the entire input queue of said logger.
This is the Sierra Adventure Codebase at work. You're faced with a
Dictionary<string, string>. The existing code looks like it's trying to send an empty string, and getting caught in null dereferencing. Sending an empty string looks like a reasonable approach. The only way to know that you cannot do this is to have died before!
So we did something reasonable, and we died, and now we roll back and solve the puzzle correctly using the knowledge of our earlier death.
How can this be better? Well, in adventure games the Sierra era was followed by Lucasarts' imperial period; a run of classics including The Secret of Monkey Island, Indiana Jones and the Fate of Atlantis, Day of the Tentacle, Sam & Max Hit The Road... games designed to have zero unfair deaths.
How did they do that - or rather, how did they do that in a way which is applicable to code?
Firstly, by avoiding the use of Game Over as a feedback mechanism. We made a mistake - we sent an incorrect value to our error reporting endpoint. But that unhappy path should have been dealt with! Handling the case that someone sends us data we can't process is the equivalent of your adventure game character saying, "I'm not going in that cave. It looks dangerous," rather than blithely walking to their doom.
Secondly, make it trivial to infer things from the environment. Put up a sign telling the player the bridge only has eight uses left.
Dictionary<string, string> is a really bad thing to use in our example - because in a strongly-typed language, someone is going to assume, "I can put anything I like in this map". If there are restrictions, create a class which reflects those restrictions. Enforce the correct inputs at compile time.
(This is why it's a good idea to build client libraries for your APIs - not only can you enforce those constraints, but you can save people having to re-implement calling a REST endpoint in as many different ways as you have services.)
Getting away from the Sierra Adventure Codebase is about making sure you never have to "die" to find information about how things work. It shouldn't be possible to break things by taking well-meaning actions: you should always have something that tells you you're on the wrong path, whether it's having to bypass a client library, rewrite a type or ignore some red tests.