This is gonna get geeky
I have been weirdly obsessed with immutability recently. I’m going to do my best to explain why, and how I’ve solved it. Partly for me so I can re-read this in a year, but also partly because I keep seeing people not bothering and it’s bothering me.
I’m not going to explore why it’s bothering me. Y’all get tech or emotional labour, and right now it’s tech.
Write once, run anywhere. That was the promise of Sun Microsystems when they came up with the cross-platform Java Virtual Machine. That was way back in the 90s, and it promised an exciting new way to write code: code that was compiled for a virtual machine rather than for a particular computer, and could therefore run on any operating system that had that virtual machine.
You can imagine how that went when I tell you that before long Java developers had repackaged that idea as “Write once, debug everywhere”. Still, we’ve had thirty years. Things must have got better.
No. Actually, I think in some places they’ve got worse. In particular, Docker. If I’m being particularly particular, Docker plus my absolute bugbear: a tag that just says ‘latest’.
When nerds like me talk about continuous integration and continuous deployment, we’re talking about two slightly different things:
- continuous integration is making sure that I don’t go off on a massive tangent: I have to combine my code back into the main branch, and get everyone else’s code off the main branch, at least once a day. It makes sure we’re all moving in the same direction
- continuous deployment means that code is deployed without human interaction into the production environment. No touching, no tweaking
One of the ways we should achieve this is through immutable builds. Immutable means ‘unchangeable’. Advances in technology over the last thirty years mean we can now deploy the infrastructure to support our code, as well as our code. In the old days, we had to put code onto servers. Now we can code servers into existence, and tear them down when we don’t need them any more.
Into this idyllic world came two things that threaten the bucolic atmosphere: docker build and :latest.
docker build is the command we use to turn a recipe of code and infrastructure into an artefact ready to be deployed into a production environment and make users happy.
:latest is the tag that is added automatically to any artefact you don’t tag yourself before it’s sent on its merry way. Combined, they make immutable builds very tricky. Let’s start with why :latest is a horrible idea:
Imagine you are putting stuff in a box, and you label every box with the name of your project, say ‘important code’. Then at some point someone comes down and says it’s all gone wrong and you need to go back to how you were doing it before it went wrong. At this point you’re in trouble, because there’s no way of knowing which box marked ‘important code’ is the right one.
Tagging each box as you go with ‘latest’ does not help you at all.
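To make the "added automatically" part concrete, here is a minimal sketch of the defaulting rule: an image reference with no tag and no digest is read as :latest. The function name with_default_tag is made up for illustration. It only mimics the rule and is not Docker's own parser.

```shell
#!/bin/sh
# Hypothetical helper: mimic how an image reference with no tag is read.
with_default_tag() {
  ref=$1
  # Only the last path component can carry a tag; a colon earlier on
  # is a registry port, as in localhost:5000/my-awesome-code.
  last=${ref##*/}
  case $last in
    *:*|*@*) printf '%s\n' "$ref" ;;        # already tagged, or pinned by digest
    *)       printf '%s:latest\n' "$ref" ;; # nothing given, so "latest" it is
  esac
}

with_default_tag my-awesome-code           # prints my-awesome-code:latest
with_default_tag my-awesome-code:2f4e702   # prints my-awesome-code:2f4e702
```

Every build that skips the tag lands on the same label, which is the box problem all over again.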
And the same goes for tagging things ‘staging’ and ‘dev’. That doesn’t help either! In fact, that’s worse, because then I suspect you’re re-building the box between stages. And when your Dockerfile looks like this:
RUN apk add --update --no-cache tzdata && \
    cp /usr/share/zoneinfo/Europe/London /etc/localtime && \
    echo "Europe/London" > /etc/timezone
RUN apk add --update --no-cache --virtual runtime-dependances \
    yarn postgresql-contrib libpq less postgresql-dev shared-mime-info
RUN apk add --update --no-cache --virtual build-dependances \
    build-base && \
    bundle install --jobs=4 && \
    apk del build-dependances
…which, if I clean it up a bit, actually reads:
DOWNLOAD a bunch of random stuff from the internet, I dunno, whatever man
ALSO here I'd just like whatever some random bloke has added to his repo overnight while he was way too tired
DOWN HERE honestly buddy is there really that much difference between version 1 and version 2? Just lay it on me
I’m getting concerned that the immutable build object you’re supposedly passing along your deployment pipeline is actually three different objects that you’re just putting in the same clothes.
The solution is to tag your artefact with something you can come back to. I like to use the hash of the commit that you’re building, because then you can mostly associate a build artefact with the code used to create it. Plus, it makes it easy peasy to deploy, even if you’re still using pull requests and haven’t ascended to the heights of trunk-based development.
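As a rough sketch of what that tagging could look like in a build script: the function names image_ref and build_and_push are mine, and I'm assuming the tag is the first seven characters of the commit hash, that the script runs inside a git checkout with the Dockerfile in the current directory, and that the image name already includes whatever registry path you push to.

```shell
#!/bin/sh
# Build an image reference from a name and a commit hash.
# Assumption: the tag is the first seven characters of the hash.
image_ref() {
  printf '%s:%.7s\n' "$1" "$2"
}

# Build and push an image tagged with the commit currently checked out,
# then print the reference so later pipeline steps can reuse it.
build_and_push() {
  name=$1
  sha=$(git rev-parse --verify HEAD) || return 1
  ref=$(image_ref "$name" "$sha")
  # Docker's own output goes to stderr so stdout carries only the reference.
  docker build --tag "$ref" . >&2 || return 1
  docker push "$ref" >&2 || return 1
  printf '%s\n' "$ref"
}

# Usage (not run here):
#   build_and_push my-awesome-code
```

Printing the reference at the end means the test stage can be handed the exact string that was pushed instead of guessing at it.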
When you make the PR, the head of that branch is used to create a build artefact. Let’s say the hash is 2f4e702, so you build my-awesome-code:2f4e702 and push it to a registry. Your test suite should now pull that exact artefact. If it passes all of its tests then your code will be merged. At this point a new commit is created: the merge commit. Obviously you don’t want that commit, which will have a different hash. You want one of its parents. You can access that through HEAD^2, the second parent of the current commit, which is the head of the branch you merged. (This relies on a real merge commit; a squash or rebase merge has no second parent.) Now you’ve got the commit that refers to the immutable object that you know has passed its tests. Grab it from your favourite container registry and you’re ready to deploy the immutable, known-good artefact into production.
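Here is a small sketch of the HEAD^2 lookup as it might appear in a deploy script. The function names are invented for illustration, and it assumes the same tagging scheme as the example hash: the first seven characters of the commit hash.

```shell
#!/bin/sh
# Print the full hash of the second parent of a merge commit (default HEAD).
# Fails quietly if the commit has no second parent, e.g. after a squash merge.
tested_commit() {
  git rev-parse --verify --quiet "${1:-HEAD}^2"
}

# Print the reference of the image that was built and tested for the PR branch.
# Assumption: images are tagged with the first seven characters of the hash.
tested_image_ref() {
  name=$1
  parent=$(tested_commit "${2:-HEAD}") || {
    echo "not a merge commit: no second parent" >&2
    return 1
  }
  printf '%s:%.7s\n' "$name" "$parent"
}

# Usage on the main branch, straight after the merge (not run here):
#   docker pull "$(tested_image_ref my-awesome-code)"
```

Failing loudly when there is no second parent seems safer than quietly deploying whatever happens to be lying around.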
Right now this is my best effort. Can you do better? Make improvements? Congratulations! You might have learned something. But more than that, you’ve made a friend for life in me.