Something I rediscover every week is that policy works in slower time. I am finding it a real struggle to wean myself off the instant serotonin hit you get from releasing things minute-by-minute. Luckily, I know this isn’t unique to me: Camille Fournier’s book The Manager’s Path prepared me for this realisation.
It doesn’t make it less frustrating, but it does make it easier to manage.
This week I’ve been working on something open source. It’s been really interesting, because I’ve had to reconfigure how I imagine things. Instead of long-lived servers, I have to think about short-lived processes and actions. In short, I’ve had to start thinking about code outputs in terms of lean manufacturing instead of – I don’t know. What’s the metaphorically equivalent work process for servers?
Developing this architecture means leaning quite heavily on queues. Work that has to be processed is enqueued, rather than processed immediately. This in turn generates race conditions and weirdness around databases. To help me work through these things, I have naturally turned to Factorio:
Factorio is a fascinating and frustrating game that requires the user to build a factory. I start the game, like most people (I suspect), by having my avatar run around multi-tasking. Go over there and mine some coal, carry it to a furnace, go over there to grab some copper ore, come back to the furnace, and so on. Before long you’ve got iron. With iron you can build conveyor belts, and automated miners. Conveyor belts are queues. You build little robots to automatically add things to queues, or offload them at the end.
Before long your queues are full, and things can’t be loaded on any more. Everything slows down. So now you’ve got to have some way of communicating to the on-loading robots to stop at a certain point, which introduces a problem of latency. With this distributed system of switches, where multiple items can write state to a data store, how do you make sure requests are processed in the right order?
The answer, of course, is more queues.
Which…brings us back to the original problem.
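For anyone who thinks better in code than in conveyor belts, here is a minimal sketch of the full-belt problem, using Python's standard `queue` module. The names `belt`, `load`, and `unload` are my own illustration and not anything from Factorio or from the project I'm working on. The belt is a bounded queue; when it is full, the loader is told "no" and has to decide what to do next.

```python
import queue

# A conveyor belt with room for three items: a bounded FIFO queue.
belt = queue.Queue(maxsize=3)

def load(belt, item):
    """Try to put an item on the belt; return False if the belt is full."""
    try:
        belt.put_nowait(item)
        return True
    except queue.Full:
        return False  # the signal to the loading robot: stop for now

def unload(belt):
    """Take the oldest item off the belt, or None if the belt is empty."""
    try:
        return belt.get_nowait()
    except queue.Empty:
        return None

# Four items arrive, but the belt only has room for three.
accepted = [load(belt, ore) for ore in ["iron", "copper", "coal", "stone"]]

# Once something is taken off the far end, there is room to try again.
first_off = unload(belt)
retry = load(belt, "stone")
```

The fourth item is refused, the first item loaded is the first one off, and the retry succeeds once there is space. What the sketch leaves out is the hard part: the refused loader has to hold its item somewhere while it waits, which is to say, in another queue.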
Thinking about distributed computing, and consequently distributed systems in general, is interesting in the context of my organisation at both the macro and micro scale. At 420,000 people, more or less, distributed all over the place, the mechanisms for moving context and information – products that small areas of the factory are producing – have to be fast. There have to be meta-signals around when products are ready, to signal other areas of the factory to start preparing. Broadcast meta-signals show intent: I intend to deliver this resource. I intend to take this course of action. I am delayed.
The flip side of broadcasting meta-signals is filtering and receiving them. Receiving and processing every signal is impossible, but ignoring all signals will surely be disastrous. Furthermore, while a factory only has hard signals – a resource is delivered, or not – human distributed systems have very soft signals: I am considering doing this. I would like you to say yes to this. I have set out on a journey, and I am alone, and I am scared, and I seek reassurance that all will be well. These are not signals that are easily transmitted, and it is more difficult still to show receipt. Simple instructions we can do: the list of procedure words (prowords) on Wikipedia is not short. These are meta-signals and, to my great delight, I’ve noticed the more military of colleagues using them on video calls to signal intent – each message is ended with a clear OVER, signalling to the group that the caller has nothing more to say.
(One imagines a military breakup to be quite confusing: – “We’re over” – “Say again all after we’re, over”)
The way around this at the small scale is by having the whole team together. The cost of moving information two feet across a desk is negligible, and can be done in totally unstructured ways. By contrast, when the team is distributed, moving information is more expensive. You have to structure it – you can’t rely on the firehose of unfiltered information any more. This is an additional cost: it is the additional cost of complexity. A small team of 14 can sit together and understand what’s going on all the time. A team of 420,000 people cannot possibly sit together: the largest stadium in the world only holds 130,000 people. And anyway, imagine having a bake off. It would be absurd.
But also: a team of 14 might still not be able to sit together. In the case of a global pandemic, for example. In this case, you’ve got to radically rethink how you communicate. You’ve got to signal intent, but you’ve also got to accept that more of your brain-time has to be focussed on sending, receiving, and processing signals. You can’t rely on local context or being able to respond quickly any more, and you can’t rely on other messages to back up the one you did receive. Suppose, in a factory somewhere, a machine was producing too much molten steel. If you were face-to-face with it, you might receive many signals: sirens wailing, lights flashing, molten steel splashing around what used to be your ankles, the gentle tickle of heat on your face.
However, if you were many miles away in the control room, you would see just one light come on. That’s fine if your receiving space – your inbox – is nice and empty, but it probably isn’t. It’s probably absolutely full of other signals. That one light may, therefore, be lost in the noise, and you have no other signals to go on.
This is an opportunity to plug one of my favourite talks ever: Who Destroyed Three Mile Island?
It is also a not-at-all subtle point that for you, working in a distributed team, the problems have gone from obvious and face-melty to just one more flashing light on the wall. Here is my challenge: how do you process signals at a distance? Let’s put aside the latency problem – the one where you find out about a problem but the signal has taken so long to get to you that your only options are full stop or accept your fate – and ask how we improve our signals. How can we receive, process, and broadcast better signals?
With any luck, by next week, I’ll have an answer. And you’ll know, because you’re signed up to the meta-signal of email notifications.