Pandas-Munkres: Part 1
Munkres is an algorithm for solving the assignment problem. I’ve talked about the problem before, so this blog is going to focus on why I’m trying to solve it. It’s a slice of personal history and will hopefully demonstrate who I am, and the kind of person I am, and why I’ve fixated on this for so long.
In 2015 I joined the Civil Service Fast Stream. During the induction we were presented with a sheet of theoretical Fast Streamers and their preferences for their next move. We also got a second sheet: roles that we could match each Fast Streamer with. There were only ten on each sheet.
Avid readers will know what I then did not: that this problem has 3,628,800 potential solutions. We struggled with it for some time. In fact, the thing we argued about most was how we should decide. What’s more important? Location? The skills the Fast Streamer wants to learn, or the skills the job demands?
Someone suggested a formula; someone else pointed out that we’d need to do 100 calculations for this grid and we had ten minutes left. All the time the facilitators grinned, because part of the point of the exercise was certainly to point out that this is a very difficult thing to do.
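Both numbers from that room are easy to reproduce. What follows is a minimal sketch, not anything we had on the day: the 3x3 score grid and the `best_assignment` helper are made up for illustration, and trying every permutation is only practical for tiny grids, which is exactly the problem.

```python
from itertools import permutations
from math import factorial

n = 10
print(factorial(n))  # ways to pair ten people with ten roles: 3628800
print(n * n)         # person-role scores needed to fill the grid: 100


def best_assignment(scores):
    """Brute force: try every one-to-one pairing and keep the highest total.

    scores[person][role] is how well that person fits that role.
    Returns a tuple where position = person and value = assigned role.
    """
    size = len(scores)
    return max(
        permutations(range(size)),
        key=lambda roles: sum(scores[p][r] for p, r in enumerate(roles)),
    )


# Hypothetical scores for three people (rows) and three roles (columns).
scores = [
    [9, 8, 1],
    [9, 2, 3],
    [4, 7, 5],
]
best = best_assignment(scores)
total = sum(scores[p][r] for p, r in enumerate(best))
print(best, total)
```

Giving each person their top pick in turn would hand role 0 to the first person and leave the second with a poor fit. The exhaustive search finds a better overall total, at the cost of checking every permutation.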
Most people, I suspect, forgot that exercise the minute they finished it. Four years later I’m still dwelling on it.
I couldn’t shake the feeling that there could be an easier way. That we surely couldn’t be the only people in the world who had this problem. That the fact that it was difficult meant that we should be looking at how we could make a computer do it.
I went off to my first department. I was placed in an Analytical function, and set to work. I got extremely good at Google Sheets. I briefed senior stakeholders. I automated as much of my job away as possible with functions in Sheets. When I reached the limit of what that could do, I started looking for ‘ways to automate things’ and got started with Python. With Google, the absolute knowledge that I couldn’t fail[mfn]privilege does funny things to your brain[/mfn], and a problem to solve, I set about hacking together solutions to automate the rest of my job.
I learned enough Python to be dangerous, and so about halfway through that role, I began the first iteration of code to solve my assignment problem. A snapshot of that code follows.
It is bad code. There’s this double-indented for loop that also has ten if/elif branches:
    for list in candidatesAll:
        candidate = Candidate(*list)
        deptMatch = []
        for list in postingsAll:
            posting = Posting(*list)
            score = 0.0
            if posting.department == candidate.wantedDept1:
                score += 1
            elif posting.department == candidate.wantedDept2:
                score += .9
            elif posting.department == candidate.wantedDept3:
                score += .8
            elif posting.department == candidate.wantedDept4:
                score += .7
            elif posting.department == candidate.wantedDept5:
                score += .6
            elif posting.department == candidate.wantedDept6:
                score += .5
            elif posting.department == candidate.wantedDept7:
                score += .4
            elif posting.department == candidate.wantedDept8:
                score += .3
            elif posting.department == candidate.wantedDept9:
                score += .2
            elif posting.department == candidate.wantedDept10:
                score += .1
            else:
                score += 0
            print(score)
            deptMatch.append(score)
        deptMatrix.append(deptMatch)
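For comparison, here is a minimal sketch of how the same department scoring might be written more compactly. It is not code from that repository: `dept_score`, `dept_matrix`, and the sample data are hypothetical, and it assumes each candidate’s preferences arrive as one ranked list rather than ten separate attributes.

```python
def dept_score(posting_department, wanted_departments):
    """Score a posting against a candidate's ranked department wishes.

    First choice scores 1.0, second 0.9, and so on down to 0.1 for the
    tenth. A department that is not in the top ten scores 0.0.
    """
    try:
        rank = wanted_departments.index(posting_department)  # 0-based
    except ValueError:
        return 0.0
    return (10 - rank) / 10 if rank < 10 else 0.0


def dept_matrix(candidates, postings):
    """One row of scores per candidate, one column per posting."""
    return [
        [dept_score(posting, wanted) for posting in postings]
        for wanted in candidates
    ]


# Hypothetical data: each candidate is a ranked list of departments,
# each posting is just the department it sits in.
candidates = [["A", "B", "C"], ["C", "A"]]
postings = ["C", "A", "D"]
matrix = dept_matrix(candidates, postings)
print(matrix)  # [[0.8, 1.0, 0.0], [1.0, 0.9, 0.0]]
```

The ten branches collapse into a single lookup of where the department sits in the list, which is the same rule the original spells out by hand.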
It’s a strange thing to have always written in the open. I can open my own history and peer inside. I can see where I came from. I can see the point at which I discovered classes and see them bloom across the codebase, capturing functions and neatening the whole thing out, adding structure, making it more readable.
I can see the things that in the future I’ll learn from.
I spent a month on that attempt. It was shot down quickly: the interface was accessed from the command line. You had to open up a terminal screen and run a script, and the HR people I was pitching this to were rightly very suspicious of it. I’d been on the DDaT Fast Stream for less than six months, and already I’d learned that technology should augment existing user journeys if you want it to be adopted. Beyond that you’re looking at massive business change, and disruption, and that way is not going to end well for a wet-behind-the-ears graduate whose development experience is “I hack at it until it works.”
So I put a pin in it. I didn’t stop thinking about it, though.
See, when I think about something – when I’m sure I’ve found the answer that fits, the one thing that I know should be right – then I fixate on it. I don’t know why. I do know that I’ve not yet been wrong.[mfn]At least I don’t remember being wrong, but given the human bias to forget things that don’t align with our worldview, I (and you!) should take that assertion with a pinch of salt[/mfn]
For the rest of 2016 I worked on the Fast Stream Conference, and I wrote an algorithm to dole out tickets so that they were approximately reflective of the society we serve. That’s some pretty bad code too, but it was better than the first attempt, which had five people round a table swapping bits of paper and keeping a running count of protected characteristics.
But it proves that I’m still trying to get computers to automate counting problems. I’m still thinking about the same old problem. How do we effectively assign people to roles, and do it in a way that people will actually use?
In early 2017 I started again and went hard in the opposite direction. In my youthful ignorance, I decided I would simply build a complete solution, including graphs showing progress against the average and high-level management information, and worry about the processing later.
I learned how to use Flask, a Python web framework. I learned how to interact with a database using an ORM. I learned that migrations are really, really important – and I learned that the hard way. I learned what it is to be in a corner and have no choice but to destroy two days of work with nobody to blame but yourself.
I learned not to take the work too seriously: to commit early, to commit often, to try things in branches. I became a little bit better again.
It never went anywhere. I had no idea how to host it; how to secure it; how to take constructive criticism. I put secret keys in a public repo and tried to pretend that it was standard practice.[mfn]It’s not. Don’t do this, and don’t pretend you know better than the people around you[/mfn] I didn’t know what I was talking about, and rather than learn, I became staggeringly defensive.
The code was maturing faster than I was, though. Some of the ideas lodged better than others. Some of the feedback soaked through my skull.
In 2018 I joined GDS as a developer, and I started again. I’d never written code for an organisation this large before. I’d just taught myself, and it had taken me two years, because I was fixated on the problem. And this is why I’m confused when people ask me how to become a developer. I fell into this by accident. I’m just trying to solve a problem.
I got really good at writing code at the same time, so that’s my advice. Find a problem that you want to solve, one that you think computers can handle, and then obsessively teach yourself. Stay up until 4am in a tiny flat in west London. Try to explain to your partner why you can’t sleep until you’ve solved this problem. Realise that the way your brain works has given you a highly valuable skillset but that the side effect is that you are really goddamn hard to love.
Reflect on whether this is just who you are now. Realise that there are other places to grow to. Reflect more. Grow sideways. Find support in the strength of those around you.
Anyway, my next iteration was really good. It was the best code I’d written to that point, which is about right, because if you’re always getting better then the best code you’ve ever written should be whatever you’re writing in the moment. It had a really nice interface. It flowed. It had quite good senior support.
And then senior people changed, and work got busy, and it just quietly sat there for a little bit longer going rotten. Code ages like fish.
Making change requires so many of these things – senior people willing to invest political capital, and the technology being right, and the right knowledge being in the right place – that I’m always moderately surprised that any change happens at all.
Then I remember that you only need gather 75 people to be almost certain that there will be a shared birthday and I feel better. I shouldn’t be surprised that change happens: I should be amazed that changes I want to happen happen.
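That birthday figure is easy to check. A minimal sketch, assuming 365 equally likely birthdays and ignoring leap years; `shared_birthday_probability` is a name made up for this example.

```python
def shared_birthday_probability(people, days=365):
    """Chance that at least two of `people` share a birthday.

    Works out the chance that everyone's birthday is different,
    then subtracts that from 1.
    """
    all_distinct = 1.0
    for i in range(people):
        all_distinct *= (days - i) / days
    return 1.0 - all_distinct


p23 = shared_birthday_probability(23)
p75 = shared_birthday_probability(75)
print(f"23 people: {p23:.4f}")
print(f"75 people: {p75:.4f}")
```

With 23 people the odds are already slightly better than even, and with 75 they come out above 99.9%, which is where “almost certain” comes from.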
Anyway. That iteration had a gorgeous frontend, but the computation was lacking. I’d assumed that my old code would still work, but coming back to it was – well, remember how code ages like fish?
It was like opening the hatch on a fishing boat that had been abandoned after a particularly successful trip a decade ago. A boat that literally dropped deeper into the water as the gases from the rotting flesh within escaped, accompanied by a wave of heat.
It was bad, is what I’m saying.
And so now I’m here. I’m trying to solve the same problem I’m always trying to solve, because I think the solution is worth pursuing. The world’s moved on. The people I knew when I started this are different. The technology has changed, and I’m still here, Sisyphus with a rock that I picked out specially and can’t let go of.
Because I think, deep in my ersatz professional heart, I’m afraid that Sisyphus is nothing without a rock.
November is National Blog Posting Month, or NaBloPoMo. I’ll be endeavouring to write one blog post per day in the month of November 2019 – some short and sweet, others long and boring.