Database changes while maintaining content

Mmm, that’s a nice title. Sure to draw the punters in.

I’m building something for some colleagues in my organisation, and I’m doing my best to do it properly. This is despite feeling, consistently, like I’m just playing at this: one of the downsides of learning a skill through sheer mucking around.

In any case: database migrations have always been a problem, largely because I’ve been bad at thoughtful, considered development and good at throw-things-delete-things-copy-off-StackOverflow development. This is changing, thanks in no small part to working alongside really fantastically thoughtful, considered colleagues.

I’ve been reading up on database migrations and came across a blog by Philip Potter who, coincidentally, also works in my organisation. He talks about keeping database changes separate from code changes, and I have to say I agree with his reasoning. That was the approach I took until I tried changing a database column name, and all heck broke loose.[mfn]By which I mean my tests didn’t work, I couldn’t merge the code, and the system carried on working as it had previously. Tests are good.[/mfn]

I was planning to change a database field from personal_email to email_address. I had my migration all neatly set up, I raised the pull request, and then my tests failed. This was an unhappy time.

A test report for pytest showing a number of errors

Because I’m using an Object Relational Mapper (ORM), I can access the internals of database objects. In many cases, I was reaching in and grabbing the personal_email attribute off the object. Since the name of that attribute was being changed, the tests that looked for that attribute were failing hard.

At this point, I could have just bundled up the code change and the database change into one commit. However, I was determined to stay true to my colleague’s advice and turned the problem over in my mind. I’d been mentoring for an hour beforehand and walked back to the station through St James’ Park, which for someone like me who suffers from hay fever[mfn]an allergy that would surely have seen me die from a tiger I couldn’t see, much less have the breath to run away from, in an earlier age[/mfn] was a poor decision. Nonetheless, as I came home I figured out a solution: three separate chantges.

  1. add a new column, email_address
    1. copy the data from personal_email to email_address
  2. change the application code to use email_address instead of personal_email
  3. delete the old personal_email column

Database migrations need contain both an upgrade and a downgrade – how you’d reverse these changes if you had to undo them. That means that the first change had to have an upgrade that added the new column and copied data to it while the downgrade would just delete the new column. Similarly the third change would have an upgrade that only deleted the old column, while the downgrade would recreate the old column and copy data into it from the old column.

The change in the middle was just a massive find and replace effort, which luckily is super easy.

In conclusion, I managed to keep my database changes atomic by shifting data around in my table. This was a fairly trivial example: with a more complex change you’d need to think far more carefully about how to make sure you weren’t dropping content all over the place. However, I stayed true to the principle and I think it worked fairly effectively.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s