phirate

Contains caffeine, guarana and traces of nuts


Development at crisis speed

The foundations of modern web development are replete with stories of speed. We laugh at the tales of plodding, dinosaur-like waterfall models followed by corporates; we admire the diligence and focus inherent in the crushing furnace of start-ups.

On Feb 22, 2011 and in the days afterwards, I learned that there was in fact another development speed, one that made a start-up look like a leisure cruise in comparison.

On that day an earthquake rocked the city of Christchurch and I, along with many other web developers and designers, tried to figure out how to help. I joined the group that eventually became eq.org.nz, the largest online volunteer effort dedicated to finding and organising information about the quake for rescuers and victims alike.

We had some software to build on, but it needed extensive modifications. We had other problems for which we had no existing tools. Our major assets were a group of skilled, highly motivated individuals and all the services of the global internet.

We also had very little time. Every second that ticked by rendered information less useful, deprived us of volunteers and left those assisting and affected with fewer tools to help.

The team built software at crisis-speed by necessity, but in hindsight it was an utterly alien place. Where a start-up focuses limited development resource with razor efficiency, a crisis works with plentiful development muscle but minimal coordination.

It was the software equivalent of a shotgun. Loosely coupled individuals and teams built whatever they felt (or had been convinced) was best. Thousands of lines of code that would never see use were written, useful tools and changes were adopted, and those that missed the mark fell into disuse.

We built and rebuilt mapping tools, Twitter filters, finders and matchers of every kind. Groups coalesced to solve a particular problem, then fell apart back to other tasks.

We hit problems. Many of the best practices for start-ups no longer served us well in such a chaotic environment. Having a large codebase tracked within GitHub became a bottleneck as those with access struggled to maintain a dev/release pattern in the face of a barrage of changes from every direction. The thought of a weekly, or even daily, production release cycle was laughable - we were "releasing" just as fast as we could.

Every shortcut that we could think of was taken. Test suites were ignored, debugging was performed on production, critical changes bypassed the version repo entirely in order to avoid the bottleneck, and uncoupled tools were built and deployed on completely different hosts.

For some time it wasn't even clear to us that we were in a different place. It was difficult to recognise that we were ditching standard start-up patterns not because we were lazy but because they were fundamentally incompatible with what we were attempting to achieve.

As the development effort eventually wound down, I attempted to learn from our experience and a small number of core lessons became apparent:

1. Tools must be small and loosely coupled. You cannot have everything going through one big check/review process. Individual tools need to be adaptable at their own pace, by the developers who own and make the changes, even if this results in duplication of code.

2. Communications between devs need to be crystal clear. Pull requests without context, local changes without comments and the like waste valuable time while someone tries to figure out what is going on.

3. At least one person per shift needs to have a job that consists of almost nothing else but knowing what is going on. This person does not have time to be coding. They need to be the go-to guy for questions, and they need to maintain a log and perform hand-off after their shift.

These lessons are important for the development of software to be used in disasters. There is no value in creating large monolithic apps - when the crisis strikes, the size of the app will prevent rapid adaptation. An ecosystem of cooperating applications, clearly documented, secure by default and each versioned separately, will win out on the day.

We also need to investigate methodologies appropriate to the crisis environment. How do we best retain reliability and security when speed is of the absolute essence? How can we take patches built in such an environment and return them to the upstream, knowing the priorities that built them? What tools and processes do we need to help development teams self-organise when every second counts?

As software becomes the key to organising and streamlining relief efforts, the answers to these questions will become of critical importance.

Posted March 8, 2011
Mar 09, 2011
rjmackay said...
Really good post. I've heard hints of similar rushes of dev around other disasters too. It'd be great to feed our experiences back, and see what might be starting to emerge as best practice for this.
Mar 09, 2011
It has been fascinating watching the process as an interested lurker focussing on a different but related project (servalproject.org). You guys have done a fantastic job.

Interested in writing a paper on the experience, e.g., for Software Engineering Practice, when things settle down? Happy to help if you are interested.

Mar 09, 2011
Richard Clark said...
Hah, this is the most writing I've done on any topic in some time. I don't think a proper paper is something I'd do. I'm happy to encourage others to do so though :)
Mar 10, 2011
JeremyHutchings said...
Not so much agile vs waterfall or any such nonsense, I'd think... I believe you are talking about self-organising groups with a focus on communication and facilitation. This cannot happen with *any* process where organisation beyond natural cooperation is "enforced" by the usual suits. Development like this happens when it's the doers that are doing - in Flow - not being interrupted and told to do something they've already thought about.

Though as with all ways - there is a time and place.

Good effort on this one :)

Mar 16, 2011
rediguana said...
Welcome to my world Richard :)

I've had a fair amount of experience in this area, not as a coder, but as someone that has worked with a lot of FOSS coders developing Sahana since mid-2005. Trust me, development during Response is the absolute worst time to be developing, but sometimes it has to be done. The problem with this approach though, is that it only produces a one-time solution, and it can be quite difficult to produce a robust and redeployable solution that has been developed during a crisis.

We learnt this with Sahana - Sri Lankan devs hacked together some solutions following the tsunami in late 2004. They met their needs, but the result was such a bespoke system it was nearly impossible to redeploy. We then obtained USD100k from the Swedish International Development Agency to rewrite it from the ground up. This resulted in good 'peacetime' development of the solution.

I'll dispute your point that loosely coupled will win out against an integrated solution. I'll bet money that in the long term, only an integrated solution will win out (which is pretty much the only way to go to deal with access/security issues). There is much value to be created from a training and organisation buy-in perspective by having a single consistent UI and single sign-on that loosely coupled tools just can't compete with.

This is why I'm a Director of the Sahana Software Foundation, and we're keen to see most development of these solutions done before emergencies - not during them. We've done plenty of emergency development before (Haiti, Pakistan etc.) and whilst we can make it work, it is less than ideal and often doesn't result in the most robust and generalised solution, and we often have to go back and fix the hacks undertaken in the heat of the moment.

In terms of getting organisation buy-in to deploy systems, it will almost never happen during Response and Recovery, and it will need to be done pre-event during Readiness, and include testing, training and ideally exercises to ensure all users are familiar with the system.

If you're into Python - we'd love to have you contribute to Sahana Eden. I've got heaps of recommendations and ideas I've captured from being involved in response at the Art Gallery, and we are going to start implementing these in Sahana Eden for the next country that has to deal with these issues - hopefully they can use tools developed from our collective experiences.

A massive thanks for all your hard work over the past few weeks with eq.org.nz!

Cheers Gav

Mar 16, 2011
Steven Longmire liked this post.
Mar 16, 2011
schuback said...
Richard... Great post. Comments like this help the movement and progression in emergency management. It was great to work on this, and I hope to take the lessons learned for the next time, after Japan. Working on preparing for when it happens in my area. Cheers. p
Mar 16, 2011
Steven Longmire said...
Richard, thank you. Gavin, thank you for sharing your experience. In the last months, I have been evangelizing and connecting the dots of existing technologies from an interesting point of view with direct connection of emergency managers, law enforcement, and fire departments from around the world. The tools created, Sahana, Ushahidi, Swift, and crowdmap are all evolutionary. The difficulty I have had in and out of disaster response times is requests to help me go back to the core of how to install these technologies, configure them, and present examples to new users that are not technical. The only way that I can do this is to work directly with either the founders or the lead technical individuals, so that I can follow the process with them from beginning to end on the install and config, and then see some examples of usage of the configured products. This is for the creation of tutorial courses in non-technical terms so the reach of these incredible technologies that save lives can be more prevalent and utilized by professionals and experts around the world that are non-technical people as their expertise is in carrying actions out and communicating through these tools with the public they serve. These training courses will be provided free as well, I simply need help from the incredible talents that created these technologies, and I will take it from there and evangelize to the world at levels to where even an elected official can use these tools. I ask for your help please. I can be reached at 425-329-3408 Skype ID: anti-terror and e-mail: steven@emconnection.org thank you, I hope this does not fall upon deaf ears, you all know what that is like. ~Steve
Mar 16, 2011
Richard Clark said...
While writing software during a crisis is challenging, I believe the benefits outweigh the negatives:

1. You know exactly what you need. You do not need to try and guess what the next disaster will look like.
2. You have access to developers in volume and with talent you could not possibly afford and motivation you just can't buy.

#1 is of particular importance. Each location and type of disaster lends itself to different communications vectors with different sources and targets.

The challenge, of course, is to mitigate the downsides of software construction in this fashion, and this is where the preparation comes in. You do not need to build features, you need to build a platform upon which those features can be adapted and delivered at speed and in parallel.

The rules I outlined in my article are aimed at such a platform, focusing on making the most use of a sudden availability of development resource.