Can you work with the Chaos Monkey?

At the NetHope Conference, one of the better plenary sessions was by Joe Baguley of VMware. One thing he mentioned that resonated with me was something Netflix had developed called the Chaos Monkey.

The Chaos Monkey is a programme that Netflix run on their systems that randomly shuts down processes and services. The idea is that the world is a chaotic place, and at some point one of your processes or services will shut down. The Chaos Monkey simulates this, forcing everyone to design systems that can cope when this or that part fails.
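
To make the idea concrete, here is a minimal sketch in Python. It is not Netflix's implementation; the `Service` class, the `chaos_monkey` and `handle_request` functions and the replica names are all hypothetical, made up purely for illustration. It shows a "monkey" switching off one randomly chosen replica, and a request handler that survives because it falls back to whichever replica is still running.

```python
import random


class Service:
    """A hypothetical stand-in for one replica of a real service."""

    def __init__(self, name):
        self.name = name
        self.running = True


def chaos_monkey(services, rng):
    """Shut down one randomly chosen running service and return it.

    Returns None if nothing is left running.
    """
    running = [s for s in services if s.running]
    if not running:
        return None
    victim = rng.choice(running)
    victim.running = False
    return victim


def handle_request(services):
    """Route a request to the first replica that is still running."""
    for service in services:
        if service.running:
            return f"handled by {service.name}"
    raise RuntimeError("no healthy service available")


# A seeded generator keeps the demonstration repeatable.
rng = random.Random(42)
replicas = [Service("web-1"), Service("web-2"), Service("web-3")]

victim = chaos_monkey(replicas, rng)
print(f"Chaos monkey shut down {victim.name}")
print(handle_request(replicas))
```

The point of the sketch is the design choice in `handle_request`: because it never assumes any particular replica is alive, losing one of them is a non-event rather than an outage.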

This seems to be an important concept to grasp, particularly when building on platforms that market themselves as extremely resilient.

At Christian Aid, I don’t think we need to build our own chaos monkeys. In our international environment, we are frequently interrupted by chaotic events, from giant signs falling on VSAT dishes (Abuja, 2008) to seemingly random VPN outages caused by ISP config errors (Port-au-Prince, Dhaka, Delhi, La Paz, all too often recently). Whilst these are a proper pain in the derrière, we must learn from them, and use that learning to build not only more resilient infrastructure but also organisational processes that can handle everything from earthquakes to a SAN failure taking out our email system.

The Chaos Monkey teaches us to expect the unexpected.
