When a situation goes wrong, look at your process and ask why.
by Terry Smith
Lean Enterprise Institute
Lean isn’t only a mindset. It gives you and your organization plenty of tools and techniques you can apply immediately to improve your business processes. One of my favorites is the 5 Whys. I think it’s the best demonstration that people don’t fail; processes do.
In short, the 5 Whys is a quick root-cause analysis. Sakichi Toyoda developed the approach, and Toyota Motor Corporation used it to analyze defects in its processes.
Here’s one way to use it. As soon as an issue happens, you call a 5 Whys meeting with everyone who was involved in the situation, and the host asks five times, “Why did it happen?” Each “why” builds on the previous answer. The host writes down all the answers while the whole team brainstorms the issue. This lets the team evaluate the problem from all angles and reconstruct the full chain of events.
Case #1: Freemake
Freemake is a multimedia startup. Their apps stream YouTube videos. But in February 2015, YouTube went down and Freemake displayed an error instead of streaming. So, they went through a 5 Whys session:
1. Q: Why did Freemake software show an error?
A: Because YouTube content wasn’t available.
2. Q: Why wasn’t YouTube streaming available?
A: Because YouTube went down and we had nothing to display.
3. Q: Why did Freemake show nothing?
A: Because we were not aware of the YouTube disaster and we showed a default message.
4. Q: Why didn’t we know of YouTube's outage?
A: Because we don’t have an alarm notification system.
5. Q: Why didn’t we have an alarm notification about streaming?
A: Because we don’t have quality assurance testing for emergency situations.
The team’s answers are neutral and clear. They don’t show any wishful thinking like “Because we had to think twice.” There is no hatred or blame. Take a look at the fifth answer and you clearly see a process failure. The Freemake quality assurance process failed and caused a poor experience for their customers. One solution? Develop a real-time tracking system and more user-friendly alerts about possible streaming problems. Victoria Kushner, Senior Marketing Manager from Freemake, says: “It was easy to blame our developer’s team, but our company reputation suffered and we realized the importance of a systematic fix.”
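For teams that want to keep a written record of such a session, the chain of questions and answers can be captured in a few lines of code. The following is only a minimal sketch in Python, not anything Freemake is known to use: the names `WhyStep`, `FiveWhys`, `ask`, `root_cause`, and `report` are made up for illustration, and treating the last recorded answer as the root cause is an assumption that holds only if each "why" builds on the previous answer. The questions and answers are the ones from the Freemake session above.

```python
from dataclasses import dataclass, field


@dataclass
class WhyStep:
    """One question-and-answer pair from the meeting (hypothetical type)."""
    question: str
    answer: str


@dataclass
class FiveWhys:
    """A record of one 5 Whys session (hypothetical type)."""
    problem: str
    steps: list[WhyStep] = field(default_factory=list)

    def ask(self, question: str, answer: str) -> None:
        # The host writes down each answer as the team gives it.
        self.steps.append(WhyStep(question, answer))

    def root_cause(self) -> str:
        # Assumption: the last answer in the chain is the root cause.
        if not self.steps:
            raise ValueError("no answers recorded yet")
        return self.steps[-1].answer

    def report(self) -> str:
        # Build a plain-text summary in the same Q/A layout as the article.
        lines = [f"Problem: {self.problem}"]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"{number}. Q: {step.question}")
            lines.append(f"   A: {step.answer}")
        lines.append(f"Root cause: {self.root_cause()}")
        return "\n".join(lines)


session = FiveWhys(problem="Freemake displayed an error instead of streaming")
session.ask("Why did Freemake software show an error?",
            "Because YouTube content wasn't available.")
session.ask("Why wasn't YouTube streaming available?",
            "Because YouTube went down and we had nothing to display.")
session.ask("Why did Freemake show nothing?",
            "Because we were not aware of the YouTube disaster "
            "and we showed a default message.")
session.ask("Why didn't we know of YouTube's outage?",
            "Because we don't have an alarm notification system.")
session.ask("Why didn't we have an alarm notification about streaming?",
            "Because we don't have quality assurance testing "
            "for emergency situations.")

print(session.report())
```

Keeping the record as an ordered chain, rather than a loose list of notes, makes it easy to see whether each answer really follows from the one before it and whether the final answer names a process rather than a person.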
Case #2: Buffer
Now let’s look at Buffer, a social media management service company. In 2014, the service had a brief system-wide outage. Here’s how they ran the 5 Whys, in their own words:
1. Q: Why did we go down?
A: Because the database became locked.
2. Q: Why did it become locked?
A: Because there were too many database writes.
3. Q: Why were we doing too many database writes?
A: Because this wasn't foreseen and it wasn't tested.
4. Q: Why wasn't the change tested?
A: Because we don't have a development process set up for when we should test changes.
5. Q: Why don't we have a development process for when to do a load test?
A: We've never done too much load testing and are hitting new levels of scale.
Next, team members worked out countermeasures and took responsibility for implementing them and the resulting new tasks. Notice that each answer was related to a process issue. There were no answers like: “Developer X didn’t do Task Y.” The team didn’t let the discussion branch off and instead kept the focus on figuring out the root cause. And the last answer pointed to a high-level issue, i.e., the absence of critical testing of large-scale systems, which caused a chain of failures that affected the company’s clients and business.
What’s common across both cases? In both cases, team members worked out effective solutions to fix broken or nonexistent processes. Simple and transparent question-and-answer meetings replaced complicated "investigations." Applying this technique, of course, has its limits. You have to watch out for all-too-common answers like "We didn’t have enough time/resources" or "We should have thought twice." And sometimes team members will jump from one level of answers to another. But the practice of going through the 5 Whys will still get you closer to the real root of the problem. It means that you start improving your business faster and avoid witch-hunting.
Terry Smith is a freelance web developer. He dreams of improving freelance practices with lean management tools. Terry is into technology and education. He contributes to the StackOverflow community and shares his insights on remote work. Follow Terry Smith on Twitter: https://twitter.com/TerrySmith226.