Finding ways for developers to effectively raise process and quality issues outside of their immediate team is a huge challenge. Often an issue only comes to light after it’s been a problem for days or weeks. And the complications only multiply when you’re dealing with large, distributed teams.
I’m talking about problems that bedevil us all, like:
- A Sprint Backlog item not meeting the Definition of Done
- Finding an unclear requirement
- A necessary resource not being available
- Realizing you need to take on an unacceptable amount of technical debt to meet a deadline
- Discovering that a dependency has a critical CVE filed against it and there’s no fix yet
- Someone committing a performance-wrecking change
- Just not being able to get the damn thing to run…
The problem(s)
Failing to address issues as they occur only allows them to accumulate. In the world of software development, a few hiccups can turn into terminal brain cancer pretty fast.
I’d rather not recall how many times I’ve overheard this conversation well after the train has gone off the tracks:
- A: Why am I only hearing about this now?
- B: Well I said something at the time.
- A: Well you didn’t say anything to me.
- B: Well whenever I do say something, no one listens.
- A: No, you!
- B: No, YOU!
Sprints only exacerbate this problem by building in artificial latency: people who spot a problem tend to wait until the Sprint retrospective to bring it up.
Introducing Andon
Missing issues as they occur is a huge, costly problem in manufacturing. The Toyota Production System originated the idea of Andon, a visual signaling system that allows everyone to effortlessly highlight issues as they occur. Andon is, at its core, a way of weaving quality into the fabric of a product.
Team members are encouraged to pull a red or yellow cord whenever they see a problem, depending on whether the issue is blocking. Pulling the yellow cord brings a troubleshooter over to help with the issue; pulling the red cord shuts down the production line and resets the entire organization’s priority to clearing the blocker immediately. Yellow pulls keep quality problems from being propagated downstream, where they can accumulate into a red pull. Andon thus focuses teams on fire prevention, not firefighting. Quality and efficiency become a reflex.
Applying it to software development
So what does this have to do with development teams? Well, if you have a CI system that stops on build or test errors and flags the failing job, you’re already using Andon. Quality problems are visually indicated and actively prevented from moving downstream. I’m here to argue that we as developers should go “full Andon.”
Replace that dashboard full of metrics no one pays attention to with an Andon board. Give each team its own Andon light on the board. Give each member of the team access to the “cord.” Designate a lead for each team whose priority is to jump in and help resolve or escalate issues as soon as they are found. Then marvel as newly empowered team members start taking quality issues seriously.
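One way to picture such a board is as a small data model. This is a minimal Python sketch under my own assumptions: the names `Signal`, `Team`, `AndonBoard`, and `pull_cord` are hypothetical, only team members may pull the cord, and each team holds at most one open issue to keep things short. A real board would persist its state and feed a wall display or chat integration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Signal(Enum):
    GREEN = "green"    # no open issues
    YELLOW = "yellow"  # non-blocking issue: the team lead is called in
    RED = "red"        # blocking issue: the line stops until it is cleared


@dataclass
class Team:
    name: str
    lead: str           # hypothetical: the person who answers this team's cord pulls
    members: set[str]   # everyone allowed to pull the cord
    signal: Signal = Signal.GREEN
    open_issue: str | None = None  # simplification: one open issue per team

    def pull_cord(self, member: str, blocking: bool, issue: str) -> None:
        """Any team member may raise an issue; it lights the team's Andon light."""
        if member not in self.members:
            raise PermissionError(f"{member} is not on team {self.name}")
        # While the line is stopped (red), later pulls don't overwrite the blocker.
        if self.signal is not Signal.RED:
            self.signal = Signal.RED if blocking else Signal.YELLOW
            self.open_issue = issue

    def clear(self) -> None:
        """The lead (or the swarm) resolves the issue and the light goes green."""
        self.signal = Signal.GREEN
        self.open_issue = None


@dataclass
class AndonBoard:
    teams: dict[str, Team] = field(default_factory=dict)

    def add(self, team: Team) -> None:
        self.teams[team.name] = team

    def render(self) -> list[str]:
        """One line per team, suitable for a wall display or a chat command."""
        lines = []
        for team in self.teams.values():
            line = f"{team.name}: {team.signal.value.upper()}"
            if team.open_issue:
                line += f" ({team.open_issue}; lead: {team.lead})"
            lines.append(line)
        return lines
```

For example, after `board.add(Team("payments", lead="dana", members={"dana", "lee"}))` and a non-blocking pull by `lee` about an unclear requirement, `board.render()` would show the payments light as `YELLOW` along with the issue and the lead who should jump in.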
How do we ensure that people actually take yellow signals seriously? Won’t we just learn to ignore them and make a joke of the whole system? As developers we love automation, so let’s automate it. Yellow pulls should go red after a set time. Shut down the line by sending an automated Slack message asking everyone on the team to swarm the issue and get things moving again.
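The escalation rule could run as a small job every few minutes, from cron or a scheduled CI job, say. This Python sketch assumes a four-hour timeout, a hypothetical `CordPull` record, and a Slack incoming webhook; the webhook URL, the timeout, and the message wording are placeholders rather than part of any existing tool. Only the payload shape (a JSON object with a `text` field, POSTed to the webhook URL) follows Slack’s incoming-webhook format.

```python
from __future__ import annotations

import json
import time
import urllib.request
from collections.abc import Callable
from dataclasses import dataclass

# Hypothetical policy: a yellow pull left open this long goes red.
ESCALATE_AFTER_SECONDS = 4 * 60 * 60


@dataclass
class CordPull:
    team: str
    issue: str
    pulled_at: float        # Unix timestamp of the pull
    blocking: bool = False  # False = yellow, True = red


def slack_notifier(webhook_url: str) -> Callable[[str], None]:
    """Return a function that posts a text message to a Slack incoming webhook."""
    def post(text: str) -> None:
        body = json.dumps({"text": text}).encode("utf-8")
        req = urllib.request.Request(
            webhook_url, data=body, headers={"Content-Type": "application/json"}
        )  # supplying data makes this a POST
        with urllib.request.urlopen(req, timeout=10) as resp:
            resp.read()
    return post


def escalate_stale_pulls(
    pulls: list[CordPull],
    notify: Callable[[str], None],
    now: float | None = None,
    limit: float = ESCALATE_AFTER_SECONDS,
) -> list[CordPull]:
    """Turn yellow pulls older than `limit` red and ask the team to swarm them."""
    now = time.time() if now is None else now
    escalated = []
    for pull in pulls:
        if not pull.blocking and now - pull.pulled_at >= limit:
            pull.blocking = True  # the yellow signal goes red: the line stops
            hours_open = (now - pull.pulled_at) / 3600
            notify(
                f"RED on team {pull.team}: {pull.issue!r} has been open for "
                f"{hours_open:.1f}h. Everyone on the team, please swarm it."
            )
            escalated.append(pull)
    return escalated
```

Passing `notify` in as a function keeps the escalation logic easy to test: in production it would be `slack_notifier(url)`, while in a test it can simply be a list’s `append` method. Pulls that are already red are skipped, so rerunning the job doesn’t spam the channel.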
Getting better every day
In Andon, every cord pull is treated as an opportunity for continuous improvement. When an issue is found, the process itself should be patched, either through automation or documentation. This brand of truly continuous improvement will evolve your process much faster than addressing a couple of issues after each Sprint retrospective.
Take the challenge to prioritize quality and give Andon a try. Let me know how it goes. :)