Success through Failure in Software Development: Autonomation using Aspect-Oriented Poka-Yokes
Unfortunately, software products are shipped with very high error rates in comparison with other engineering artifacts.
The inevitable human mistakes, coupled with the steadily growing complexity of software, may explain the prevalence of errors in software, but in no way the embarrassing number of mistakes that are allowed to reach the customer, turning them into defects.
Few products of any type other than software are shipped with such high levels of errors.
The same errors appear again and again in different projects.
Failure to learn from mistakes has consistently been a major obstacle to improving IT project management. As Boddie wrote in 1987:
We talk about software engineering but reject one of the most basic engineering practices: identifying and learning from our mistakes. Errors made while building one system appear in the next one. What we need to remember is the attention given to failures in the more established branches of engineering. In software projects, as in bridge building, a successful effort can do little more than affirm that the tools and methods used were appropriate for the task. By the same token, failed projects need more than explanation or rationalization if they are to teach any lessons.
The main thesis of the book (To Engineer is Human: The Role of Failure in Successful Design) is that engineering failures are one of the best ways to learn how to improve, and that the mistakes of our predecessors help us improve future designs.
I believe that the concept of failure is central to understanding engineering, for engineering design has as its first and foremost objective the obviation of failure.
The lessons learned from these disasters can do more to advance engineering knowledge than all the successful machines and structures in the world.
Petroski uses the history of bridge failures to illustrate how we have learned, or neglected to learn, from failures of engineering practice. The message is universal and reaches well beyond engineering.
Given the faults of human nature, coupled with the complexity of the design of everything, from lectures to bridges, it behooves us to beware of the lure of success and listen to the lessons of failure. — Success Through Failure: The Paradox of Design.
Engineers learn from past mistakes and failures. It might even be postulated that engineers and the engineering profession have a duty or responsibility to do so, although this duty is not always spelled out in engineering codes of ethics.
The engineering community often considers the study of such a major collapse as an opportunity to correct and improve future designs.
The same holds for accident investigations in the aeronautics industry, where open public investigations and voluntary sharing of information are part of the culture.
Construction and aircraft industries publish their mistakes so lessons can be learned.
There are innumerable sources reporting engineering disasters such as the Tacoma Narrows Bridge collapse. However, with few exceptions (e.g., Writing Solid Code), there are no equivalent references in the software industry. Not even the most popular and acclaimed books on the conduct of professional programmers (The Clean Coder: A Code of Conduct for Professional Programmers; The Pragmatic Programmer: From Journeyman to Master) say a word about the practice of learning from failure.
How do you or your team manage failures?
The focus is on fixing bugs as soon as possible, not on improving the software development process that allowed the bug to progress through to production.
In the best cases, teams write emails to the development team or compile defect catalogues in a wiki.
Can software defects be prevented by simply logging them into some “defect tracking tool”, documenting them and providing fixes for them? How can we share lessons learned to avoid future defects?
Currently, software failures are an embarrassing subject, but we should learn from other engineering disciplines and use failures as an opportunity to improve our future designs.
Idea: DRYF = Don’t Repeat Your Failures
(Jidoka in Software Development)
Lean management is based on two concepts: the elimination of “Muda”, the waste, from the production process, and “Jidoka”, the introduction of quality inside the production process and product.
In software production, the elimination of Muda received significant attention, while Jidoka has not yet been fully exploited.
For instance, Agile principles refer to concepts related to waste elimination, such as refactoring, YAGNI, “the simplest thing that could possibly work”, and DRY.
TPS history and lean thinking:
Shortly after WWII, Taiichi Ohno and Shigeo Shingo revolutionized the Toyota Production System with the idea of lean production (Ohno 1988). Because of their visible and tangible success, their ideas were successfully exported from Japan to the Western world.
The Toyota Production System advises eliminating from the production process all activities that do not produce value for the customer (i.e., Muda).
The philosophy of focusing on customer satisfaction as a way to increase flexibility came back in the ’90s with the book “Lean Thinking” by Womack and Jones (1996). Lean Thinking brought the lean idea into new industries such as the pharmaceutical industry (Petrillo 2007) and software development (Poppendieck and Poppendieck 2006). Agile methods are a group of software development methodologies that put the ideas of lean thinking into the practice of software development (Beck et al. 2001).
Ohno identifies two pillars of lean production: Just-in-time production and Jidoka. The elimination of Muda is a prerequisite for Just-in-time production, where the resources needed to complete a certain step are made available at the latest possible moment. Jidoka is often translated as “autonomation” or “automation with a human mind” and is usually illustrated with the example of a machine that can detect a problem with the produced output and interrupt production automatically rather than continue to run and produce bad output (Ohno 1988; Monden 1993). Some authors translate Jidoka as “quality-at-the-source” (Standard and Davis 1999), meaning that quality is inherent in the production system and is not checked after the process. In essence, Jidoka is composed of two parts: a mechanism to detect problems, i.e., abnormalities or defects, and a mechanism to interrupt the production line or machine when a problem occurs (Monden 1993).
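The two-part structure of Jidoka, a detector plus a stop mechanism, can be sketched in a few lines of Java. All names here are hypothetical, invented for illustration; the "abnormality" is simply an empty work item:

```java
// Minimal sketch of the two parts of Jidoka:
// (1) a mechanism that detects an abnormality in the produced output, and
// (2) a mechanism that stops the line instead of passing bad output ahead.
// All names are illustrative, not taken from any real tool.
import java.util.List;

class JidokaSketch {

    // Part 1: the detector. Here the "abnormality" is a null or empty item.
    static boolean isDefective(String item) {
        return item == null || item.isEmpty();
    }

    // Part 2: the stop mechanism. Processing halts at the first defect
    // rather than letting defective parts continue down the line.
    static int processUntilDefect(List<String> line) {
        int processed = 0;
        for (String item : line) {
            if (isDefective(item)) {
                break; // stop the line at the abnormality
            }
            processed++;
        }
        return processed;
    }
}
```

The point of the sketch is the `break`: the line is interrupted at the abnormality so that attention shifts to the problem, rather than continuing to run and accumulating bad output.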
We think that the elimination of Muda has received significant attention in software production, for instance in the analysis of the value stream, the focus on activities that provide value, and the deferral of irreversible decisions to the latest moment at which they are needed (Poppendieck and Poppendieck 2006; Beck 1999). Jidoka has not received equal attention, in our view.
To our knowledge, all proposals to insert Jidoka in software production relate to automated testing and continuous integration. We agree that automated testing and continuous integration (and the tools supporting them, such as JUnit and CruiseControl) implement the idea of Jidoka and are extremely important, but this is only one aspect of quality in software development. Other quality attributes of the code produced and of the process can and should make use of Jidoka during software production. In this article, we present the idea underlying a tool to promote Jidoka in software production.
The idea to continuously monitor software artifacts and to alert the developer of possible mistakes or problems is not new.
Tools like FindBugs or PMD scan Java source code and look for potential problems, which can be defined by rules written in Java or as XPath expressions.
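As an illustration, a PMD check can be written as an XPath query over the Java AST. The fragment below is a sketch, not taken from any real ruleset: the rule name and message are invented, and the `class` and schema follow PMD 6 conventions, which may differ in other PMD versions.

```xml
<?xml version="1.0"?>
<!-- Hypothetical ruleset sketch: a lesson learned from a past defect,
     encoded so the mistake is flagged automatically next time. -->
<ruleset name="dont-repeat-your-failures"
         xmlns="http://pmd.sourceforge.net/ruleset/2.0.0">
  <description>Checks distilled from defects that reached production</description>
  <rule name="AvoidSystemOutPrintln"
        language="java"
        message="Use a logger instead of System.out"
        class="net.sourceforge.pmd.lang.rule.XPathRule">
    <properties>
      <property name="xpath">
        <value><![CDATA[
//Name[starts-with(@Image, 'System.out.print')]
        ]]></value>
      </property>
    </properties>
  </rule>
</ruleset>
```

Each rule of this kind turns a previously observed mistake into an automatic detector, which is exactly the first half of the Jidoka mechanism.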
Jidoka means not allowing defective parts to go ahead in the development process.
(Jidoka: automation with a human touch)
Jidoka is not just detecting the abnormality and stopping the process. It is correcting the abnormal condition and investigating the root cause in order to eliminate it for good.
(Autonomation - Wikipedia)
Autonomation describes a feature of machine design to effect the principle of jidoka used in the Toyota Production System (TPS) and lean manufacturing.
It may be described as “automation with a human touch”.
This type of automation implements some supervisory functions rather than production functions. At Toyota this usually means that if an abnormal situation arises the machine stops and the worker will stop the production line.
Autonomation prevents the production of defective products and focuses attention on understanding the problem and ensuring that it never occurs.
Rather than waiting until the end of a production line to inspect a finished product, autonomation may be employed at early steps in the process to reduce the amount of work that is added to a defective product.
To complete Jidoka, not only is the defect corrected in the product where it was discovered, but the process is evaluated and changed to remove the possibility of making the same mistake again. One solution can be to insert a “mistake-proofing” device somewhere in the production line. Such a device is known as a poka-yoke.
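In software, a poka-yoke can take the form of a design that rejects the mistake at the point where it is made. The sketch below (all names invented for illustration) uses a small value type whose constructor refuses an invalid quantity, so the mistake is caught at the source instead of travelling downstream and becoming a defect:

```java
// Sketch of a software poka-yoke (all names are illustrative).
// The constructor acts as the mistake-proofing device: an invalid
// quantity is detected and rejected immediately, so it can never
// propagate further through the process.
final class Quantity {
    private final int value;

    Quantity(int value) {
        if (value < 0) {
            throw new IllegalArgumentException(
                "quantity must be >= 0: " + value);
        }
        this.value = value;
    }

    int value() {
        return value;
    }
}
```

Every method that accepts a `Quantity` can now rely on the invariant instead of re-checking it, which is the "quality-at-the-source" idea applied to code.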
(Poka-Yoke — Wikipedia)
Poka-Yoke is a Japanese term that means “mistake-proofing”.
A poka-yoke is any mechanism in a lean manufacturing process that helps an equipment operator avoid (yokeru) mistakes (poka).
The concept was formalised by Shigeo Shingo as part of the TPS in the 1960s.
The term can refer to any constraint designed into a process to prevent incorrect operation by the user.
Shingo distinguished between the concepts of inevitable human mistakes and defects in production. Defects occur when mistakes are allowed to reach the customer. The aim of poka-yoke is to design the process so that mistakes can be detected and corrected immediately, eliminating defects at the source.
(Lean Software Development — Poppendieck)
The amount of waste caused by a defect is the product of the defect impact and the time it goes undetected. A critical defect that is detected in three minutes is not a big source of waste. A minor defect that is not discovered for weeks is a much bigger waste.
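The rule of thumb above amounts to waste ≈ defect impact × time undetected. A toy calculation in Java makes the asymmetry concrete; the numbers and the "waste per minute" unit are invented for the example:

```java
// Toy illustration of "waste = defect impact x time undetected".
// Numbers and units (waste per minute) are invented for the example.
class DefectWaste {

    static double waste(double impactPerMinute, double minutesUndetected) {
        return impactPerMinute * minutesUndetected;
    }

    public static void main(String[] args) {
        // A critical defect (high impact) detected after three minutes...
        double critical = waste(100.0, 3.0);
        // ...versus a minor defect undetected for two weeks.
        double minor = waste(1.0, 14 * 24 * 60.0);
        System.out.println("critical, caught fast: " + critical); // 300.0
        System.out.println("minor, caught late:    " + minor);    // 20160.0
    }
}
```

Even with an impact a hundred times smaller, the defect that lingers for two weeks produces far more waste, which is the argument for detecting problems as early in the process as possible.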