Assessment of Defect Prediction Cost Effectiveness
of an Industrial Software Development Project
Supported by DePress Framework
The potential benefits of using the DePress (Defect Prediction in Software Systems) Extensible Framework in commercial software development projects have not yet been investigated, and documented cases of machine learning being used for defect prediction in industrial applications are rare. For these reasons, representatives of Wroclaw University of Technology and Volvo IT started a long-term cooperation in this area. Because a positive result of the research described in this paper would trigger more interest from business stakeholders, we decided to verify the cost effectiveness of using the DePress Framework for defect prediction and to investigate whether defect prediction would benefit software development projects by generating profits. To meet this goal, we proposed a defect prediction-based quality assurance (QA) effort allocation strategy founded on the Pareto principle and Boehm’s law. Then, using the DePress Extensible Framework and real-life data collected from an actual industrial software development project, we conducted defect prediction and simulated the potential quality assurance costs based on the best prediction result and the proposed QA effort allocation strategy. The results of the simulation were optimistic and allowed the continued use of DePress-based defect prediction in actual industrial projects run by Volvo IT.
The first attempts to use machine learning for software development quality assurance were made in the early 1990s by Munson et al. (Munson 1992). Since then, the approach has gradually improved. Why, then, has it not gained wider popularity in commercial projects so far? One reason is the wide variety of data needed to perform the prediction and the many different data sources encountered in commercial software development projects. Until recently, conducting a prediction for a chosen project required the time-consuming preparation of a special, additional program (or programs) that acquired the desired data and then put it through a suitable preprocessing procedure. One obstacle was the need to gather data from different sources (various tools, processes, methodologies, etc.), with the data itself differing in size and measurement. In addition, different machine learning mechanisms vary in effectiveness depending on the choice of data sources and on the data itself (which is selected experimentally). Software defect prediction was considered too complex, too costly, and too time-consuming a process, and there were no known solutions for wrapping it into one universal defect prediction application that could be used for different projects.