Assessment Method
To investigate the cost-effectiveness of the defect prediction technique applied to an industrial software development project using the DePress framework, we agreed to follow the plan below:
Creation of a QA effort allocation strategy, based on defect prediction provided by DePress;
Analysis of actual, real-life costs of quality assurance for the selected release (4.0.0);
Creation of prediction models for the chosen release;
Selection of the prediction model with the highest recall;
Usage of the effort allocation strategy, based on the prediction with the highest recall achieved, to simulate a prediction-based quality assurance usage scenario;
Comparison of actual (without DePress tool used) and simulated (with DePress tool) QA costs.
Quality Assurance Effort Allocation Strategy
\label{strategy_par}
The Pareto principle can be observed in software quality (20% of program modules are responsible for 80% of defects)
(Endres 2003, Pressman 2010, Iqbal 2009). Hence, any effective way to recognize this 20% of “high risk” code helps to allocate quality assurance (QA) effort so that the maximum number of software defects is eliminated using the available resources within a limited time period. Additionally, in 1976 Boehm observed that defect fixing costs grow in every subsequent phase of a software project
(Boehm 1976). That observation, which is widely called Boehm’s Law
(Endres 2003), has another important consequence for QA effort allocation: the earlier the QA actions take place, the better for the software development project’s budget.
Considering the above facts, we proposed a strategy that uses the prediction model to identify as many as possible of the 20% of software modules responsible for 80% of the bugs, thereby shifting as much QA effort as possible into the coding stage of software creation, while the defect fixing cost is still relatively low. Such an approach should ideally decrease the total cost of bug fixing and create savings in the total project budget
(Slaughter 1998).
If we denote by \(M_{total}\) the total number of testable program modules and by \(H_{total}\) the total number of discoverable defects, we can say that \(0.8H_{total}\) defects come from \(0.2M_{total}\) modules.
We can expect that:
\begin{equation}
\label{expect_Rec}0<Rec<1\\
\end{equation}
where \(Rec\) is the highest recall achieved by the defect prediction performed with the DePress framework. The number of modules correctly indicated (predicted) as responsible for 80% of discoverable defects, \(M_{i}\), should then be:
\begin{equation}
\label{modules_predicted}M_{i}=0.2\times Rec\times M_{total}\\
\end{equation}
Then, based on the Pareto principle, if the machine learning mechanism is able to point out the “high risk” 20% of software modules with the measured recall (\(Rec\)), the number of defects that can be avoided by allocating the best quality assurance efforts to the first (development) phase of the project should be:
\begin{equation}
\label{strategy}H_{1}^{\prime}{}=0.8\times Rec\times H_{total}\\
\end{equation}
The number of defects to be fixed in the second and third phases of the project is then:
\begin{equation}
\label{hrest}H_{2+3}^{\prime}{}=H_{total}-H_{1}^{\prime}{}\\
\end{equation}
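As an illustration of the equations above, consider purely hypothetical values (assumptions for this sketch, not taken from the studied project): \(M_{total}=1000\) modules, \(H_{total}=500\) defects and \(Rec=0.75\). Under these assumptions:
\[
M_{i}=0.2\times 0.75\times 1000=150,\qquad H_{1}^{\prime}=0.8\times 0.75\times 500=300,\qquad H_{2+3}^{\prime}=500-300=200
\]
In this sketch, 150 modules would be correctly flagged as high risk, 300 defects could be addressed in the first phase, and 200 defects would remain for the second and third phases.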
Return on Investment
To investigate whether the usage of the DePress framework pays off, we use the Return on Investment (ROI) factor:
\begin{equation}
\label{roi}ROI=\frac{Benefit-Investment}{Investment}\\
\end{equation}
If the investment does not pay off, the ROI factor is negative; otherwise it is positive. In our evaluation of defect prediction cost-effectiveness we focus on the potential benefits that the method generates:
\begin{equation}
\label{benefit}Benefit=C_{total}-C_{total}^{\prime}{}\\
\end{equation}
where \(C_{total}^{\prime}{}\) is the simulated total quality assurance cost in the project with defect prediction applied, and \(C_{total}\) is the actual QA cost in the project, without defect prediction.
\(Investment\) is defined as the total cost of introducing defect prediction. Moreover, \(NetReturn\) is calculated as \(Benefit\) reduced by \(Investment\):
\begin{equation}
\label{netreturn}NetReturn=C_{total}-(C_{total}^{\prime}{}+Investment)=Benefit-Investment\\
\end{equation}
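To make these definitions concrete, assume purely hypothetical figures in arbitrary monetary units (not measured in the studied project): \(C_{total}=100000\), \(C_{total}^{\prime}=70000\) and \(Investment=10000\). Then:
\[
Benefit=100000-70000=30000,\qquad NetReturn=30000-10000=20000,\qquad ROI=\frac{30000-10000}{10000}=2
\]
Under these assumed values the ROI is positive, so the investment would pay off. Had the assumed \(Benefit\) been smaller than \(Investment\), both \(NetReturn\) and \(ROI\) would have been negative.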
Benefit Cost Ratio
To analyze potential benefits from the usage of defect prediction, we will use the Benefit Cost Ratio (BCR) factor:
\begin{equation}
\label{bcr}BCR=\frac{Benefit}{Investment}\\
\end{equation}
BCR values larger than 1 mean a monetary gain from the usage of DePress-based defect prediction; values smaller than 1 mean a loss.
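The two factors are directly related, which follows from their definitions alone:
\[
BCR=\frac{Benefit}{Investment}=\frac{Benefit-Investment}{Investment}+1=ROI+1
\]
Hence \(BCR>1\) exactly when \(ROI>0\), and both criteria lead to the same conclusion. As a hypothetical illustration (assumed values, not project data), \(Benefit=30000\) and \(Investment=10000\) give \(BCR=3\) and \(ROI=2\).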