Threats to Validity
In this paper, we do not distinguish defects according to their severity (minor, major, etc.) and instead assume an average, fixed fixing time for each defect. Omitting the severity measure is a frequent practice in defect prediction studies \cite{Menzies2010}; however, severity can matter when the simulated QA costs are compared with real-life values. In our simulation we assumed equal severity for every defect, which is reflected in an equal, average cost (Table \ref{tab:costs}). However, when the proposed effort allocation strategy is applied in a real-life environment, defects left undetected until the later testing phases or after deployment may have higher severities than defects resolved during the coding phase; such a situation would make the overall quality assurance costs obtained when the DePress tool is used for defect prediction higher than the simulated values.
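To make this threat concrete, the following sketch uses illustrative symbols ($n$, $n_s$, $\bar{c}$, $c_s$) that are not part of our cost model. Under the uniform-severity assumption, the simulated cost of a phase is
\begin{equation*}
  C_{\mathrm{uniform}} = n \cdot \bar{c},
\end{equation*}
where $n$ is the number of defects fixed in that phase and $\bar{c}$ is the average cost of fixing a single defect, whereas a severity-aware cost would be
\begin{equation*}
  C_{\mathrm{severity}} = \sum_{s} n_s \cdot c_s,
\end{equation*}
with $n_s$ defects of severity $s$ and per-defect cost $c_s$. If late-phase defects tend to have $c_s > \bar{c}$, then $C_{\mathrm{severity}} > C_{\mathrm{uniform}}$ and the real quality assurance cost exceeds the simulated one.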
In our simulation, the quality assurance cost is calculated as the number of defects expected to be fixed by QA efforts in each phase, multiplied by the average cost of fixing a single defect derived from actual project data. The simulated costs are then compared with the actual costs. Rahman et al. \cite{Rahman2014} argue that a more effective way of comparing quality assurance efforts when defect prediction models are involved is to compare AUCEC (Area Under the Cost-Effectiveness Curve) values \cite{Arisholm2010}. We nevertheless followed the cost-based approach because a direct comparison of actual and simulated cost values is more readable for business stakeholders.
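As a minimal sketch of this calculation, using illustrative notation that does not appear in Table \ref{tab:costs}, the simulated quality assurance cost can be written as
\begin{equation*}
  C_{\mathrm{sim}} = \sum_{p} n_p \cdot \bar{c}_p,
\end{equation*}
where $p$ ranges over the project phases, $n_p$ is the number of defects expected to be fixed by QA efforts in phase $p$, and $\bar{c}_p$ is the average cost of fixing a single defect in that phase, derived from actual project data; the comparison then contrasts $C_{\mathrm{sim}}$ directly with the actual cost $C_{\mathrm{act}}$ rather than comparing AUCEC values.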
In this paper we do not consider any costs not traced by the JIRA system. It was not technically possible to obtain cost information on quality assurance actions taken outside the project team during the post-release phase, for example by the service desk team responsible for contacting an end user who found a new defect after deployment.
For the purpose of the simulation, we used the same initial conditions as in the actual project: the same number of detectable defects and a similar ratio between defects fixed in testing and those fixed in the post-release stages (\ref{hrest}). In actual defect prediction using the proposed effort allocation strategy, we should expect these conditions to differ, for example when the strategy is applied to a different software release and/or development project.