Project Context

Volvo Group, one of the leading automotive companies, was invited to take part in this research. Volvo Group’s primary motivation was to verify whether the company can use DePress and its software defect prediction to increase the quality and cost-effectiveness of quality assurance (QA) in its software development projects.

Target Software

An important criterion for selecting a software development project for this research was that the project be mature enough to provide historical information that can be used as a source of training data. Volvo Group conducted a special survey to select the most suitable candidates for the initial research.
During our previous research, we recognized elements occurring in projects that hindered or prevented completion of a prediction \cite{Hryszko2015}. The most important of these elements are:
One of the projects finally selected as a business case subject was an initiative that develops and maintains an application called Texas. This project provides all the data necessary to achieve the highest possible recall of prediction, such as:
In the considered project, it is possible to clearly distinguish which code changes apply only to bug fixes. This is possible thanks to a practice adopted by its developers: every modification of the source code is committed to the code repository with an appropriate comment. When a modification results from fixing a particular defect, the unique identifier of that defect is included in the comment. This practice is also an example of the correct use of a version control system. Information on the number of subsequent versions of the software is available, and the naming of each version is standardized.
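The commit-comment convention described above can be illustrated with a minimal sketch. The identifier format `DEFECT-<number>`, the function names, and the sample log entries are assumptions made for illustration only; the source does not state the format actually used in the Texas project.

```python
import re

# Assumption: defect identifiers look like "DEFECT-1234". The real format used
# in the Texas project is not stated, so the pattern must be adapted.
DEFECT_ID_PATTERN = re.compile(r"\bDEFECT-\d+\b")


def extract_defect_ids(commit_message):
    """Return all defect identifiers mentioned in a commit comment."""
    return DEFECT_ID_PATTERN.findall(commit_message)


def is_bug_fix(commit_message):
    """A commit counts as a bug fix if its comment names at least one defect."""
    return len(extract_defect_ids(commit_message)) > 0


def split_commits(commits):
    """Split (revision, message) pairs into bug-fix and other revisions."""
    bug_fixes, others = [], []
    for revision, message in commits:
        if is_bug_fix(message):
            bug_fixes.append(revision)
        else:
            others.append(revision)
    return bug_fixes, others


if __name__ == "__main__":
    # Hypothetical log entries, not taken from the real repository.
    sample_log = [
        (101, "Add export of certificate documents"),
        (102, "DEFECT-57: fix null pointer when a document has no owner"),
        (103, "Refactor report module"),
    ]
    fixes, rest = split_commits(sample_log)
    print("bug-fix revisions:", fixes)
    print("other revisions:", rest)
```

A pattern match on the comment is only as reliable as the developers' discipline in including the identifier, which is why the consistent practice described above matters.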
From an organizational perspective, Texas is a document management tool that supports the Vehicle Type Approval (VTA) certification process for vehicle components and completed vehicles produced by Volvo Group. The VTA certificate confirms that the production samples of a design will meet specified performance standards. Key users of the Texas tool are Volvo Group’s Certification Managers and engineers, as well as their brand representatives and European market companies. The certification process is crucial for the company, as the availability of Volvo Group’s products directly depends on it, which makes the reliability, development and maintenance of the Texas software highly important. Defects in the application can delay the work of certification engineers. Such delays are unacceptable because they affect the scheduled dates for the approval of certificate documents and the publication process and therefore have a negative impact on the date of product availability.
From a technical perspective, Texas is a Java-based application written using the Java Enterprise Edition computing platform \cite{JEESite}. The main development environment is the Eclipse Integrated Development Environment \cite{EclipseSite}, and the main build automation tool is Apache Maven \cite{MavenSite}. To assemble a complete application, 13 different Maven projects needed to be built by default. For the purpose of this research, 21 consecutive software releases (versions) were available, ranging from Texas version 4.0.0 to version 6.0.0. The source code is stored in a code repository managed by the Apache Subversion (SVN) version control system \cite{SVNSite}.
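Because release names are standardized, they can be ordered numerically rather than as plain strings. The following minimal sketch assumes that every version name has the dotted numeric form `major.minor.patch`. Only 4.0.0 and 6.0.0 are taken from the text; the intermediate names and the function names are hypothetical.

```python
def parse_version(name):
    """Turn a dotted version name such as '4.10.0' into a tuple of integers."""
    return tuple(int(part) for part in name.split("."))


def order_releases(names):
    """Sort version names numerically, so that 4.10.0 comes after 4.2.0."""
    return sorted(names, key=parse_version)


def consecutive_pairs(names):
    """Pair each release with its direct successor in release order."""
    ordered = order_releases(names)
    return list(zip(ordered, ordered[1:]))


if __name__ == "__main__":
    # Hypothetical release names; only 4.0.0 and 6.0.0 appear in the text.
    releases = ["6.0.0", "4.10.0", "4.2.0", "4.0.0"]
    print(order_releases(releases))
    print(consecutive_pairs(releases))
```

Plain string sorting would place 4.10.0 before 4.2.0, which is why the names are converted to integer tuples first.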
Within the considered project, we can observe three stages of the software life-cycle (project phases):
  1. Development
  2. Testing
  3. Post-release

Related Work

The first publication on an industrial application of defect prediction was published in 1997 by Khoshgoftaar et al. \cite{Khosh1997}. It was a case study of quality modeling for a very large telecommunications system. The authors used a neural network to model the future fault proneness of a real-world system managed by a telecommunications company. However, the final results were not used to support the company’s quality assurance procedures. Two later publications by Khoshgoftaar and Seliya, from 2004 \cite{Khosh2004} and 2005 \cite{Khosh2005}, continued the earlier concept and focused on commercial data analysis, but their results were not applied in a real-world environment. A similar approach is common to most research projects that use industrial data; examples can be found in publications by Ostrand \cite{Ostrand1}, \cite{Ostrand3}, Tosun \cite{Tosun2} and Turhan \cite{Turhan1}, \cite{Turhan2}. Examples of industrial applications of information gathered through defect prediction can be found in \cite{Wong}, \cite{Succi} and \cite{Klas}. Complete cases describing the introduction of defect prediction in industrial environments were presented by Ostrand \cite{Ostrand2}, Li et al. \cite{Li} and Tosun \cite{Tosun1}.