The best source of information about the number of defects registered in each release is the tool used for defect tracking. For the Texas project, this was JIRA, developed by Atlassian \cite{JiraSite}. Like Eclipse Metrics, JIRA can export defect data to an XML file, which can then be parsed by a DePress node called Jira Offline.
Once imported into DePress, the size and defect data can be aggregated in a few simple steps using the capabilities of the KNIME framework. The results for each historical Texas release are presented in Table \ref{tab:bugs_per_release}.
After analyzing the data presented in Table \ref{tab:bugs_per_release}, we decided to use the dataset based on release 4.0.0 for both training and evaluation: it has the highest number of detected defects relative to the number of code modules available for analysis, which makes it the most suitable for these purposes. Releases 5.0.0 and 6.0.0 can also be used in further research. The other releases have significantly fewer registered defects for a similar number of modules available for analysis (defect per module ratio \(<0.1\)), which led us to assume that a strong class imbalance would be observed. For this reason, we decided not to investigate these releases any further.
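
The exclusion criterion can be stated more explicitly. As a sketch, let \(D_r\) denote the number of defects registered for release \(r\) and \(M_r\) the number of code modules available for analysis in that release; this notation is introduced here only for illustration and is not used elsewhere. The defect per module ratio is then
\[
  \rho_r = \frac{D_r}{M_r},
\]
and a release was set aside when \(\rho_r < 0.1\). Under the simplifying assumption that each registered defect is attributed to at most one module, at most \(D_r\) modules can be defective, so the share of defective modules is bounded by \(\rho_r\). For a purely hypothetical release with \(M_r = 1000\) and \(D_r = 50\) (numbers chosen for illustration, not taken from Table \ref{tab:bugs_per_release}), this gives \(\rho_r = 50/1000 = 0.05\), i.e. no more than \(5\%\) of the modules would fall into the defective class, which illustrates why a strong class imbalance is to be expected below the \(0.1\) threshold.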