Model Performance
The first model that we tried achieved an accuracy of 81% in differentiating COVID-19 X-ray images from normal cases and two other respiratory infections. Specifically, the precision and recall for the COVID-19 category were 90.9% and 100%, respectively, meaning that the model performed very well at separating COVID-19 images from the others. However, the model did considerably worse when differentiating images of viral pneumonia from those of bacterial pneumonia.
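For reference, per-class precision and recall follow directly from the true-positive, false-positive, and false-negative counts for that class. The sketch below is a minimal illustration; the function name `precision_recall` is hypothetical, and the counts are assumed values chosen only because they reproduce the reported percentages, not the actual confusion-matrix entries of our test set.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) for one class from its confusion counts."""
    # Precision: fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true positives that the model actually found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts (assumption): 10 COVID-19 images correctly flagged,
# 1 non-COVID-19 image wrongly flagged, and no COVID-19 image missed.
precision, recall = precision_recall(tp=10, fp=1, fn=0)
print(f"precision={precision:.1%}, recall={recall:.1%}")
```

A recall of 100% means that no COVID-19 image in the test set was missed, while a precision below 100% means that some images from other classes were labeled as COVID-19.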
The first improvement worth trying is data augmentation. A larger dataset with a greater variety of representative objects generally trains a more accurate model. The exact number of images and objects required cannot be specified, but some guidelines recommend as many as 1,000 representative images for each class. As mentioned previously, the dataset is imbalanced, with only 60 COVID-19 cases, so one option is to add more COVID-19 images from other data sources. In practice, however, the dataset is often limited, and obtaining more images is either impossible or too difficult. Data augmentation can mitigate this challenge by artificially generating more images while preserving the same patterns. The augmentation filters available in IBM Visual Insights include blur, sharpen, vertical and horizontal flips, rotation by different angles, and noise. We applied vertical and horizontal flips and 90-degree rotation to augment our COVID-19 dataset, increasing the number of cases from 60 to 480. The updated model achieves an overall accuracy of 84%, with precision and recall for the COVID-19 category of 95.4% and 98.8%, respectively. The results suggest that DL with X-ray imaging may extract significant biomarkers related to the COVID-19 disease. Users can also choose other base models and tune the hyperparameters to obtain a better model.
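To illustrate how flips and 90-degree rotations multiply a dataset, the following minimal NumPy sketch generates the eight distinct variants of one image obtained by combining 90-degree rotations with a horizontal flip (the vertical flip is one of these eight). This is not the IBM Visual Insights implementation: the function name `flip_rotate_variants` is hypothetical, a small integer array stands in for an X-ray image, and the idea that eight variants per image accounts for the increase from 60 to 480 is an assumption about which filter combinations were applied.

```python
import numpy as np


def flip_rotate_variants(image: np.ndarray) -> list:
    """Return 8 variants: rotations by 0/90/180/270 degrees of the
    original image and of its horizontally flipped copy."""
    variants = []
    for base in (image, np.fliplr(image)):
        for k in range(4):
            # np.rot90 rotates counterclockwise by k * 90 degrees.
            variants.append(np.rot90(base, k))
    return variants


# Stand-in for a grayscale X-ray (assumption): a 4x4 array with no symmetry.
image = np.arange(16).reshape(4, 4)
variants = flip_rotate_variants(image)

# Count how many variants are pixel-wise distinct from one another.
distinct = {v.tobytes() for v in variants}
print(len(variants), len(distinct))
```

For an image with no mirror or rotational symmetry, all eight variants differ, so each original contributes eight training images. Whether such transformations are appropriate depends on the task, since an upside-down chest X-ray does not occur in clinical practice.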
Deployment
After training, the model can be deployed on an accelerator (such as a GPU or a Xilinx FPGA). An API endpoint is generated at deployment time. When using the API, the lower the specified confidence threshold, the more results are returned. For example, with a threshold of 0, all results are returned because no filtering is applied based on the model's confidence level. The results also include a heatmap visualization, as shown in Figure \ref{927540}. The heatmap quantifies the “importance” of individual pixels with respect to the classification decision. It shows that the most “important” pixels are near the lung regions, indicating that the model indeed extracts some significant biomarkers for COVID-19 detection.
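The effect of the confidence threshold can be sketched with a simple local filter. The example below does not call the deployed API; the result structure (a list of dictionaries with `label` and `confidence` keys), the function name `filter_by_confidence`, and the sample values are all hypothetical and serve only to show why a lower threshold returns more results.

```python
def filter_by_confidence(results, threshold=0.5):
    """Keep only predictions whose confidence is at least `threshold`."""
    return [r for r in results if r["confidence"] >= threshold]


# Hypothetical predictions for a single image (assumed values).
results = [
    {"label": "COVID-19", "confidence": 0.92},
    {"label": "viral pneumonia", "confidence": 0.05},
    {"label": "bacterial pneumonia", "confidence": 0.02},
    {"label": "normal", "confidence": 0.01},
]

# A threshold of 0 applies no filtering, so every result is kept.
all_results = filter_by_confidence(results, threshold=0.0)
# A higher threshold keeps only the confident predictions.
confident = filter_by_confidence(results, threshold=0.5)
print(len(all_results), len(confident))
```

In a screening setting, a low threshold favors sensitivity at the cost of more candidate labels to review, whereas a high threshold returns only the predictions the model is most certain about.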