Figure 7: Highest weighted patch within the highest weighted slice for two test set examples, demonstrating that the model can detect abnormalities throughout the brain.
Discussion
The models presented in this work exploit the hierarchical nature of neuroimaging data to accurately classify a range of glioma-related abnormalities. In doing so, they outperform simpler models that treat separate sequences, slices, or intra-slice regions equally. Crucially, this hierarchical approach provides an intuitive form of model interpretability. By visualising patch-HAN’s attention weights, it is possible to localise abnormalities both across and within individual slices, while sequence-HAN’s attention weights provide inter-slice localisation together with an importance score for each imaging sequence. This is achieved without the need for slice- or pixel-level annotation during training, requiring only a series-level label which is applied to all slices. As such, our approach lends itself to training on large-scale retrospective hospital image collections, and is well suited for use as part of a semi-autonomous triage system.
In the future, we wish to extend the model to incorporate non-imaging data such as patient clinical history. This can be extracted from the free-text report that accompanies images on PACS, embedded into a machine-readable representation (e.g., using the ALARM, BioBERT (Lee et al., 2019), or ClinicalBERT (Alsentzer et al., 2019) language models), and introduced as an additional hierarchy. Furthermore, we plan to test the model on a range of abnormalities where sequences other than T2-weighted images are important, for example acute infarct cases, for which DWI and apparent diffusion coefficient (ADC) maps appear particularly discriminatory. We also plan to combine the two models presented here so that sequence, slice, and region importance scores can be determined simultaneously. However, for this work we were limited to a single 11 GB GPU, which is not sufficient for training this larger model.
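As an illustration of how hierarchical attention weights can yield the coarse localisation discussed in this section, the following is a minimal sketch and not the implementation used in this work. It assumes plain dot-product attention against a context vector at each level (the actual scoring function may differ), and all names (`attention_pool`, `localise`, `patch_context`, `slice_context`) and the toy feature values are hypothetical. Patches are pooled into slice vectors, slices are pooled into a series vector, and the weights from both levels are read back to find the highest weighted patch within the highest weighted slice.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Dot-product attention pooling (an assumed scoring function).

    Returns the attention weights and the weighted sum of `vectors`.
    """
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return weights, pooled


def localise(series, patch_context, slice_context):
    """`series` is a list of slices; each slice is a list of patch feature vectors."""
    patch_weights, slice_vectors = [], []
    for patches in series:
        # Lower level: pool patch features into one vector per slice.
        weights, pooled = attention_pool(patches, patch_context)
        patch_weights.append(weights)
        slice_vectors.append(pooled)
    # Upper level: pool slice vectors into one series-level vector.
    slice_weights, _series_vector = attention_pool(slice_vectors, slice_context)
    top_slice = max(range(len(slice_weights)), key=slice_weights.__getitem__)
    top_patch = max(range(len(patch_weights[top_slice])),
                    key=patch_weights[top_slice].__getitem__)
    return top_slice, top_patch, slice_weights, patch_weights


# Toy series: two slices of two patches each, with 2-dimensional features.
series = [
    [[0.0, 0.0], [0.0, 0.0]],  # slice 0: uninformative patches
    [[0.0, 1.0], [3.0, 0.0]],  # slice 1: patch 1 aligns with the context vector
]
top_slice, top_patch, slice_weights, patch_weights = localise(
    series, patch_context=[1.0, 0.0], slice_context=[1.0, 0.0]
)
print(top_slice, top_patch)
```

In this sketch only the series-level output would receive a training label; the slice and patch weights are by-products of the pooling, which is why no slice- or pixel-level annotation is needed to obtain them.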
Conclusion
In this work we introduced a hierarchical attention network to analyse real-world non-volumetric clinical MRI data, demonstrating that this hierarchical treatment of neuroimaging data leads to gains in model performance, while coarsely localising the abnormality both across and within individual image slices. As such, the model is suitable for use as part of a semi-automated triage system, where both model accuracy and interpretability are important.