Applying Deep Learning to X-rays in a clinical environment: A project proposal
Abdominal and chest X-rays are amongst the most commonly ordered imaging modalities in any modern hospital system. While other imaging modalities such as computed tomography or ultrasound can offer increased resolution and diagnostic accuracy, X-rays, being widely available and accessible after hours, are often ordered for either monitoring or diagnosis. However, being so commonly ordered, these tests are often interpreted by junior staff on the wards and hence may not have a formal report by a radiologist until significant time has elapsed. (Pinto 2010) Analyses of junior staff interpretation suggest that classic appearances of common conditions frequently go unidentified, particularly where chest X-rays are concerned. (Eisen 2006) While these studies were primarily performed in the US medical system, the results are nevertheless applicable to the Australian setting. While most diagnoses on X-rays, particularly abdominal X-rays, are straightforward, certain diagnoses may be more elusive to junior staff. In the context of chest X-rays, the aetiology of missed lung cancer can be separated into detection and decision errors; visual search experiments conclude that missed lung cancers are more often decision than detection errors. (Manning 2004) Diagnoses such as bowel obstruction on abdominal X-ray are also often subjective and lacking in objective decision rules; this leads to wide discrepancies in accuracy between senior and junior staff, as senior staff tend to be consistently more confident in these decisions. (Thompson 2007)
Beyond junior staff on the wards, X-rays are often ordered in the Emergency Department as a front-line, non-invasive and cheap test. Given the logistical pressures on Emergency Departments within Australia, emergency physicians often do not have the luxury of waiting for a formal radiology report. For instance, at the Alfred ED, 36% of formal posteroanterior chest X-rays over the last 7 years were reported within 4 hours, falling to 21% for those performed outside normal working hours. However, while emergency physicians are relatively accurate at identifying common radiographic abnormalities such as consolidation and congestion, the literature suggests that less common findings such as coin lesions suggestive of lung malignancy are often missed, with 10 out of 13 lesions missed in one study. (Gatt 2003) Fortunately, acute medical conditions such as pneumonia or exacerbations of congestive heart failure tend to be treated on the basis of their clinical presentation in addition to their radiological findings, reducing the risk that a missed radiological finding leads to improper treatment.
While this is good news for such acute conditions, sub-acute conditions such as lung malignancies may be lost to follow-up if good communication is not maintained between the radiology department and the emergency department after a patient is discharged. A retrospective study found that out of 58 patients with histologically proven lung cancer, 14 (24%) had prior chest X-ray abnormalities which were not recognised or followed up on. (Turkington 2002) Chest X-rays up to 5 years prior to the diagnostic X-ray were assessed by a respiratory physician and a radiologist blinded to the site of the abnormality in order to identify missed abnormalities. The study further points out that among the reasons for missed abnormalities, the most common was failure of diagnosis rather than lack of follow-up or ambiguity within the report.
With the use of radiology in the hospital setting increasing, demands on radiology services to interpret X-rays swiftly and accurately will grow, particularly for after-hours and emergency use. To aid junior staff as well as other non-radiologists, we propose to develop a computer-aided diagnosis system capable of autonomously diagnosing common and clinically relevant conditions on chest and abdominal X-rays. To maximise the utility of this system for medical staff, we intend to use recent advances in deep learning to illuminate the decision process of the system, allowing medical staff to examine the reliability of the system’s predictions and logic.
Furthermore, we intend to show that deep learning approaches in medical imaging, particularly in X-rays, are practically achievable from existing data in clinical data warehouses. To the authors’ knowledge, no other group has demonstrated both training and evaluation of a deep learning system on chest X-rays from pre-existing clinical data warehouses. By using existing clinical data instead of specially prepared research datasets, we hope to adapt the resulting model to the particular image characteristics and patient population of this institution, as well as leverage the large dataset available to us.
In addition to augmenting radiographic diagnosis, a fully autonomous system can have several other applications, for example enabling the construction of early warning systems designed to detect complications of care, such as that proposed in a prior study at the Alfred - “Project 429/09: Towards an automated surveillance system for invasive fungal infections using existing hospital information systems”. Furthermore, the proposed system could act as an image recognition and retrieval system, enabling content-based image retrieval in addition to existing techniques for searching through report text. This can aid in case discovery and audit, contributing to teaching as well as quality control.
In recent years, there has been a growth in the use of convolutional neural networks (CNNs) in image processing, particularly in object classification. Based on the success of CNNs in image recognition tasks (Krizhevsky 2012), we propose to apply them to the detection of conditions in radiological images. Recent state-of-the-art networks achieve accuracy rates of up to 96% in identifying objects in photographs in the benchmark ImageNet challenge. (He 2015) While natural imagery and radiographs obviously have substantially different qualities, we hypothesise that, given sufficient data, network architectures used in these visual identification challenges will be successful in classifying X-rays.
The notion of using artificial neural networks in radiology is hardly new. As far back as 1992, the literature identified multiple application areas within medicine for neural networks, naming radiology as a key area of application. (Miller 1992) However, until the late 2000s, limited computational resources and slow progress in artificial neural network research hampered the application of these algorithms to medicine. Following the widespread availability of digital images, the increasing efficiency of general-purpose graphics processing units, and the commencement of the inaugural 2010 ImageNet challenge, deep convolutional neural networks made a resurgence in popularity (Krizhevsky 2012), beating other machine learning techniques such as support vector machines and firmly establishing deep learning as a promising technique for other engineering applications.
Using concepts developed over the last decade in artificial neural networks, we intend to build and evaluate a classifier for use on X-rays. We will use a convolutional neural network similar in architecture to that of the ImageNet 2012 challenge winner. (Krizhevsky 2012) However, in order to capture fine detail not visualisable on a 256 by 256 grid, we will use a larger initial input size. Furthermore, we will add a global average pooling layer in the final classifier. (Lin 2013) This enables us to estimate the location of detected abnormalities, allowing medical staff to inspect the decision process of the network and judge its reliability. To prevent over-fitting, we will employ dropout (Srivastava 2014), and will optimise using Adam (Kingma 2014) with binary cross-entropy as the objective function.
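The architecture described above can be sketched as follows, assuming a Keras/TensorFlow environment; the input size, filter counts and number of target diagnoses are illustrative placeholders rather than the final design:

```python
# Minimal sketch of the proposed classifier. Layer counts, filter sizes
# and NUM_CLASSES are illustrative assumptions, not the final architecture.
from tensorflow.keras import layers, models, optimizers

NUM_CLASSES = 14  # hypothetical number of target diagnoses


def build_model(input_size=512):
    # A larger-than-usual input (e.g. 512x512 rather than 256x256)
    # to preserve fine radiographic detail.
    inputs = layers.Input(shape=(input_size, input_size, 1))
    x = inputs
    for filters in (32, 64, 128, 256):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Dropout(0.5)(x)  # dropout to limit over-fitting
    # A 1x1 convolution maps the features to one map per diagnosis;
    # global average pooling then yields per-class scores while the
    # pre-pooling maps retain coarse spatial information.
    x = layers.Conv2D(NUM_CLASSES, 1)(x)
    pooled = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Activation("sigmoid")(pooled)
    model = models.Model(inputs, outputs)
    model.compile(optimizer=optimizers.Adam(),
                  loss="binary_crossentropy")
    return model
```

Because each pre-pooling feature map corresponds to one diagnosis, those maps can be upsampled and overlaid on the X-ray as coarse localisation heatmaps, which is how the decision process of the network would be exposed to medical staff.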
With the increasing availability of digital images in radiology, deep learning may soon make inroads into the medical field, finally fulfilling the predictions of Miller (1992) about artificial neural networks in medicine. Very recently, one group has begun analysing openly available, public datasets of chest X-rays, producing impressive classification results. (Shin 2016) They demonstrate that convolutional and recurrent neural networks can be effectively applied to X-rays as well as natural images, even using the exact network architecture of recent state-of-the-art models such as GoogLeNet (Szegedy 2014). They achieve a validation accuracy of 70% on a limited set of common radiographic diagnoses, which is impressive given the relatively small dataset available. We believe that we can improve on this accuracy and include more clinically relevant diagnoses by expanding our dataset using data stored within clinical data warehouses and natural language processing techniques.
The use of natural language processing is required in order to obtain sizeable datasets from existing clinical data. In traditional computer vision challenges such as ImageNet, image labels are crowd-sourced, as laypersons can easily identify and tag the categories of objects in these photographs. However, given the specialised nature of X-rays, significant resources would be required to create a similarly sized dataset - for instance, ImageNet 2015 has 8.1 million training images. As a result, radiology text reports must be processed to obtain categorical labels.
Further work by the same group demonstrates the potential of using existing clinical data and text reports in machine learning, although not specifically for chest X-rays. (Shin 2015) Semantic topics and keywords are predicted for key computed tomography slices, using labels mined from report text with natural language processing. However, the difficulty of mining disease-specific terms precludes the development of robust networks able to predict the presence or absence of disease (e.g. classifying an opacity as a cyst), as increasing the specificity of the mining process drastically reduces the number of images available. (Shin 2015) We intend to use more sophisticated frameworks such as the Clinical Text Analysis and Knowledge Extraction System (cTAKES) from the Apache Software Foundation (Savova) to not only identify key topics but also extract their negation status, generating an accurate set of labels for our dataset.
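As a minimal illustration of this labelling step, the sketch below applies simple NegEx-style pattern matching in Python; the finding terms and negation cues are hypothetical examples, and the actual pipeline would rely on cTAKES rather than hand-written patterns:

```python
# NegEx-style sketch of label extraction from report text. The finding
# terms and negation cues below are illustrative assumptions only; a
# production pipeline would use cTAKES for concept and negation detection.
import re

FINDINGS = {
    "consolidation": r"consolidation",
    "effusion": r"(pleural\s+)?effusion",
    "pneumothorax": r"pneumothorax",
}
NEGATION_CUES = r"\b(no|without|absence of|negative for|clear of)\b"


def extract_labels(report):
    """Return {finding: True/False} for each finding mentioned in the report."""
    labels = {}
    # Examine each sentence independently, so a negation cue in one
    # sentence does not affect findings mentioned in another.
    for sentence in re.split(r"[.;]\s*", report.lower()):
        negated = re.search(NEGATION_CUES, sentence) is not None
        for finding, pattern in FINDINGS.items():
            if re.search(pattern, sentence):
                labels[finding] = not negated
    return labels


print(extract_labels(
    "There is consolidation in the right lower zone. "
    "No pleural effusion or pneumothorax."
))
# → {'consolidation': True, 'effusion': False, 'pneumothorax': False}
```

Even this crude approach shows why sentence-level negation handling matters: treating the whole report as one bag of words would label the negated findings as present, corrupting the training labels.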
In the commercial environment, a partnership between Enlitic and Capitol Health has only recently been announced, and no published work on the results of this collaboration can be found at the time of writing. As such, the differences mentioned in this project are largely based upon unverified statements made by Enlitic online. Current models developed by Enlitic claim to use transfer learning on various forms of radiological imaging, including CT, MRI and X-rays. Based on existing user feedback as well as the investigators’ knowledge of transfer learning and the clinical aspects of different radiological modalities, we predict that this approach may be less successful on 2D films such as X-rays due to the poor resolution and high ambiguity of signs on these films. Furthermore, the datasets available to Enlitic via Capitol Health will cover very different populations and will not include the degree of severity and wide range of acute pathology found in a tertiary hospital setting. It is the investigators’ hope to create a system that focuses upon rapid triage and clinical utility rather than simply reducing costs – we therefore aim to focus on radiological modalities that require rapid turnaround and hence are often not reported by radiologists, but rather interpreted by the junior staff ordering the tests.