RespBERT: A Multi-Site Validation of a Natural Language Processing
Algorithm on Radiology Notes to Identify Acute Respiratory Distress
Syndrome (ARDS)
Abstract
Acute respiratory distress syndrome (ARDS) is a severe organ dysfunction
that is associated with significant mortality and morbidity among
critically ill patients admitted to the Intensive Care Unit (ICU). The
etiology associated with ARDS can be highly heterogeneous, with most
cases being associated with infection or trauma. ARDS is often described
as a clinical syndrome associated with poor oxygenation even in the
presence of mechanical ventilation. The Berlin criteria for ARDS are
the current gold standard for identifying whether patients have
developed ARDS; however, they often require manual adjudication of the
chest radiograph, and tools to automate this process are limited. The
determination of ARDS depends on the presence of bilateral infiltrates
on radiographic images, information that is not typically available in
structured form in the Electronic Medical Record (EMR). Automated
determination of the presence of radiological evidence would enable
robust study of the syndrome by eliminating expensive manual
inspection of the images by physicians. The text of radiology reports
provides an opportunity for Natural Language Processing (NLP) to
determine the status of the lungs and thereby evaluate the imaging
criterion.
We developed an NLP pipeline that analyzes radiology notes of 362
patients satisfying the Sepsis-3 criteria, extracted from the EMR, to
determine a possible ARDS diagnosis. The radiology notes were
de-noised and preprocessed, vectorized with the BERT word-embedding
model, and fitted to a classification layer using transfer learning.
These classification models achieved F1-scores of 74.5% and 64.22% on
the Emory and Grady datasets, respectively.
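As an illustrative sketch of the final step only (the paper's actual
BERT variant, classification head, and hyperparameters are not given in
this abstract): a frozen encoder produces one embedding per note, and a
lightweight classifier is fitted on top. The embeddings and labels below
are random stand-ins, not patient data, and the logistic-regression head
is a hypothetical choice.

```python
# Hedged sketch: fit a classification head on frozen note embeddings.
# Random vectors stand in for BERT [CLS] embeddings (768-dim for
# BERT-base); ARDS labels here are synthetic, not real adjudications.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_notes, dim = 362, 768                    # cohort size from the abstract; BERT-base width
X = rng.normal(size=(n_notes, dim))        # stand-in for precomputed BERT embeddings
w = rng.normal(size=dim)                   # hidden direction generating synthetic labels
y = (X @ w + rng.normal(scale=5.0, size=n_notes) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
head = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # the "classification layer"
print(f"F1 on held-out notes: {f1_score(y_te, head.predict(X_te)):.3f}")
```

In a real transfer-learning setup the encoder weights would come from a
pretrained BERT checkpoint and only the head (or the top layers) would
be updated on the labeled radiology notes.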