Methods

Data resource

The data were collected at the First Hospital of Jilin University in China between January 2018 and June 2019. Patient identifiers were removed and all data were anonymized. Every pregnant woman in this study gave written informed consent. All examinations were performed and diagnosed by a team of well-trained doctors from the Center for Prenatal Diagnosis of the First Hospital of Jilin University. GE Voluson E8 ultrasound scanners were used for data acquisition. The study protocol was approved by the Ethics Committee of the First Hospital of Jilin University (Changchun, China; permit No. 2018-429).

Deep learning algorithms for training

Picking out brain images
The first step of our scheme is to pick out brain images from all stored freeze-frame images (Figure 1). This can be done with a deep learning classification model. Several well-known models have shown good results on image-classification tasks, such as the Oxford VGG model [41], the Google Inception model [42] and the Microsoft ResNet model [43]. Here we applied transfer learning with ResNet50, a 50-layer residual network that has shown good results in medical image classification [44].
Specifically, each image was first resized to 224×224 and then fed into ResNet50 with pretrained ImageNet weights. GlobalAveragePooling2D was applied to the output of the last convolutional layer, followed by a fully connected dense layer with sigmoid activation. All layers were set as trainable, so their weights could be updated by backpropagation at each step.
The experiment was carried out in a Jupyter Notebook with Keras, using TensorFlow as the backend. A workstation with four NVIDIA GeForce GTX 1080 Ti graphics cards, two Intel Xeon E5-2620 v4 CPUs and 64 GB of RAM was used in this experiment. The labels were determined manually by one trained expert.
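The classifier described above can be sketched with the Keras applications API. This is a minimal sketch, not the authors' exact code; the optimizer, loss and the `build_brain_classifier` helper name are illustrative assumptions not stated in the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_brain_classifier(input_shape=(224, 224, 3), weights="imagenet"):
    # ResNet50 backbone with pretrained ImageNet weights, classification top removed
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = True  # all layers updated by backpropagation at each step
    x = layers.GlobalAveragePooling2D()(base.output)
    # single sigmoid unit: brain vs. non-brain image
    out = layers.Dense(1, activation="sigmoid")(x)
    model = models.Model(base.input, out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```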
Picking out TV and TT planes and localization of brain region
After picking out all fetal brain images, we need to further select US images in the transventricular (TV) or transthalamic (TT) plane, in which the lateral ventricle can be measured. Moreover, we need to localize the brain region and remove the background around the fetal skull, which would otherwise strongly influence the results. Here we used Faster R-CNN [45], a state-of-the-art object detection algorithm that combines the localization and classification tasks in a single network.
Specifically, we used the fasterrcnn_resnet50_fpn model in torchvision to perform the experiments, which uses ResNet50 as its backbone. The network parameters were initialized from a model pretrained on the COCO dataset. To make the algorithm more robust, we augmented the dataset by randomly cropping images, flipping them and rotating them by 90, 180 or 270 degrees, to simulate various fetal positions. The network outputs zero, one or more detected objects with corresponding confidence scores; the detection with the highest score was selected as the result.
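For detection, the 90-degree rotations must transform the bounding boxes together with the images. A minimal sketch of that rotation step is shown below; `rot90_with_box` is a hypothetical helper name, and random crops and flips would be handled analogously.

```python
import torch

def rot90_with_box(img, box, k):
    """Rotate a CHW image tensor by k*90 degrees counterclockwise and
    transform its [x1, y1, x2, y2] bounding box accordingly."""
    _, h, w = img.shape
    img = torch.rot90(img, k, dims=(1, 2))
    x1, y1, x2, y2 = box
    for _ in range(k % 4):
        # one CCW quarter turn maps (x, y) -> (y, w - x); applying it to both
        # corners keeps x1 <= x2 and y1 <= y2
        x1, y1, x2, y2 = y1, w - x2, y2, w - x1
        h, w = w, h  # image dimensions swap after each quarter turn
    return img, torch.tensor([x1, y1, x2, y2])
```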
The experiment was carried out in a Jupyter Notebook using PyTorch. The system used in this experiment was the same as in the first step. The bounding boxes of the brains were manually labeled by one trained expert and reviewed by doctors. The US images in the TV and TT planes were selected by doctors.
Predicting the lateral ventricular width
A regression model was applied to this task. Specifically, each brain-region image was first resized to 224×224 and then fed into ResNet50 with pretrained ImageNet weights. GlobalAveragePooling2D was applied to the output of the last convolutional layer, followed by a fully connected dense layer with linear activation. All layers were set as trainable, so their weights could be updated by backpropagation at each step. mean_squared_error was specified as the loss function when compiling the model. The experimental setting and the system used were the same as in the first step. The ground-truth lateral ventricular width of each image was determined manually by doctors.
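The regression model differs from the classifier only in its output activation and loss. A minimal Keras sketch, assuming the same backbone setup; the optimizer and the `build_width_regressor` helper name are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_width_regressor(input_shape=(224, 224, 3), weights="imagenet"):
    # same ResNet50 backbone with pretrained ImageNet weights as the classifier
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = True  # all layers updated by backpropagation
    x = layers.GlobalAveragePooling2D()(base.output)
    # single linear unit: the predicted lateral ventricular width
    out = layers.Dense(1, activation="linear")(x)
    model = models.Model(base.input, out)
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model
```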

Interpretation of the results using heat maps

To provide evidence that our regression model predicted the lateral ventricular width based on the anatomical structure of the lateral ventricle, we implemented heat maps for visualization and interpretation. Here we used a technique called Class Activation Mapping (CAM) [46] to generate the heat maps. After superimposing a heat map on the grayscale image, we can see the key area, i.e., the red regions, where the algorithm is most strongly activated.
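At its core, CAM for the GlobalAveragePooling2D-plus-dense architecture above is a weighted sum of the last convolutional feature maps, using the dense-layer weights of the output unit. A minimal NumPy sketch of that computation (upsampling to the image size and the color overlay are left to the display code; `cam_from_features` is a hypothetical helper name):

```python
import numpy as np

def cam_from_features(feature_maps, dense_weights):
    """feature_maps: (H, W, C) activations of the last conv layer;
    dense_weights: (C,) weights connecting the pooled features to the output."""
    heat = feature_maps @ dense_weights   # weighted sum over channels
    heat = np.maximum(heat, 0.0)          # keep positively contributing regions
    m = heat.max()
    return heat / m if m > 0 else heat    # normalize to [0, 1] for display
```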