3.1 | Evaluation of U-Net architecture on in-class in vitro tissue-mimicking phantoms
The top row of Fig. 2 (a-e) shows ultrasound images of two groups of
in vitro phantoms: one is a cross-section of a thin-wire phantom, and
the other is made with gelatin and TiO2 as acoustic and optical
scatterers. The low frame number averaged photoacoustic images
displayed in the second row of Fig. 2 have high background noise
because of the low pulse energy of the LED system. After tuning the
hyperparameters of our network, we fed the U-Net with the low frame
number averaged LED images (Fig. 2 (f-j)) of the in vitro phantoms.
The images in the third
row (Fig. 2 (k-o)) are the high frame number averaged images
corresponding to Fig. 2 (f-j). The outputs of our network, shown in
Fig. 2 (p-t), are qualitatively similar to the high frame number
averaged photoacoustic images. Furthermore, we observe a significant
reduction in noise in the U-Net outputs compared with the low frame
number averaged images.

A second 3D phantom, a polyvinyl acetate-based text logo embedded in
gelatin (see Fig. 3), was raster scanned in 29 cross-sectional frames;
one sample cross-sectional plane is marked with a yellow dotted line.
The left column shows a 2D representation of one plane, and the right
column shows the 3D view. The imaging planes are spaced 1 mm apart. We
compared the
imaging speed for capturing the 3D view in three modes: low frame
number averaging, high frame number averaging, and U-Net output. Low
frame number averaging took approximately 15.5 s (frame rate: 30 Hz,
plus 0.5 s for the motor movement at each imaging cross-section),
whereas high frame number averaging took 274.5 s (frame rate: 0.15 Hz,
plus 0.5 s for the motor movement at each imaging cross-section). The
U-Net took almost the same time as low frame number averaging because
running the trained network on a GPU takes only 0.25 ms per frame; we
only have to capture the low frame number averaged images and run the
U-Net as a downstream module.

We compared
the noise level distributions of the low frame number averaged inputs
and the U-Net outputs; representative photoacoustic images are shown
in Fig. S3(a). An ROI of 8 x 8 pixels (blue square) represents the
background. The noise levels in the ROI for the low frame number
averaged input images and the U-Net outputs for all the in vitro
samples are shown in Fig. S3(b). The noise levels in the U-Net output
images were significantly lower than those in the input images:
an approximately 8-fold (8.57 ± 3.5) reduction in noise was observed
in the U-Net output images.
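
The background-ROI comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for Fig. S3: the text does not state how the noise level was computed, so the sketch assumes it is the standard deviation of pixel values inside the 8 x 8 ROI. The function names `roi_noise` and `fold_reduction`, the ROI position, and the synthetic images are all hypothetical.

```python
from statistics import pstdev


def roi_noise(image, top, left, size=8):
    """Noise level of a square background ROI.

    ASSUMPTION: noise is taken as the standard deviation of the pixel
    values in the ROI; the paper does not specify the metric.
    `image` is a 2D list of pixel values indexed as image[row][col].
    """
    pixels = [
        image[r][c]
        for r in range(top, top + size)
        for c in range(left, left + size)
    ]
    return pstdev(pixels)


def fold_reduction(input_image, output_image, top, left, size=8):
    """Ratio of input noise to output noise in the same ROI."""
    noise_out = roi_noise(output_image, top, left, size)
    if noise_out == 0:
        raise ValueError("output ROI has zero noise; ratio is undefined")
    return roi_noise(input_image, top, left, size) / noise_out


# Synthetic 16 x 16 images (hypothetical data, for illustration only):
# a checkerboard background of amplitude 8 for the noisy input and
# amplitude 1 for the denoised output.
noisy = [[8 if (r + c) % 2 else -8 for c in range(16)] for r in range(16)]
denoised = [[1 if (r + c) % 2 else -1 for c in range(16)] for r in range(16)]

ratio = fold_reduction(noisy, denoised, top=0, left=0)
print(f"fold reduction in background noise: {ratio:.2f}")
```

Applying `fold_reduction` to each in vitro sample and then taking the mean and standard deviation of the ratios would give a summary in the same form as the 8.57 ± 3.5 value reported above.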
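
The 3D acquisition times reported earlier in this section can be checked with a simple per-plane model: each of the 29 planes costs one averaged frame (the reciprocal of the frame rate), one 0.5 s motor move and, for the U-Net mode, one 0.25 ms inference. The Python sketch below is illustrative only. The function name `scan_time_s` and the assumption that the three costs simply add serially are not taken from the acquisition software.

```python
def scan_time_s(n_planes, frame_rate_hz, motor_delay_s=0.5, inference_s=0.0):
    """Total raster-scan time under a serial per-plane model.

    ASSUMPTION: each plane costs one averaged frame (1 / frame rate),
    one motor move, and optionally one network inference, with no
    other overhead.
    """
    per_plane_s = 1.0 / frame_rate_hz + motor_delay_s + inference_s
    return n_planes * per_plane_s


N_PLANES = 29  # cross-sectional frames in the 3D phantom scan

low_avg_s = scan_time_s(N_PLANES, frame_rate_hz=30.0)
high_avg_s = scan_time_s(N_PLANES, frame_rate_hz=0.15)
unet_s = scan_time_s(N_PLANES, frame_rate_hz=30.0, inference_s=0.25e-3)

print(f"low frame number averaging:  {low_avg_s:.2f} s")
print(f"high frame number averaging: {high_avg_s:.2f} s")
print(f"low averaging + U-Net:       {unet_s:.2f} s")
```

Under these assumptions the low-averaging and U-Net modes both come to about 15.5 s, since 29 inferences add only about 7 ms in total. The same model gives roughly 208 s for high averaging at 0.15 Hz, which is less than the 274.5 s reported, so the measured figure presumably includes overhead that this sketch does not capture.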