3.1 | Evaluation of U-Net architecture on in-class in vitro tissue-mimicking phantoms
The top row of Fig. 2 (a-e) shows ultrasound images of two groups of in vitro phantoms: one is a cross-section of a thin-wire phantom, and the other is made with gelatin and TiO2 as acoustic and optical scatterers. The low-frame-number averaged photoacoustic images in the second row of Fig. 2 (f-j) have high background noise because of the low pulse energy of the LED system. After tuning the hyperparameters of our network, we fed the U-Net with these low-frame-number averaged LED images of the in vitro phantoms. The images in the third row (Fig. 2 (k-o)) are the corresponding high-frame-number averaged images. The network outputs shown in Fig. 2 (p-t) are visually, i.e., qualitatively, similar to the high-frame-number averaged photoacoustic images. We also observe a significant reduction in noise in the U-Net outputs compared with the low-frame-number averaged images.
Another 3D phantom, a polyvinyl acetate-based text logo embedded in gelatin (see Fig. 3), was raster scanned over 29 cross-sectional frames; one sample cross-sectional plane is demarcated by a yellow dotted line. The left column shows a 2D representation of a plane and the right column shows the 3D view. The imaging planes are spaced 1 mm apart. We compared the imaging speed for capturing the 3D view in three modes: low-frame-number averaging, high-frame-number averaging, and U-Net output. Low-frame-number averaging took approximately 15.5 s (frame rate: 30 Hz, plus 0.5 s for the motor movement at each imaging cross-section), whereas high-frame-number averaging took 274.5 s (frame rate: 0.15 Hz, plus 0.5 s for the motor movement at each imaging cross-section). Our U-Net took almost the same time as low-frame-number averaging because testing the deep learning network with GPUs consumes only 0.25 ms per frame (we only have to capture the low-frame-number averaged images and run our U-Net as a downstream module).
We compared the noise level distribution in the low-frame-number averaged inputs and the U-Net outputs; representative photoacoustic images are shown in Fig. S3(a). An ROI of 8 × 8 pixels (blue square) represents the background. The noise levels in the ROI for the low-frame-number averaged input images and the U-Net outputs for all the in vitro samples are shown in Fig. S3(b). The noise levels in the U-Net output images were significantly lower than in the input images: specifically, an approximately 8-fold (8.57 ± 3.5) reduction in noise level was observed.
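The background-ROI noise comparison described above can be illustrated with a minimal Python sketch. The text does not define how the noise level is computed, so the sketch assumes it is the standard deviation of pixel values inside the 8 × 8 ROI. The function names (`roi_noise`, `noise_reduction_factor`), the ROI position, and the synthetic images are hypothetical and are not data from this study.

```python
from statistics import pstdev


def roi_noise(image, top, left, size=8):
    """Noise level of a size x size ROI, taken here (an assumption) as the
    population standard deviation of its pixel values."""
    pixels = [
        image[r][c]
        for r in range(top, top + size)
        for c in range(left, left + size)
    ]
    return pstdev(pixels)


def noise_reduction_factor(input_image, output_image, top, left, size=8):
    """Ratio of input ROI noise to output ROI noise (values > 1 mean less noise)."""
    noise_in = roi_noise(input_image, top, left, size)
    noise_out = roi_noise(output_image, top, left, size)
    if noise_out == 0:
        raise ValueError("output ROI has zero noise; ratio is undefined")
    return noise_in / noise_out


# Synthetic 16 x 16 backgrounds (illustrative only): a checkerboard of
# +/-4 for the "input" and +/-0.5 for the "output".
synthetic_input = [[4.0 if (r + c) % 2 == 0 else -4.0 for c in range(16)] for r in range(16)]
synthetic_output = [[0.5 if (r + c) % 2 == 0 else -0.5 for c in range(16)] for r in range(16)]

factor = noise_reduction_factor(synthetic_input, synthetic_output, top=2, left=2)
print(f"noise reduction factor: {factor:.2f}")
```

Applying the same function to the ROI of each sample and then averaging the per-sample ratios would give a mean ± standard deviation summary of the kind reported above, under the stated assumption about the noise metric.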
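The 3D acquisition time quoted above for low-frame-number averaging, and the claim that U-Net processing adds almost no time, can be checked with a short calculation. The sketch below assumes a simplified timing model in which each of the 29 planes costs one averaged frame period, one 0.5 s motor move, and optionally one network inference. The function name `scan_time_s` is hypothetical. The model ignores any other overhead, so it is not meant to reproduce the reported 274.5 s for the high-frame-number averaging mode.

```python
def scan_time_s(n_planes, frame_rate_hz, motor_move_s, inference_s_per_frame=0.0):
    """Total scan time under a simplified model (an assumption):
    each plane costs one frame period, one motor move, and one inference."""
    per_plane_s = 1.0 / frame_rate_hz + motor_move_s + inference_s_per_frame
    return n_planes * per_plane_s


N_PLANES = 29        # cross-sectional frames in the raster scan
MOTOR_MOVE_S = 0.5   # motor movement time per cross-section
LOW_AVG_RATE_HZ = 30.0
UNET_INFERENCE_S = 0.25e-3  # 0.25 ms per frame on a GPU

low_avg_s = scan_time_s(N_PLANES, LOW_AVG_RATE_HZ, MOTOR_MOVE_S)
unet_s = scan_time_s(N_PLANES, LOW_AVG_RATE_HZ, MOTOR_MOVE_S, UNET_INFERENCE_S)

print(f"low-frame-number averaging scan: {low_avg_s:.2f} s")
print(f"added U-Net inference time over the scan: {(unet_s - low_avg_s) * 1e3:.2f} ms")
```

Under this model the low-frame-number averaging scan takes about 15.5 s, and inference over all 29 planes adds only a few milliseconds, which is consistent with the statement that the U-Net mode takes almost the same time.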