FIGURE 1  The U-Net architecture used for low-frame-rate to high-frame-rate beam-formed image prediction. This is a 4-layer-deep network architecture that takes 256 × 256 pixel inputs. Dilution (dropout) layers are added for regularization to avoid overfitting. Metal plate and graphite powder phantoms imaged with the Acoustic X system at 4 kHz PRF with 128 frame averages are used as training images, and the corresponding images with 25,600 frame averages at 4 kHz PRF (obtained without disturbing any equipment setup or phantoms) are the training labels. The bottom panel showcases out-of-class testing data, i.e., subcutaneous mouse tumor images used for testing our network.
2.6 | Frequency spectrum analysis
We converted the low-frame-averaged, high-frame-averaged, and U-Net-generated images from the spatial domain to the frequency domain with the fft2 command in MATLAB. We then used fftshift to rearrange the Fourier-transformed frequencies, shifting the zero-frequency component to the center of the array. For visualization purposes, we converted the frequency array to a log scale, adding 1 to the shifted magnitude values to avoid taking the logarithm of zero.
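The MATLAB fft2/fftshift/log workflow described above can be sketched in Python with NumPy's equivalent routines; the function name and the synthetic test image below are illustrative, not from the paper.

```python
import numpy as np

def log_magnitude_spectrum(image):
    """2-D FFT of a beam-formed image, with the zero-frequency component
    shifted to the center and the magnitude shown on a log scale.
    log(1 + |F|) avoids taking the logarithm of zero."""
    spectrum = np.fft.fft2(image)        # equivalent of MATLAB's fft2
    shifted = np.fft.fftshift(spectrum)  # equivalent of MATLAB's fftshift
    return np.log1p(np.abs(shifted))     # log(1 + magnitude)

# Example on a synthetic 256 x 256 image with a bright central square.
img = np.zeros((256, 256))
img[96:160, 96:160] = 1.0
spec = log_magnitude_spectrum(img)
```

For a non-negative image, the largest value of the shifted spectrum is the DC component, which lands at the center pixel after the shift.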
The training images consist of various metal plate, graphite powder inclusion, and gelatin-based graphite inclusion phantoms captured with low frame averaging; the corresponding high-frame-averaged images were used as labels (Fig. S1). The graphite rod diameter was 0.5 mm. The gelatin inclusion phantoms were made with three different graphite powder concentrations, 0.01%, 0.025%, and 0.05%, which served as our training samples. A snapshot of some of the training images and their corresponding labels is provided in the supplementary document (Fig. S1). The paucity of proper training samples necessitated multiple forms of data augmentation, through which the network learned spatial invariance and robustness. Rotation, pixel shifting, horizontal flipping, zooming, and shearing were performed in this respect. The in vitro graphite powder phantoms were used as the validation set for hyper-parameter tuning, such as choosing the batch size, the optimized learning rate, and the dropout probability. Minimal to negligible overfitting occurred in this network, as confirmed by Fig. S2, where the training and validation losses did not diverge significantly. 800 training datasets were used with 20 epochs and 300 steps per epoch, and 80 datasets were used for validation. For testing, image data from 8 mouse tumors at different cross-sections, totalling more than 50 individual samples, were utilized.
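The augmentations listed above (rotation, pixel shifting, horizontal flip, zoom, and shear) can be sketched with SciPy's image transforms. The parameter ranges below are illustrative assumptions, not the values used in the paper.

```python
import numpy as np
from scipy import ndimage

def augment(image, rng):
    """Apply one random combination of rotation, pixel shift, horizontal
    flip, zoom, and shear to a 2-D image. Ranges are illustrative only."""
    # Random rotation within +/- 15 degrees, keeping the original shape.
    out = ndimage.rotate(image, rng.uniform(-15, 15),
                         reshape=False, mode='nearest')
    # Random pixel shift of up to 10 pixels along each axis.
    out = ndimage.shift(out, rng.uniform(-10, 10, size=2), mode='nearest')
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = np.fliplr(out)
    # Combined zoom and shear expressed as a single affine transform.
    zoom = rng.uniform(0.9, 1.1)
    shear = rng.uniform(-0.1, 0.1)
    out = ndimage.affine_transform(out, [[1.0 / zoom, shear],
                                         [0.0, 1.0 / zoom]], mode='nearest')
    return out

rng = np.random.default_rng(0)
aug = augment(np.random.rand(256, 256), rng)
```

Each call yields a differently transformed copy, so a small phantom set can be expanded into many spatially varied training samples.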
Several image quality metrics were utilized in this study. SNR was determined as the ratio of the actual signal from the object to the background noise, as per the formula [52]:
\(\text{SNR}=\ \frac{m_{\text{target}}}{\sigma_{\text{background}}}\),
where \(m_{\text{target}}\) is the mean intensity of the region of interest (ROI) and \(\sigma_{\text{background}}\) denotes the standard deviation of the selected background region. Similar to SNR, another image metric, the peak SNR (PSNR), has been used [53]:
\(\text{PSNR}=\ \frac{max(\text{Target})}{\sigma_{\text{background}}}\),
where \(max(\text{Target})\) is the maximum intensity data of the ROI. CNR determines how well the target object can be distinguished from the background. The following formula has been used for calculating CNR [52]:
\(\text{CNR}=\ \frac{m_{\text{target}}-m_{\text{background}}}{\sigma_{\text{background}}}\),
where \(m_{\text{background}}\) is the mean intensity of the noisy background region.
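The three metrics above can be computed directly from an image and two region masks; the helper function and the synthetic example below are illustrative, not the paper's implementation.

```python
import numpy as np

def image_metrics(image, target_mask, background_mask):
    """SNR, PSNR, and CNR as defined in the text.
    target_mask and background_mask are boolean arrays selecting
    the ROI and the background region, respectively."""
    m_target = image[target_mask].mean()           # mean ROI intensity
    m_background = image[background_mask].mean()   # mean background intensity
    sigma_background = image[background_mask].std()  # background std. dev.
    snr = m_target / sigma_background
    psnr = image[target_mask].max() / sigma_background
    cnr = (m_target - m_background) / sigma_background
    return snr, psnr, cnr

# Synthetic example: background rows alternate 0 and 2 (mean 1, std 1);
# target rows are uniformly 3, so SNR = 3, PSNR = 3, CNR = 2.
img = np.zeros((4, 4))
img[0] = [0, 2, 0, 2]
img[1] = [0, 2, 0, 2]
img[2:] = 3.0
tmask = np.zeros((4, 4), bool)
tmask[2:] = True
bmask = np.zeros((4, 4), bool)
bmask[:2] = True
snr, psnr, cnr = image_metrics(img, tmask, bmask)
```

Using boolean masks rather than rectangular slices keeps the helper applicable to irregular ROIs such as tumor cross-sections.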
3 | RESULTS AND DISCUSSION