FIGURE 1 The U-Net architecture used for low-frame-rate to
high-frame-rate beam-formed image prediction. This 4-layer-deep network
takes 256 × 256 pixel images as input. Dropout (dilution)
layers are added for regularization to avoid overfitting. Metal
plate and graphite powder phantoms imaged with the Acoustic X system at
4 kHz PRF with 128 frame averages are used as the training images,
and the corresponding images with 25,600 frame averages at 4 kHz PRF
(obtained without disturbing the equipment setup or phantoms) are the
training labels. The bottom panel shows out-of-class testing data,
i.e., subcutaneous mouse tumor images used for testing our network.
2.6 | Frequency spectrum analysis
We converted the low-frame-averaged, high-frame-averaged, and
U-Net-generated images from the spatial domain to the frequency domain
using the fft2 command in MATLAB. We also used fftshift to rearrange the
Fourier-transformed frequencies by shifting the zero-frequency component
to the center of the array. For visualization purposes, we converted the
frequency array to the log scale after adding 1 to the shifted values to
avoid taking the logarithm of zero.
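The fft2/fftshift/log pipeline described above can be sketched equivalently in Python with NumPy (a minimal illustration, not the code used in the study; the array `img` is a placeholder input):

```python
import numpy as np

def log_magnitude_spectrum(img):
    """Log-scaled, center-shifted magnitude spectrum of a 2-D image.

    Mirrors the MATLAB steps described in the text:
    fft2 -> fftshift -> log(1 + |F|).
    """
    F = np.fft.fft2(img)                # 2-D discrete Fourier transform
    F_shifted = np.fft.fftshift(F)      # move zero frequency to the center
    return np.log1p(np.abs(F_shifted))  # log scale; the +1 avoids log(0)

# Example: a constant image concentrates all energy at zero frequency
img = np.ones((256, 256))
spec = log_magnitude_spectrum(img)
center = spec[128, 128]                 # zero-frequency bin after fftshift
```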
The training images consist of various metal plate, graphite powder
inclusion, and gelatin-based graphite inclusion phantoms captured with
low frame averaging; the corresponding high-frame-averaged images were
used as labels (Fig. S1). The graphite rod diameter was 0.5 mm. The
gelatin phantoms with inclusions were made with three different graphite
powder concentrations (0.01%, 0.025%, and 0.05%) and acted as our
training samples. A snapshot of some of the training images and their
corresponding labels is provided in the supplementary document
(Fig. S1). The paucity of proper training samples necessitated multiple
forms of data augmentation, through which the network learned the
spatial-invariance and robustness properties of the images. Rotation,
pixel shifting, horizontal flipping, zooming, and shearing were
performed in this respect.
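The augmentation pipeline itself is not detailed in the text; as a minimal NumPy sketch, two of the listed operations (horizontal flip and pixel shifting) can be written directly, while rotation, zoom, and shear are typically delegated to a library such as scipy.ndimage or Keras' ImageDataGenerator (all names below are illustrative assumptions):

```python
import numpy as np

def augment(img, rng):
    """Apply a random horizontal flip and a random pixel shift.

    A sketch of two of the augmentations listed above; rotation, zoom,
    and shear would be handled by an image-processing library.
    """
    out = img.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]                      # horizontal flip
    dy, dx = rng.integers(-10, 11, size=2)      # shift up to 10 pixels
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
img = np.arange(256 * 256, dtype=float).reshape(256, 256)
aug = augment(img, rng)
```

Both operations only rearrange pixels, so the augmented image keeps the same intensity content as the original, which is what lets the network learn spatial invariance rather than new intensities.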
The in vitro graphite powder phantoms were used as the validation set
for hyperparameter tuning, such as selecting the batch size, learning
rate, and dropout probability. Minimal to negligible overfitting
occurred in this network, as confirmed in Fig. S2, where the training
and validation losses did not diverge significantly. 800 training
datasets were used with 20 epochs and 300 steps per epoch, and 80
datasets were used for validation. For testing, image data from 8 mouse
tumors at different cross-sections, totaling more than 50 individual
samples, were utilized.
Several image quality metrics were utilized in this study. SNR was
determined as the ratio of the actual signal from the object to the
background noise, as per the formula [52]:
\(\text{SNR} = \frac{m_{\text{target}}}{\sigma_{\text{background}}}\),
where \(m_{\text{target}}\) is the mean intensity of the region of
interest (ROI) and \(\sigma_{\text{background}}\) denotes the standard
deviation of the selected background region. Similar to SNR, another
image metric, peak SNR (PSNR), has been used [53]:
\(\text{PSNR} = \frac{\max(\text{Target})}{\sigma_{\text{background}}}\),
where \(\max(\text{Target})\) is the maximum intensity of the ROI.
CNR determines how well the target object can be distinguished from the
background. The following formula has been used for calculating CNR
[52]:
\(\text{CNR} = \frac{m_{\text{target}} - m_{\text{background}}}{\sigma_{\text{background}}}\),
where \(m_{\text{background}}\) is the mean intensity of the noisy
background.
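The three metrics above translate directly into code. The following is a minimal NumPy sketch of the definitions as stated (the ROI and background arrays are placeholder toy values, not data from the study):

```python
import numpy as np

def snr(target, background):
    """SNR = mean(target) / std(background), per the definition above."""
    return target.mean() / background.std()

def psnr(target, background):
    """PSNR = max(target) / std(background), per the definition above."""
    return target.max() / background.std()

def cnr(target, background):
    """CNR = (mean(target) - mean(background)) / std(background)."""
    return (target.mean() - background.mean()) / background.std()

# Toy example: bright ROI pixels against noisy background pixels
target = np.array([8.0, 10.0, 12.0])          # ROI intensities
background = np.array([1.0, 2.0, 3.0, 2.0])   # background intensities

snr_val = snr(target, background)
psnr_val = psnr(target, background)
cnr_val = cnr(target, background)
```

Note that all three metrics share the same denominator, so improving them amounts to either raising the target intensity or suppressing the background standard deviation, which is exactly what heavier frame averaging (or its learned surrogate) does.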
3 | RESULTS AND DISCUSSION