Fig 2. Architecture of the accelerator.

For the traditional channel shuffle operation, channels are selected
alternately from two groups of feature maps and recombined into a new
output feature map, which is then transferred to BRAM as the input for
the next convolution. In this paper, the channel shuffle method is
modified by partitioning the output feature maps internally into groups
of 4 channels. Channels are then selected alternately from the two
groups and recombined into a new output feature map, as illustrated in
Fig 3. This approach maintains the advantage of increased
inter-channel information exchange while reducing the number of memory
read/write operations by 75%, which significantly shortens memory
access time.
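
The channel ordering described above can be illustrated with a minimal Python sketch. It is not the accelerator's implementation. It rests on two assumptions that the text does not state explicitly: that the two groups are the first and second halves of the channels, and that one memory operation moves one block of channels, so that 4-channel blocks need a quarter of the transfers required by single-channel shuffling. The names shuffle_order and transfer_count are hypothetical.

```python
def shuffle_order(num_channels, block=1):
    """Return source channel indices in output order.

    Assumption: the two groups are the first and second halves of the
    channels. Blocks of `block` channels are taken alternately from
    each group. block=1 gives the traditional per-channel shuffle.
    """
    if num_channels % (2 * block) != 0:
        raise ValueError("num_channels must be a multiple of 2 * block")
    half = num_channels // 2
    order = []
    for start in range(0, half, block):
        # One block from the first group ...
        order.extend(range(start, start + block))
        # ... followed by the matching block from the second group.
        order.extend(range(half + start, half + start + block))
    return order


def transfer_count(num_channels, block=1):
    """Number of block transfers, assuming one memory operation per block."""
    return num_channels // block


if __name__ == "__main__":
    channels = 16  # illustrative size, not taken from the paper
    print(shuffle_order(channels, block=1))
    print(shuffle_order(channels, block=4))
    saved = 1 - transfer_count(channels, 4) / transfer_count(channels, 1)
    print(f"reduction in transfers: {saved:.0%}")
```

Under these assumptions, 16 channels require 16 transfers with per-channel shuffling and 4 transfers with 4-channel blocks, which corresponds to the 75% reduction quoted above.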