Abstract
Convolutional neural networks (CNNs) have been widely applied in
computer vision alongside the development of artificial intelligence.
Depthwise separable convolutional networks such as MobileNet and
ShuffleNet are well suited to deployment on resource-constrained
embedded devices because they have fewer parameters and higher
computational efficiency than earlier networks. In this paper, we focus on the
hardware implementation of ShuffleNetV2. We optimize the network
structure by modifying the feature channel numbers, pooling modes, and
channel shuffle scheme, improving accuracy by 1.09% while reducing the
parameter count by 0.18M. In addition, we implement a
highly parallel hardware accelerator on the Xilinx xczu9eg FPGA that
supports both standard convolution and depthwise convolution. The
accelerator consumes only 7.3 W while achieving an energy efficiency of
13.45 GOPS/W and a frame rate of 675.7 fps.
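
As background for the parameter savings the abstract attributes to depthwise separable convolutions, here is a minimal sketch comparing parameter counts of a standard convolution and its depthwise separable counterpart. The layer sizes are illustrative assumptions, not taken from the paper:

```python
# Parameter counts for a convolutional layer with c_in input channels,
# c_out output channels, and a k x k kernel (biases ignored).

def standard_conv_params(c_in, c_out, k):
    # One k x k x c_in filter per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise: one k x k filter per input channel.
    # Pointwise: a 1 x 1 convolution mapping c_in -> c_out channels.
    return k * k * c_in + c_in * c_out

# Illustrative sizes (hypothetical, not from the paper):
# a 3x3 convolution with 128 input and 128 output channels.
std = standard_conv_params(128, 128, 3)        # 147456 parameters
sep = depthwise_separable_params(128, 128, 3)  # 17536 parameters
print(std, sep, round(std / sep, 1))           # roughly an 8x reduction
```

For a k x k kernel the saving approaches a factor of k^2 as the channel count grows, which is why such layers map well onto memory-limited embedded accelerators.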