Abstract
Convolutional neural networks (CNNs) have been widely applied in
computer vision alongside the development of artificial intelligence.
Depthwise separable convolutional networks such as MobileNet and
ShuffleNet are well suited to deployment on resource-constrained
embedded devices because they have fewer parameters and higher
computational efficiency than earlier networks. In this paper, we focus on the
hardware implementation of ShuffleNetV2. We optimize the network
structure by modifying the feature channel numbers, pooling modes, and
channel shuffle scheme, improving accuracy by 1.09% while reducing the
parameter count by 0.18M. In addition, we implement a
highly parallel hardware accelerator on the Xilinx xczu9eg FPGA that
supports both standard convolution and depthwise convolution. The
accelerator consumes only 7.3 W while achieving an energy efficiency of
13.45 GOPS/W and a frame rate of 675.7 fps.
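
As background for the parameter savings the abstract attributes to depthwise separable convolutions, here is a minimal sketch comparing parameter counts of a standard convolution and its depthwise separable counterpart. The layer sizes are illustrative assumptions, not taken from the paper:

```python
# Parameter counts for a convolutional layer with c_in input channels,
# c_out output channels, and a k x k kernel (biases ignored).

def standard_conv_params(c_in, c_out, k):
    # One k x k x c_in filter per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise: one k x k filter per input channel.
    # Pointwise: a 1 x 1 convolution mapping c_in -> c_out channels.
    return k * k * c_in + c_in * c_out

# Illustrative sizes (hypothetical, not from the paper):
# a 3x3 convolution with 128 input and 128 output channels.
std = standard_conv_params(128, 128, 3)        # 147456 parameters
sep = depthwise_separable_params(128, 128, 3)  # 17536 parameters
print(std, sep, round(std / sep, 1))           # roughly an 8x reduction
```

For a k x k kernel the saving approaches a factor of k^2 as the channel count grows, which is why such layers map well onto memory-limited embedded accelerators.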