We also compared our work with prior work [2][4][5][6]. Our computational efficiency is 1.75 times higher than that of [4]. Compared with [5], our throughput and frame rate are 2.08 times and 11.5 times higher, respectively, even though our work runs at a lower clock frequency. These comparisons indicate that our work offers significant advantages in computational speed and hardware resource consumption.
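
The ratios quoted in this comparison are simple quotients of a metric for our work over the same metric for a reference. The sketch below illustrates that arithmetic and shows how a per-clock normalization would account for a difference in clock frequency. The function names `relative_gain` and `per_mhz` are hypothetical. All numeric inputs are placeholders chosen only to reproduce a 2.08 ratio; they are not the measured values from this work or from [5].

```python
import math


def relative_gain(ours: float, baseline: float) -> float:
    """Return how many times larger `ours` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return ours / baseline


def per_mhz(metric: float, clock_mhz: float) -> float:
    """Normalize a metric by clock frequency (metric per MHz)."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return metric / clock_mhz


# Placeholder values (assumptions, not measured data).
ours_throughput, ours_clock_mhz = 208.0, 100.0
ref_throughput, ref_clock_mhz = 100.0, 200.0

# Raw ratio, as a comparison of absolute throughput would report it.
raw_ratio = relative_gain(ours_throughput, ref_throughput)

# Ratio after normalizing each throughput by its own clock frequency.
normalized_ratio = relative_gain(
    per_mhz(ours_throughput, ours_clock_mhz),
    per_mhz(ref_throughput, ref_clock_mhz),
)

print(f"raw ratio: {raw_ratio:.2f}")
print(f"per-MHz ratio: {normalized_ratio:.2f}")
```

Under these placeholder inputs, the per-MHz ratio exceeds the raw ratio because the lower clock frequency belongs to the faster design, which is the situation the comparison with [5] describes.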