Conclusion: This paper applies online hard example mining (OHEM) to speech anti-spoofing systems and evaluates it on four different sets of models. The experimental results indicate that the presented system detects some unseen attacks well. Our best system achieved an EER of 0.77% with score-level fusion. In future work, we hope to apply the methods used in our experiments to more anti-spoofing models.
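To make the reported metric concrete, the following is a minimal sketch of how score-level fusion and the equal error rate (EER) can be computed from countermeasure scores. It is not the evaluation code used in our experiments: the function names `fuse_scores` and `compute_eer` are hypothetical, the equal fusion weights are an assumption, the scores are toy values, and the convention that a higher score means "more likely bona fide" is assumed.

```python
def fuse_scores(system_scores, weights=None):
    """Score-level fusion as a weighted average of per-system scores.

    system_scores: list of score lists, one list per system, all scored
    on the same trials in the same order. Equal weights are an assumption.
    """
    n_systems = len(system_scores)
    if weights is None:
        weights = [1.0 / n_systems] * n_systems
    n_trials = len(system_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, system_scores))
        for i in range(n_trials)
    ]


def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumes a higher score means the trial is more likely bona fide.
    Returns the average of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a bona fide trial scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed trial scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy example: two hypothetical systems scoring the same four trials.
# The first two trials are bona fide, the last two are spoofed.
fused = fuse_scores([[0.9, 0.6, 0.2, 0.5], [0.7, 0.8, 0.4, 0.1]])
eer = compute_eer(bonafide_scores=fused[:2], spoof_scores=fused[2:])
print(f"EER = {100 * eer:.2f} %")
```

In practice the fusion weights would be tuned on a development set, and the threshold sweep is usually replaced by interpolation on the detection error trade-off curve; the sketch only illustrates the quantities involved.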