Recurrent Neural Networks With Conformer for Speech Emotion Recognition
  • Chenjing Sun, South China Normal University
  • Jichen Yang, Guangdong Polytechnic Normal University
  • Xin Huang, South China Normal University
  • Xianhua Hou, South China Normal University

Corresponding Author: [email protected]

Abstract

Speech emotion recognition plays an important role in many applications, but the task is challenging due to factors such as background noise and variation in speaker characteristics. The well-known speech emotion recognition system ACRNN uses a CNN to extract local features of speech signals and an attention mechanism to focus on the parts with prominent emotion. However, it can neither capture long-term global information nor jointly attend to information from different representation subspaces at different positions, because only a single attention module is used. To address these drawbacks of ACRNN, this letter proposes CoRNN, which replaces the CNN and attention modules with a Conformer. Experimental results on the IEMOCAP dataset demonstrate that the proposed CoRNN achieves an unweighted average recall of 65.53%, a 0.79% improvement over ACRNN.
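The key mechanism the abstract contrasts with ACRNN's single attention module is multi-head self-attention, which sits at the core of the Conformer. The sketch below is a minimal illustrative NumPy implementation (the dimensions, variable names, and random stand-in weights are ours, not taken from the paper): each head projects the input into its own lower-dimensional subspace and attends independently, so several positions can be attended to jointly, which a single attention module cannot do.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """Self-attention with num_heads parallel heads (illustrative only).

    x: (seq_len, d_model) frame-level features.
    Returns the attended features and the per-head attention maps.
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    # Random projections stand in for learned parameters.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head):
        # each head sees a different d_head-dimensional subspace.
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)
    # Scaled dot-product attention, computed per head in parallel.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head), axis=-1)
    out = (attn @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo, attn

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 64))          # 50 frames of 64-dim acoustic features
y, attn = multi_head_attention(x, num_heads=4, rng=rng)
print(y.shape, attn.shape)                 # (50, 64) (4, 50, 50)
```

With `num_heads=1` this degenerates to the single attention module of ACRNN; with several heads, each attention map in `attn` can emphasize a different part of the utterance.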