Investigating the Role of Speaker Counter in Handling Overlapping Speeches in Speaker Diarization Systems
  • Thanh Thi-Hien Duong,
  • Phi-Le Nguyen,
  • Hong-Son Nguyen,
  • Ngoc Q. K. Duong
Thanh Thi-Hien Duong
Hanoi University of Mining and Geology

Corresponding Author: [email protected]

Phi-Le Nguyen
University of Science and Technology of Hanoi
Hong-Son Nguyen
Aimenext Joint Stock Company
Ngoc Q. K. Duong
Lacroix Impulse

Abstract

In real-life conversations, meetings, and debates, several people often speak at the same time, producing overlapping speech segments. Overlapping speech is an extremely challenging problem for speaker diarization. Widely used clustering-based diarization approaches perform poorly in these situations because of their limited capability to handle overlapping speech. This paper investigates a speaker diarization framework that integrates a new building block, called a speaker counter. The speaker counter predicts the number of active speakers in each analysis window; its output is then used in the conventional re-segmentation step of the diarization pipeline to better label the active speakers in each segment. We also investigate, through theoretical analysis, the effect of the analysis window size on diarization performance. We claim that the speaker counter ensures a lower diarization error rate when the analysis window is small enough. Experimental results obtained with two state-of-the-art diarization systems under different settings on two benchmark datasets, AMI Headset mix and DIHARD III, confirm the effectiveness of the proposed approach.
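As a rough illustration of how a per-window speaker count could drive the labeling step, the sketch below marks the top-k scoring speakers as active in each window, where k is the predicted count. This is a minimal sketch under stated assumptions, not the method from the paper: the function name `label_with_count`, the per-window score matrix, and the top-k rule are all hypothetical.

```python
import numpy as np

def label_with_count(scores, counts):
    """Mark the `counts[t]` highest-scoring speakers as active in window t.

    scores: (n_windows, n_speakers) per-window speaker scores (assumed input,
            e.g. similarities to cluster centroids; not taken from the paper).
    counts: (n_windows,) predicted number of active speakers per window.
    Returns a boolean (n_windows, n_speakers) activity matrix.
    """
    scores = np.asarray(scores, dtype=float)
    counts = np.asarray(counts, dtype=int)
    n_windows, n_speakers = scores.shape
    # Keep each count inside the valid range [0, n_speakers].
    counts = np.clip(counts, 0, n_speakers)
    # Rank speakers in each window from highest to lowest score.
    order = np.argsort(-scores, axis=1, kind="stable")
    active = np.zeros_like(scores, dtype=bool)
    for t in range(n_windows):
        active[t, order[t, :counts[t]]] = True
    return active

# Hypothetical scores for three windows and three speakers.
scores = [[0.7, 0.2, 0.1],
          [0.4, 0.5, 0.1],
          [0.3, 0.3, 0.4]]
counts = [1, 2, 0]  # one speaker, two overlapping speakers, silence
print(label_with_count(scores, counts))
```

With a count of 2 the second window is assigned two simultaneous speakers, which a one-label-per-segment clustering output cannot express; a count of 0 leaves the window unlabeled.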