
A Novel Dataset for Arabic Speech Recognition Recorded by Tamazight Speakers
  • Nourredine OUKAS
  • Tiziri Chabi
  • Tilelli Sari
Nourredine OUKAS
Department of Computer Science, LIM Laboratory, University of Bouira

Corresponding Author: [email protected]

Tiziri Chabi
Department of Computer Science, LIM Laboratory, University of Bouira
Tilelli Sari
Department of Computer Science, LIM Laboratory, University of Bouira

Abstract

Automatic Speech Recognition (ASR) is a rapidly evolving research area, driven by advances in machine learning and deep learning. Its applications span healthcare, public services, and human-machine interfaces. There is a pressing need for high-quality Arabic datasets to improve speech recognition on devices that operate in Arabic. In this paper, we introduce a carefully curated dataset designed specifically for recognizing Arabic speech produced by Tamazight speakers, which broadens the pool of linguistic resources available for research and practical applications. A crucial aspect of building this dataset is the rigorous quality control applied to the data, which in turn improves the accuracy and effectiveness of Arabic speech recognition models. This dataset enables the development and evaluation of Arabic ASR systems tailored to the needs of Tamazight speakers. It thereby addresses a critical gap in Arabic speech recognition by focusing on a linguistic group that has been underrepresented in this technology.