A Novel Dataset for Arabic Speech Recognition Recorded by Tamazight Speakers
Abstract
Automatic Speech Recognition (ASR) is a rapidly evolving area of research, driven by advances in machine learning and deep learning techniques. Its applications are wide-ranging and include healthcare, public services, and human-machine interfaces. There is a pressing need for high-quality Arabic datasets to improve speech recognition on devices that use the Arabic language. In this paper, we introduce a new, carefully constructed dataset designed specifically for recognizing Arabic speech spoken by Tamazight speakers. This effort broadens the pool of linguistic resources available for research and practical use. A crucial aspect of developing this dataset is the rigorous quality control applied to the data, which in turn improves the accuracy and effectiveness of Arabic speech recognition models. This dataset enables the creation and evaluation of Arabic ASR systems tailored to the needs of Tamazight speakers. It thereby addresses a critical gap in Arabic speech recognition by focusing on a linguistic group that has been underrepresented in this technology.