TatarTTS: An Open-Source Text-to-Speech Synthesis Dataset for the Tatar Language
  • Daniil Orel,
  • Askat Kuzdeuov,
  • Rinat Gilmullin,
  • Bulat Khakimov,
  • Huseyin Atakan Varol
Daniil Orel
Institute of Smart Systems and AI, Nazarbayev University, Astana

Corresponding Author: [email protected]

Askat Kuzdeuov
Institute of Smart Systems and AI, Nazarbayev University, Astana
Rinat Gilmullin
Institute of Applied Semiotics, Tatarstan Academy of Sciences, Kazan
Bulat Khakimov
Institute of Applied Semiotics, Tatarstan Academy of Sciences; Kazan Federal University
Huseyin Atakan Varol
Institute of Smart Systems and AI, Nazarbayev University, Astana

Abstract

This paper introduces an open-source dataset for speech synthesis in the Tatar language. The dataset comprises approximately 70 hours of transcribed audio recordings, featuring two professional speakers (one male and one female). Notably, it is the first large-scale dataset of its kind that is publicly available, aimed at promoting Tatar text-to-speech (TTS) applications in both academic and industrial contexts. The paper describes the procedures for developing the dataset, discusses the challenges faced, and outlines important future directions. To demonstrate the reliability of the dataset, baseline end-to-end TTS models were built and evaluated using the subjective mean opinion score (MOS) measure. The dataset, training recipe, and pre-trained TTS models are publicly available.
30 Jan 2024: Submitted to TechRxiv
06 Feb 2024: Published in TechRxiv