TatarTTS: An Open-Source Text-to-Speech Synthesis Dataset for the Tatar Language

Daniil Orel; Askat Kuzdeuov; Rinat Gilmullin; Bulat Khakimov; Huseyin Atakan Varol

doi:10.36227/techrxiv.170723255.52161895/v1

Abstract

This paper introduces an open-source dataset for speech synthesis in the Tatar language. The dataset comprises approximately 70 hours of transcribed audio recordings from two professional speakers (one male and one female). It is the first publicly available large-scale dataset of its kind and is intended to promote Tatar text-to-speech (TTS) applications in both academic and industrial contexts. The paper describes the procedures used to develop the dataset, discusses the challenges faced, and outlines important future directions. To demonstrate the reliability of the dataset, baseline end-to-end TTS models were built and evaluated using the subjective mean opinion score (MOS) measure. The dataset, training recipe, and pre-trained TTS models are publicly available.

30 Jan 2024: Submitted to TechRxiv

06 Feb 2024: Published in TechRxiv
