Abstract
Voice likability is a critical factor in machine-human interaction.
However, studies on speech likability typically do not apply
harmony theory from music, which suggests general rules for pleasant
sounds. In this paper, I propose a new method that estimates the
likability of vocal signals using the harmonic relation of pitch and the
first formant (F1). I extract the pitch and F1 from the vowel signal and
compute the average interval, in cents, between the musical-scale notes
derived from each pitch and F1. A small cent value indicates a consonant
relation between pitch and F1. I compared the calculated cent values with
mean opinion score (MOS) test results for ten speech samples. The
results showed a clear
correlation between the subjective MOS scores and the consonance of
pitch and F1 in vowels.
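
The abstract does not give the exact formula for the cent value, so the following Python sketch is only one possible reading of it, not the method used in the paper. It assumes that the interval between pitch and F1 is measured in cents (1200 cents per octave), folded into a single octave, and compared against a hypothetical set of equal-tempered consonant intervals. The names cents_between, consonance_deviation, mean_deviation, and CONSONANT_CENTS are illustrative, and the frequencies used in the usage lines are made up.

```python
import math

# Assumption: equal-tempered consonant intervals within one octave, in cents
# (unison, minor third, major third, fourth, fifth, minor sixth, major sixth,
# octave). The paper may use a different set.
CONSONANT_CENTS = (0, 300, 400, 500, 700, 800, 900, 1200)


def cents_between(f_low_hz, f_high_hz):
    """Interval from f_low_hz up to f_high_hz in cents (1200 per octave)."""
    return 1200.0 * math.log2(f_high_hz / f_low_hz)


def consonance_deviation(pitch_hz, f1_hz):
    """Distance in cents from the octave-folded pitch-to-F1 interval
    to the nearest interval in CONSONANT_CENTS."""
    interval = cents_between(pitch_hz, f1_hz) % 1200.0
    return min(abs(interval - c) for c in CONSONANT_CENTS)


def mean_deviation(pairs):
    """Average deviation over (pitch_hz, f1_hz) pairs, e.g. one per vowel."""
    values = [consonance_deviation(p, f) for p, f in pairs]
    return sum(values) / len(values)


# Hypothetical (pitch, F1) pairs in Hz; not data from the paper.
example_pairs = [(220.0, 440.0), (200.0, 600.0)]
print(mean_deviation(example_pairs))
```

Under these assumptions, a lower mean deviation would correspond to a more consonant pitch-to-F1 relation, which is the quantity that would be compared with the MOS scores.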