Towards a New Measure

Recall that we require a measurement that excludes effects due to phoneme frequency and phonotactics, ignores a priori categorical imbalances, and allows features to be compared statistically, both within a given language and cross-linguistically. With these requirements in mind, we developed a way of scoring each phoneme within a given feature. The result is a distribution of scores for each feature. Our new scoring method is given in equation \ref{eq:r}.

\begin{equation}\label{eq:r} r_{i,j} = \log\left(\frac{o_{i,j}}{p_{i,j}}\right)\end{equation}

In this equation, \(o_{i,j}\) is the number of observed minimal pairs involving a phoneme \(i\) for a feature \(j\), and \(p_{i,j}\) is the number of possible minimal pairs involving that phoneme for that feature. Put more simply, we take a phoneme \(i\), for example /p/, and count how many minimal pairs are observed for a feature \(j\), for example voicing. In this example, \(o_{i,j}\) is the number of minimal pairs in the lexicon that contrast /p/ with /b/ (e.g., /pul/\textasciitilde{}/bul/, poule\textasciitilde{}boule, in French). This count is then divided by the number of possible minimal pairs, \(p_{i,j}\).

The variable \(p\) is calculated by examining every instance of a phoneme (in our example, /p/) and determining, according to the language’s phonotactics, how many minimal pairs are possible for a given feature (in our example, voicing). In the case of the French word /pul/, the phonotactics tell us that changing the segment /p/ to /b/ yields a legal word, so \(p\) is incremented by one. In a word such as /psikoloʒi/, however, changing /p/ to /b/ yields */bsikoloʒi/, which is “illegal” in French because the language’s phonotactics require adjacent obstruents to agree in voicing \citep{Dell1995}. In such a case \(p\) is not incremented, because such a minimal pair could never be observed in the lexicon. This allows us to factor out effects due to phonotactic constraints: the absence of */bsikoloʒi/ from the French lexicon is not informative about the functional load of the voicing feature in that language.
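The counting procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the study: the lexicon is a toy list of words stored as tuples of segments, the table \texttt{VOICING\_PARTNER} and the function names are hypothetical, the only phonotactic rule modelled is voicing agreement between adjacent obstruents, and the natural logarithm is assumed.

```python
import math

# Hypothetical voicing partners (illustrative subset, not a full French inventory).
VOICING_PARTNER = {"p": "b", "b": "p", "t": "d", "d": "t", "k": "g", "g": "k",
                   "f": "v", "v": "f", "s": "z", "z": "s"}
VOICED = {"b", "d", "g", "v", "z"}
OBSTRUENTS = set(VOICING_PARTNER)


def is_legal(word):
    """Toy phonotactic check: adjacent obstruents must agree in voicing."""
    for left, right in zip(word, word[1:]):
        if left in OBSTRUENTS and right in OBSTRUENTS:
            if (left in VOICED) != (right in VOICED):
                return False
    return True


def count_pairs(lexicon, phoneme):
    """Return (observed, possible) voicing minimal-pair counts for `phoneme`."""
    words = set(lexicon)
    partner = VOICING_PARTNER[phoneme]
    observed = possible = 0
    for word in lexicon:
        for pos, segment in enumerate(word):
            if segment != phoneme:
                continue
            candidate = word[:pos] + (partner,) + word[pos + 1:]
            if is_legal(candidate):
                possible += 1          # the swap yields a phonotactically legal form
                if candidate in words:
                    observed += 1      # ...and that form is attested in the lexicon
    return observed, possible


def r_score(observed, possible):
    """Log ratio of observed to possible minimal pairs (natural log assumed)."""
    return math.log(observed / possible)


# Toy lexicon (assumption): poule, boule, psychologie, pomme.
LEXICON = [
    ("p", "u", "l"),
    ("b", "u", "l"),
    ("p", "s", "i", "k", "o", "l", "o", "ʒ", "i"),
    ("p", "ɔ", "m"),
]

observed, possible = count_pairs(LEXICON, "p")
print(observed, possible, r_score(observed, possible))
```

In this toy lexicon, /pul/ contributes to both counts, /pɔm/ contributes only to the possible count, and /psikoloʒi/ contributes to neither, because the candidate */bsikoloʒi/ fails the phonotactic check. Note that the score is undefined when no minimal pair is observed; the sketch does not handle that case.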

After having examined the entire lexicon, we log-transform the resulting ratio to normalize the score distributions. Because there can never be more observed minimal pairs than possible ones, the r-score is never positive, and a score closer to 0 represents a higher functional load. Distributions of r-scores for consonantal phonological features in French are shown in Figure \ref{fig:rscores}.
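
As a worked illustration of how the score behaves, consider purely hypothetical counts (they are not taken from our data) and assume the natural logarithm. A phoneme with 30 observed minimal pairs out of 120 possible ones for a given feature receives
\[ r_{i,j} = \log\left(\frac{30}{120}\right) = \log(0.25) \approx -1.39, \]
whereas a phoneme with 90 observed minimal pairs out of the same 120 possible ones receives
\[ r_{i,j} = \log\left(\frac{90}{120}\right) = \log(0.75) \approx -0.29. \]
The second score is closer to 0, reflecting the larger share of possible contrasts that the lexicon actually exploits.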