Minimal Pair Counts

The most basic proposed measure of functional load is the number of minimal pairs distinguished by a phonemic contrast, a method still in use today \citep[e.g.,][]{Wedel2013}. We began our research with this method in order to get a general idea of the distribution of contrasts in the lexicon before venturing into more complex calculations. A complication arises, however, in that functional load is traditionally discussed with reference to phonemes rather than features. We therefore needed to refine the definition of a minimal pair in order to perform our calculations.

We define a phonemic minimal pair as a pair of words in a given language that differ in only one phonological segment. We further define a featural minimal pair as a phonemic minimal pair in which the two contrasting segments differ in only one feature. The words /pul/, poule (chicken), and /sul/, saoule (drunk), form a phonemic minimal pair in that they are distinguished solely by their initial segment, but they do not form a featural minimal pair because these segments differ in two features (i.e., manner and place). The words /pul/, poule (chicken), and /bul/, boule (ball), however, do form a featural minimal pair, as the segments that distinguish them differ only in voicing.
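The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each character of a transcription is treated as one segment, and the `FEATURES` table, its three feature names, and the helper names are hypothetical, not the feature system or code used in the study.

```python
# Toy feature table (hypothetical values, for illustration only).
FEATURES = {
    "p": {"voicing": "voiceless", "place": "labial", "manner": "stop"},
    "b": {"voicing": "voiced", "place": "labial", "manner": "stop"},
    "s": {"voicing": "voiceless", "place": "coronal", "manner": "fricative"},
}


def contrast_position(w1, w2):
    """Return the index of the single differing segment, or None."""
    if len(w1) != len(w2):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(w1, w2)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def contrasting_features(seg1, seg2):
    """Return the sorted names of the features on which two segments differ."""
    f1, f2 = FEATURES[seg1], FEATURES[seg2]
    return sorted(name for name in f1 if f1[name] != f2[name])


def classify(w1, w2):
    """Label a word pair as 'featural', 'phonemic' (only), or 'none'."""
    pos = contrast_position(w1, w2)
    if pos is None:
        return "none", []
    feats = contrasting_features(w1[pos], w2[pos])
    kind = "featural" if len(feats) == 1 else "phonemic"
    return kind, feats


print(classify("pul", "bul"))  # featural pair
print(classify("pul", "sul"))  # phonemic pair only
```

Every featural minimal pair is also a phonemic minimal pair; the label "phonemic" here marks pairs that meet only the weaker definition.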

It is of course important to establish what is to be considered a word in order to perform such a calculation. For the purposes of this study, we considered all lemmata to be “words”. This choice was made on the assumption that alternate forms of words, including feminine and plural forms, are not stored separately in the mental lexicon, and that phonological features would therefore not be used to contrast them in the same way as for the base forms. All calculations were performed using the Lexique database of the French lexicon \citep{New2001}, which contains 47,341 lemmata, of which 28,885 are nouns. The database provides phonological transcriptions based on canonical pronunciation.

We thus began by counting the minimal pairs observed for each phonological feature. We kept one overall count per feature, such that each time a minimal pair was found for feature \(x\), the count for \(x\) was incremented. A pair like /pul/~/bul/ would be considered a voicing pair, and the voicing count would therefore be increased by one. This process was performed for each unique, unordered pair of words; that is, /pul/~/bul/ was considered equivalent to /bul/~/pul/ and counted only once. Again, only featural minimal pairs as previously defined were counted. Given that phonological structure may be partly dependent on syntactic category \citep[cf.][]{REF}, we started by looking at nouns only and then extended our calculations to the lexicon as a whole. We wanted a concrete idea of what asymmetries in featural functional load, if any, were present in the noun category, as the data from the experimental component of the present study are based on nouns. We were also unsure whether the position of the critical difference might play a role in the exploitation of phonological features \citep[cf.][]{Connine1993,Marslen-Wilson1989}.
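The counting procedure described above might be sketched as follows. The sketch assumes one character per segment, a hypothetical `FEATURES` table, and a toy word list that is not drawn from Lexique; the function names are illustrative. `itertools.combinations` yields each unordered pair once, which matches the treatment of /pul/~/bul/ and /bul/~/pul/ as a single pair.

```python
from collections import Counter
from itertools import combinations

# Toy feature table (hypothetical values, for illustration only).
FEATURES = {
    "p": {"voicing": "voiceless", "place": "labial", "manner": "stop"},
    "b": {"voicing": "voiced", "place": "labial", "manner": "stop"},
    "t": {"voicing": "voiceless", "place": "coronal", "manner": "stop"},
    "s": {"voicing": "voiceless", "place": "coronal", "manner": "fricative"},
    "z": {"voicing": "voiced", "place": "coronal", "manner": "fricative"},
}


def featural_contrast(w1, w2):
    """Return (position, feature) if w1/w2 form a featural minimal pair, else None."""
    if len(w1) != len(w2):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(w1, w2)) if a != b]
    if len(diffs) != 1:
        return None
    pos = diffs[0]
    f1, f2 = FEATURES.get(w1[pos]), FEATURES.get(w2[pos])
    if f1 is None or f2 is None:  # segment not covered by the toy table
        return None
    feats = [name for name in f1 if f1[name] != f2[name]]
    return (pos, feats[0]) if len(feats) == 1 else None


def count_featural_pairs(words):
    """Tally featural minimal pairs per feature over unique unordered word pairs."""
    counts = Counter()
    for w1, w2 in combinations(sorted(set(words)), 2):
        hit = featural_contrast(w1, w2)
        if hit is not None:
            counts[hit[1]] += 1
    return counts


# Toy transcriptions; the duplicate "bul" is removed before pairing.
toy_lexicon = ["pul", "bul", "sul", "zul", "tul", "bul"]
print(count_featural_pairs(toy_lexicon))
```

Comparing all pairs is quadratic in the number of words, so on a lexicon of tens of thousands of lemmata one would probably first group words by length or index them by a wildcarded form; that optimization is omitted here for clarity.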

We therefore broke our calculation down into four sets: the whole lexicon, nouns only, nominal minimal pairs distinguished on the first segment, and nominal minimal pairs distinguished on any segment but the first. The results of these counts for each feature¹ can be seen in Figure \ref{fig:mpcounts}. The overall pattern in the whole lexicon did not differ descriptively from that in nouns, but it changed slightly when the nouns were broken down by position. It should also be noted that, for all features, we found more minimal pairs distinguished by their initial segment than by any other segment, SOMETHING ABOUT IMPORTANCE OF INITIAL SEGMENTS.
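The split between initial and non-initial contrasts could be tallied along the following lines. As before, this is a sketch under assumptions: one character per segment, a hypothetical two-segment `FEATURES` table, and a toy word list rather than the nouns of the actual database.

```python
from collections import Counter
from itertools import combinations

# Toy feature table (hypothetical values, for illustration only).
FEATURES = {
    "p": {"voicing": "voiceless", "place": "labial", "manner": "stop"},
    "b": {"voicing": "voiced", "place": "labial", "manner": "stop"},
}


def featural_contrast(w1, w2):
    """Return (position, feature) if w1/w2 form a featural minimal pair, else None."""
    if len(w1) != len(w2):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(w1, w2)) if a != b]
    if len(diffs) != 1:
        return None
    pos = diffs[0]
    f1, f2 = FEATURES.get(w1[pos]), FEATURES.get(w2[pos])
    if f1 is None or f2 is None:  # segment not covered by the toy table
        return None
    feats = [name for name in f1 if f1[name] != f2[name]]
    return (pos, feats[0]) if len(feats) == 1 else None


def counts_by_position(words):
    """Tally featural minimal pairs separately for initial and non-initial contrasts."""
    counts = {"initial": Counter(), "non-initial": Counter()}
    for w1, w2 in combinations(sorted(set(words)), 2):
        hit = featural_contrast(w1, w2)
        if hit is None:
            continue
        pos, feature = hit
        key = "initial" if pos == 0 else "non-initial"
        counts[key][feature] += 1
    return counts


# Toy transcriptions, not a sample of the actual lexicon.
print(counts_by_position(["pul", "bul", "lup", "lub"]))
```

Restricting the input list to nouns, or passing the whole lexicon, would give the other two breakdowns with the same function.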

The counts for the place feature were corrected by dividing the total number of observations by two. This was done because the place feature can take any one of three values, giving it one more degree of freedom than the other two features. The correction allowed us to compare the three features with each other more easily.
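For concreteness, the correction amounts to the following small step. The function name, the feature labels other than place, and the raw counts are hypothetical and serve only to show the arithmetic.

```python
def correct_place_count(counts, factor=2):
    """Return a copy of the per-feature counts with the place count divided by `factor`."""
    corrected = dict(counts)
    if "place" in corrected:
        corrected["place"] = corrected["place"] / factor
    return corrected


# Hypothetical raw counts, not results from the study.
raw_counts = {"voicing": 10, "place": 8, "manner": 6}
print(correct_place_count(raw_counts))
```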