Modelling the Missing Letter Effect



This report outlines attempts to develop and evaluate a formal model of the "missing letter effect" (MLE), a phenomenon whereby literate human participants achieve differential performance in a letter detection task across word-type classes when performing the task while simultaneously reading for comprehension. Beginning with the observation that extant models fail to predict phenomena involving both the rate of detection failure and the mean detection response time, the report describes a new model that can predict these phenomena. The report continues by describing efforts to assess the new model's ability to predict additional phenomena related to the missing letter effect, including discussion of important statistical issues for fair evaluation.


The ultimate aim of the scientific endeavour is to observe the phenomena of our experience, derive hypotheses ("models") regarding the causal history of these observations, and finally evaluate the degree to which these models can predict new observations (where "new" need not imply any temporal order among the observations, merely that the modeller did not have these observations in mind when deriving the model). A seeming trend in the history of science is that, as a given sub-field matures, its models move from simple verbal expressions of relations to more formal, mathematical or algorithmic expressions of process. Formal models permit not only unambiguous interpretation of the modeller's intent, but also increased potential for evaluating possibly unforeseen consequences of the model itself as its complexity begins to outstrip the capacity of the human mind to encompass it fully and simultaneously. To this end, formal models must typically be evaluated not only for their ability to account adequately for the phenomena that inspired their creation, but also for their ability to account for corollary phenomena that they might additionally predict. Successes and failures in predicting such corollary phenomena reveal the strengths and weaknesses of the underlying model, providing points of arbitration between competing models and often suggesting modifications or even alternative models that might better account for the observed phenomena. Iterating this process, by assessing the ability of the modified or alternative model to account for yet further phenomena, yields increasing predictive power, the hallmark of scientific advance.

This report will outline the history of one research group's endeavours to engage this process of model creation and evaluation for a set of phenomena related to human reading skill. The story begins with the observation of a phenomenon dubbed the "missing letter effect" (MLE), which occurs when literate human participants are asked to read a passage of text in preparation for later comprehension testing, while simultaneously indicating the presence of a specific "target" letter amongst the words in the passage. The critical observation in this context is that the probability that the target letter will be detected differs systematically depending on the characteristics of the word within which it is embedded. On the surface, the MLE might seem a queer phenomenon, one that occurs in a relatively contrived context and therefore possibly warrants lower status in the hierarchy of priorities for scientific exploration. However, as is often the case, the seeming obscurity of the MLE belies its true utility insofar as it provides a window into the processes requisite for expert reading, a skill that is highly valued for its economic and intellectual benefits to both the reader and society. Specifically, it is known that less-skilled readers fail to manifest the MLE to the same magnitude as more highly skilled readers. The MLE therefore offers not only a quick and unobtrusive means of measuring reading skill, but also potential insight into the processing failures (and remediation thereof) that contribute to poor reading skills.

word frequency & role
brain as prediction engine, seeking to use all available information (foveated letters, word shape, previous context, parafoveal word-shapes before & after) in parallel

Analysis of word characteristics associated with variability in detection probability reveals a strong difference between so-called "content" words and "function" words. Content words are operationalized as those that provide information (in both the informal and the formal sense) critical to comprehension, whereas function words act more as linguistic placeholders that, while not information-free, are relatively easily predicted from the context of the words preceding them and provide relatively little information for predicting subsequent words. This discrepancy in information content is the focus of the models of the MLE to be discussed.
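To make the notion of predictability from preceding context concrete, the following Python sketch estimates how surprising each word is given the word before it. It is an illustration only, not one of the models discussed in this report. It assumes that the "formal sense" of information can be read as Shannon surprisal, that a smoothed bigram model is an acceptable stand-in for "context", and that a hand-made function-word list is available. The function names, the toy text and the word list are all hypothetical.

```python
import math
from collections import Counter


def bigram_surprisal(tokens, alpha=1.0):
    """Return (word, surprisal in bits) for every token after the first.

    Surprisal is -log2 P(word | previous word), with P estimated from the
    same token sequence using add-alpha smoothing (an assumed, simple choice).
    """
    vocab_size = len(set(tokens))
    context_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens[:-1], tokens[1:]))
    result = []
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p = (bigram_counts[(prev, word)] + alpha) / (
            context_counts[prev] + alpha * vocab_size
        )
        result.append((word, -math.log2(p)))
    return result


def mean_surprisal_by_class(tokens, function_words):
    """Average surprisal separately for function words and content words."""
    sums = {"function": 0.0, "content": 0.0}
    counts = {"function": 0, "content": 0}
    for word, bits in bigram_surprisal(tokens):
        word_class = "function" if word in function_words else "content"
        sums[word_class] += bits
        counts[word_class] += 1
    return {c: sums[c] / counts[c] for c in sums if counts[c] > 0}


# Toy text and function-word list, invented purely for illustration.
tokens = (
    "the dog chased the cat and the cat chased the bird "
    "and the bird saw the dog"
).split()
means = mean_surprisal_by_class(tokens, function_words={"the", "and"})
print(means)
```

On this constructed text the mean surprisal of the function words is lower than that of the content words, which is the pattern the paragraph above describes. A text this small demonstrates nothing empirically; any real assessment would need a large corpus and a principled classification of words.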