Proposal and structure of the study

[This section has been cut and pasted into the document 'Writing evolves towards optimal simplicity' and renamed 'Methodology'] 

Analysis of historical data set

[This section has been cut and pasted into the document 'Writing evolves towards optimal simplicity']

Predictions

Prediction 1: Visual complexity will decrease over successive generations.

Two distinct analyses will be conducted, one for perimetric complexity and one for algorithmic complexity. Both analyses will use a nested regression design in which individual characters are the basic units of analysis, nested inside the version of the script that they belong to. For instance, the character GA as instantiated in the 1868 version of the script will be one data point. The dependent variable will be, depending on the analysis, algorithmic or perimetric complexity. Each version of the script will be given a rank number corresponding to its chronological position (e.g. 1834 = 1, 1845 = 2). We expect a negative effect of rank upon complexity in both analyses (i.e. for both algorithmic and perimetric complexity).

Prediction 2: The complexity of graphemes with higher visual complexity scores will decrease more than that of graphemes that start off with lower scores.

Two distinct analyses will be conducted, one for perimetric complexity and one for algorithmic complexity. Both analyses will use a nested regression design, with individual characters as the basic unit of analysis, but this time the grouping variable will be the character type as opposed to the script. For instance, the 1868 version of GA will be grouped with all the other versions of GA (1849 GA, 1860 GA, etc.). Here again the dependent variable will be, depending on the analysis, algorithmic or perimetric complexity. The predictor will be the interaction of the year-rank associated with the version of the script (e.g. 2 for the 1845 version of GA) and the complexity (on the relevant measure) of the 1834 version of the relevant character. We predict that higher 1834 complexity will reinforce the effect of year-rank upon character complexity. This analysis will be carried out only if Prediction 1 is confirmed.

Prediction 3: Variance in complexity among characters should decrease with successive versions of the script.

Here again this prediction will be tested for both measures of complexity. We expect a negative correlation between year-rank and the variance exhibited by complexity in individual versions of the script. 
3. Zipf's law of abbreviation.
This prediction only applies to Study II. Some authors have suggested that Zipf's law of abbreviation might apply to graphic codes in addition to spoken languages (Ferrer i Cancho et al. 2013). Zipf's law of abbreviation is the observation that more frequent words tend to be shorter than less frequent ones (Zipf 1949, Bentz xxxx). Complexity of written signs is a plausible equivalent of word length, since both longer words and more complex letters contain more information and require more processing effort (Pelli et al.). Rovenchak et al. (Issues in Quant. Ling. paper + 2008 paper) already attempted to find evidence for Zipf's law of abbreviation in the Vai script (whose length makes it quite suitable for such an analysis). They did not succeed, but they measured letter complexity in an idiosyncratic way, based on manual coding, which we think the present study could improve upon.
We will assess the correlation between character complexity and character frequency, using the frequency measures provided by Rovenchak et al. Given these previous negative findings, we make no particular prediction regarding the existence or direction of this correlation.]

Why we might be wrong (and why being wrong is also interesting)

Vai is already optimally compressed

If no compression effects are found, this may well be because the Vai writing system was already optimally compressed at the time of its creation. Conventionalised but non-linguistic graphic systems were in use by the Vai people prior to their invention of writing (see Kelly [micenei]) and it is argued by literate Vai that elements of these systems have informed the design of the script (Massaquoi 1911). If this was the case then the transition from a semasiographic system to a phonographic one was in itself a substantial "compression event" in which previously multi-valent signs were affixed to single syllables. Such a radical narrowing of interpretation, brought about through a conscious process of group deliberation, would already account for most of the compression of the system: subsequent transmissions would amount to only minor tweaking. The experimental work of Raviv et al (REF) has shown that laboratory-generated artificial languages can exhibit compositional structure without generational turnover. Thus the dynamics of communication within a large group can encourage the emergence of structure, even without transmission events. A recent investigation has reported a distinct yet analogous finding with regard to a cardinality bias in the world's writing systems; that is, the fact that writing systems display a preponderance of horizontal and vertical lines. Morin demonstrated that scripts do not tend to become more cardinal over multiple transmissions in historical time, but that cardinality is "baked in" from the beginning (Morin under review). Thus, we may discover that the near-optimal compression of Vai and other non-laboratory generated graphic systems was an outcome of upstream cognitive biases and/or group dynamics.
Alternatively, the script may not have been optimally compressed at the time of its creation, with optimality instead being achieved in the short period between its creation in ca. 1833 and the date of the earliest surviving evidence of the script in 1834.
This is not to invalidate semiotic transmission experiments generally; however, a negative finding may suggest that not all cultural items will change in the process of transmission. This would speak to the trade-off between the set and the individual items within the set: as the set itself becomes compressed it potentially relieves the compression pressure on individual items.

External pressures

An alternative explanation is that the system is not optimally compressed but that outside sources of inertia are acting on it and preventing compression effects from arising. When writing systems are taught in institutional settings, for example, there is an incentive for standardisation and inertia. After all, it is more economical to teach a classroom of students a single system with the use of common reference materials than multiple variations. The Cherokee script was codified and standardised in print very shortly after its invention and it can be observed that Cherokee graphemes have changed very little over their history.
As far as we know from the historical record, a school existed for teaching Vai as early as ca. 1835, but it was destroyed in war after a mere 18 months and was never rebuilt (Koelle 1854, Migeod 1909). The next observational evidence for schools is from the 1860s (Creswick 1868), but by 1899 Maurice Delafosse reported that there were no longer any schools and that the script was taught from father to children or through voluntary apprenticeship to a competent scribe. In about the same period, the Vai script was introduced into one Christian school at Robertsport "for the first time" (Massaquoi 1899, 579). As such, it would appear that institutions were never crucial in the transmission of Vai and are therefore unlikely to have exerted much conservative pressure; however, the full extent of institutional mechanisms may simply have escaped notice in the historical record. As Scribner and Cole put it: "We do not know if the script is ever used in bush schools or secret society activities, since we do not have access to information about how these institutions conduct their affairs" (Scribner and Cole 1981). Thus a negative finding in our study may point to historical and ethnographic factors that cannot be recovered.

Pragmatic constraints

A major limitation of graphic communication across different time frames is that misunderstandings can never be repaired on the spot (REF TopICs paper). Thus, it could be the case that the necessity of maintaining a highly consistent and conventional shared code exerts too much inertia on the system, so that no significant modification is permitted. We know from other historical contexts that apparently non-optimal writing systems still manage to be transmitted with high fidelity. Examples include writing systems that have enormous redundancy and graphic complexity (REF Kelly 2016) and those that are borrowed from elsewhere and whose typologies are ill-suited to the phonology of the new language (Frost 2012, inter alia).

Literacy 

All experimental participants will already be literate in the Roman script and perhaps others. This means that the experiments and the historical data are not directly comparable on the dimension of literacy. Consequently, changes to the Vai input across experimental generations may be conditioned by prior literacy in another linear script, and items may become more Romanised over time. In itself, this does not undermine the premise, since our study is interested in detecting compression regardless of how that compression is actualised. Nonetheless, the prior literacy of experimental participants represents an important contrast with the historical reality.

Concluding remarks and impressions

If Vai is shown to evolve towards a compressible system we may speculate as to how informative this may be for understanding the early evolution of writing. Early Vai certainly shares characteristics of the four origination events mentioned earlier: widespread logography with iconicity, variable orientation, scriptio continua etc.
Graphemes will become more similar to one another. In other words, these items will come to be generated from the same set of rules, resulting in typographic stereotypes (as, for example, ‘stems’ in the Roman script ‘bowls’ and ‘lunettes’ in the Arabic script etc). Within character shapes, any repetitious forms will become abbreviated. In other words, standard repetitions will come to be inferred.
Speed of change

Acknowledgments

The following institutions and individuals helped us secure rare manuscripts for our dataset: Asien-Afrika-Institut (University of Hamburg), Hella Bruns (Max Planck Institute for the Science of Human History, Jena), Valerie Haeder (Library of Congress, Washington). We consulted the Indiana University Liberian Collections, but no dateable and digitised Vai manuscripts were available. The primary Vai data was tabulated by Olena Tykhostup (Friedrich Schiller University, Jena). Lisa Jeschke compiled the German words and sentences used in the Bilerian experiment and Thomas Müller contributed to an earlier draft of it. Michelle O'Reilly was consulted for graphic design and Julia Bespamyatnykh traced the Vai graphemes as vectors. Those who helped test earlier versions of the task were Julia Belger, Julia Bespamyatnykh, Marcel Keller and Ron Hübler. Thanks to Volker Gast and Sally Dixon at Friedrich Schiller University for their practical assistance.