Alright then.
The paper I'll be discussing in this presentation is almost certainly one you've already seen: Bouckaert et al.'s Mapping the Origins and Expansion of the Indo-European Language Family. It was published in August 2012 in the journal Science. If for some reason you haven't already read it, the DOI is at the bottom of each slide so you can Sci-Hub it, or there's a paper copy that I'll leave for you to use.
Right, so let's start with a quick overview of the Indo-European language family.
[ADVANCE TO SLIDE 2]
A lot of linguists would murder me for using a clear-cut phylogeny for this slide, so don't take it literally. The slide is just here to show the vast number of languages, both extinct and extant, that could be descended from PIE (Proto-Indo-European), which sits at the top of the tree as the LCA, the last common ancestor, of all of these languages, stretching from English to Hittite. It truly is breathtaking. So, naturally, there's a lot of interest in where this LCA was spoken, and how it came to spread and evolve across the continent.
[ADVANCE TO SLIDE 3]
And so that takes us to this slide. The commonly held view among linguists is that Proto-Indo-European emerged on the Pontic Steppes.
[CLICK]
As you can see, it's a very large area, roughly centred on Ukraine but stretching as far as Kazakhstan in the east and Romania in the west. The hypothesis holds that the language was first spoken roughly 5,000 to 6,000 years ago and spread from the steppes kind of like this (obviously over a long period of time). It spread rapidly due to a number of factors, but mainly the domestication of the horse, which likely happened around this time (approx. 3500 BC).
[ADVANCE TO SLIDE 4]
But there's an alternative hypothesis, as there often is with these types of questions. It was put forward about a generation ago (1987) by an archaeologist called Colin Renfrew, who hypothesised that PIE was instead originally spoken in Anatolia.
[CLICK]
Which, as you can see, covers most of modern-day Turkey. On this view the language spread out more slowly from Anatolia, along with the Neolithic Revolution (the cultural shift from a hunter-gatherer lifestyle to an agricultural one). [CLICK]
So which hypothesis is right?
[CLICK]
Well, that's what this paper set out to prove, or at least provide evidence for.
[CLICK]
Good question, what did they do? Well, they did all these things, but we'll walk through them one by one. Or relaxed walk through them. That joke will become funnier later.
Well, firstly they compiled a database of cognates (words in different languages that descend from the same ancestral word) to work with. It covers 103 Indo-European languages (including 20 ancient languages) and 207 meanings, making up a total of nearly 6,000 words.
The meanings include familial terms such as mother or sister, body parts, words for describing the natural world (wind, fire) and basic verbs: words that are universal. The dataset was built up from 113 existing sources, although it mainly consists of data from the Comparative IndoEuropean Lexical Database. They used expert linguists to judge which cognate lists should be included or excluded, mainly to avoid duplication (i.e. dialects being counted as distinct languages).
They've now published their cognate list online at this site, so you can have a look for fun later if you like.
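If it helps to picture what cognate data looks like once it reaches the computer, here's a toy sketch in Python. It assumes the usual presence/absence coding, where each cognate set becomes a column and each language gets a 1 or a 0. The three languages, the two meanings, the set labels and the function name `to_binary_matrix` are all my own illustration, not taken from the paper's database.

```python
# Toy data for illustration only (not taken from the paper's database).
# Each (language, meaning) pair is assigned a cognate-set label; words with
# the same label are judged to descend from the same ancestral word.
COGNATE_SETS = {
    ("English", "mother"): "mother_A",   # mother
    ("German", "mother"): "mother_A",    # Mutter
    ("Latin", "mother"): "mother_A",     # mater
    ("English", "fire"): "fire_A",       # fire
    ("German", "fire"): "fire_A",        # Feuer
    ("Latin", "fire"): "fire_B",         # ignis, a different cognate set
}


def to_binary_matrix(cognate_sets):
    """Return (languages, set_labels, rows), with one 0/1 row per language."""
    languages = sorted({language for language, _ in cognate_sets})
    set_labels = sorted(set(cognate_sets.values()))
    # Every (language, cognate set) pair that is actually attested.
    present = {(language, label) for (language, _), label in cognate_sets.items()}
    rows = [
        [1 if (language, label) in present else 0 for label in set_labels]
        for language in languages
    ]
    return languages, set_labels, rows


languages, set_labels, rows = to_binary_matrix(COGNATE_SETS)
print(set_labels)
for language, row in zip(languages, rows):
    print(language, row)
```

English and German end up with identical rows, while Latin differs in the two "fire" columns. Shared 1s like that are the kind of signal a tree-building method can work from.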
[CLICK]
So now you've got your cognate data, you can build yourself a tree. They used a Bayesian inference method to model the tree, which honestly is far too complicated for me to get into, as it would show I don't fully understand it, because the maths behind it is fundamentally evil. But there are some important takeaways. The model they used allows for variation in the rate of cognate evolution (words evolve at different speeds), and it assumes that a cognate can only be gained once, but can be lost multiple times in descendant languages.
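To make the "gained once, lost many times" rule a bit more concrete, here's a minimal Python sketch. It is emphatically not the paper's model: the four-language tree, the per-branch loss probability and the function name `simulate_cognate` are all assumptions of mine, and a real analysis works with rates over continuous time rather than a coin flip per branch. It only shows the shape of the rule: a cognate is born at one node, gets passed down, and once a branch loses it, nothing below that branch can get it back.

```python
import random

# Hypothetical toy tree (child -> parent), for illustration only.
PARENT = {
    "Germanic": "root", "Romance": "root",
    "English": "Germanic", "German": "Germanic",
    "French": "Romance", "Spanish": "Romance",
}
TIPS = ["English", "German", "French", "Spanish"]


def children(node):
    """Return the direct descendants of a node in the toy tree."""
    return [child for child, parent in PARENT.items() if parent == node]


def simulate_cognate(birth_node, loss_prob, seed=0):
    """Simulate one cognate that arises once at birth_node.

    On every branch below the birth node the cognate is lost with
    probability loss_prob. A lost cognate is never regained.
    """
    rng = random.Random(seed)
    tips = {tip: 0 for tip in TIPS}

    def descend(node):
        kids = children(node)
        if not kids:
            tips[node] = 1  # the cognate survived all the way to this language
            return
        for kid in kids:
            if rng.random() >= loss_prob:  # the cognate survives this branch
                descend(kid)

    descend(birth_node)
    return tips


# A cognate born in the Germanic ancestor can never show up in French or Spanish.
print(simulate_cognate("Germanic", loss_prob=0.3, seed=1))
```

Whatever seed you pick, the Romance tips stay at 0 when the cognate is born on the Germanic branch, because a second, independent gain is not allowed.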
[CLICK]
Alright, now we've got to tweak the tree. As they're going to model the expansion of IE out of wherever it started, they need a rough idea of how quickly languages change. They did this by putting limits on how old certain clades are allowed to be. For example, there is good reason to think that the Romance languages had begun to diverge by the time the Roman Empire began to break up, so we can constrain the age of the Romance clade based on that information.
They also mention in their supplementary material that one of the advantages of Bayesian analysis is that they don't have to set fixed constraints: each constraint can be given as a range of plausible ages rather than a single fixed date.
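Here's a small Python sketch of the difference between a fixed constraint and one that can vary. The numbers are made up purely to show the shapes and are not the paper's calibrations, and using a normal distribution for the soft version is my assumption for illustration. The function names are hypothetical too.

```python
import math


def hard_bounds_log_prior(age, lower, upper):
    """Uniform prior: every age inside [lower, upper] is equally likely,
    and any age outside the bounds is treated as impossible."""
    if lower <= age <= upper:
        return -math.log(upper - lower)
    return -math.inf


def soft_log_prior(age, mean, sd):
    """Normal prior: ages near `mean` are favoured, but no age is ruled out."""
    z = (age - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


# Hypothetical numbers, in years before present, chosen only for illustration.
LOWER, UPPER = 1500.0, 2500.0
MEAN, SD = 2000.0, 250.0

for age in (2000.0, 2400.0, 2600.0):
    print(age, hard_bounds_log_prior(age, LOWER, UPPER), soft_log_prior(age, MEAN, SD))
```

With hard bounds, an age just outside the window gets a log-probability of minus infinity, so the analysis can never go there. The soft version just makes that age less and less likely, which lets the data push back if the calibration is a bit off.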
[CLICK]
Step 3! Now, to work out where the Indo-European languages came from, they used information from a publication called Ethnologue about where the present-day languages in their sample are (pre-colonially) spoken, given as an approximate geographic range rather than a point location, and where the ancient languages are believed to have been spoken.