Molecular clock and dating of phylogenies
The concept of a molecular clock is essential for obtaining an evolutionary time scale from molecular sequence data. Although the fossil record was traditionally the authoritative source of information about evolutionary history, phylogenies inferred from molecular sequence data have become the primary framework for reconstructing the evolutionary history of life on Earth. Even so, inferring the timing of phylogenetic divergences on a geological scale still relies heavily on information synthesized from the fossil record. Early examples of using the fossil record in concert with molecular data may have suggested that such analyses are straightforward, but a recent flurry of new approaches demonstrates that the science is not yet settled. In this article I will describe the various molecular clock models and the ways they are used in conjunction with data from the fossil record to date the divergences in molecular phylogenies.
The tree of life on Earth has ramified through untold millennia. The result today is a spectacular richness of living forms. Occasionally conditions in the past have allowed earlier species to leave a record of their form imprinted in the rocks, their bones and hard body parts compressed and transformed by geochemical processes and the vastness of time. Paleontology straddles the disciplines of biology and geology, and it was arguably geologists and early paleontologists who first recognized the nature of phylogeny and the tree of life. Geological depictions of the phylogenies of plants and animals appeared in geology textbooks decades before Darwin’s “Origin of Species”.
In recent times, however, evolutionary biology has undergone its most dramatic transformation with the development of technology to sequence genomic data from modern species. The field of statistical phylogenetics has been spurred on by advances in mathematical models and computational methods.
The scientific history of reconciling the considerable body of paleontological knowledge of past species with genomic data obtained from today’s species has often been fractious, with one or the other form of data presumed superior, depending on the practitioner.
Zuckerkandl and Pauling first proposed the existence of a “molecular evolutionary clock”, based on evidence that the rate of evolutionary change of molecular sequences appeared to be very similar per unit time across diverse lineages (Zuckerkandl 1965). Allan Wilson was a pioneer of the application of the molecular clock. One of the first examples of a molecular phylogeny challenging paleontological evidence came from Wilson and Sarich’s paper entitled “A molecular time scale for human evolution” (Wilson 1969), in which they estimated the age of the common ancestor of humans and chimpanzees at 4-5 Myr, far more recent than the figure of 20-30 Myr accepted at the time by paleontologists.
Nevertheless, a few years later Langley and Fitch (Langley 1974) introduced a paper on the molecular clock by saying:
The fundamental conclusion of even the most rudimentary analysis is that amino acid sequence differences correlate well with morphological and paleontological considerations.
Although the authors went on to show evidence that rates of nucleotide substitution are not constant across vertebrates (Langley 1974), they maintained that comparison of their phylogenetic dating of mammals against geological dates showed a remarkably good fit.
Evidence for or against the existence of a molecular clock (sometimes known as the ‘rate-constancy hypothesis’) fueled debate between the neutralist and selectionist views of molecular evolution (Kimura 1987).
Kimura (Kimura 1987) put it thus:
From the standpoint of the neutral theory of molecular evolution, it is expected that a universally valid and exact molecular evolutionary clock would exist if, for a given molecule, the mutation rate for neutral alleles per year were exactly equal among all organisms at all times.
However, a series of empirical studies examined the extent to which data supported a clocklike progression of molecular evolution. These studies showed that the lineage-to-lineage variance of empirically measured molecular clocks tended to be greater than would be expected from Poisson error alone (Langley 1974, Kimura 1987).
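To make the comparison with Poisson error concrete, the following is a minimal sketch of a variance-to-mean ratio (often called an index of dispersion) computed over substitution counts from several lineages. For a Poisson process the variance equals the mean, so a ratio well above 1 indicates more variation than Poisson error alone would produce. The function name and the counts are hypothetical, and the sketch assumes that all lineages have evolved for the same length of time; it is not the specific test used in the cited studies.

```python
from statistics import mean, variance


def index_of_dispersion(substitution_counts):
    """Return the ratio of sample variance to mean for per-lineage counts.

    Assumes every lineage has evolved for the same duration, so that under
    a Poisson clock all counts share one mean and the ratio is close to 1.
    """
    m = mean(substitution_counts)
    if m == 0:
        raise ValueError("mean substitution count is zero; ratio undefined")
    return variance(substitution_counts) / m


# Hypothetical substitution counts for five lineages of equal age.
counts = [12, 30, 7, 25, 16]
ratio = index_of_dispersion(counts)
print(f"variance/mean = {ratio:.2f}")
```

With real data the lineages rarely share a single duration, so in practice the counts would first need to be placed on a common time scale before a ratio of this kind is interpreted.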
In this article I will review modern statistical and computational techniques for applying the molecular clock to divergence time dating.
In this section I will cover the main types of molecular clock model. The simplest is the strict molecular clock, which assumes a single shared mean rate of evolution for all the branches in the molecular phylogeny.
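As an illustration of what a single shared rate implies, here is a minimal sketch of strict-clock dating from pairwise distances. It assumes that the genetic distance between two sequences accumulates along both lineages since their common ancestor, so distance = 2 × rate × time. The function names and numbers are hypothetical, and the sketch ignores uncertainty in both the distances and the calibration age.

```python
def strict_clock_rate(pairwise_distance, calibration_age):
    """Estimate the shared rate (substitutions/site/Myr) from one calibration.

    Both lineages evolve for calibration_age, hence the factor of two.
    """
    if calibration_age <= 0:
        raise ValueError("calibration age must be positive")
    return pairwise_distance / (2.0 * calibration_age)


def divergence_time(pairwise_distance, rate):
    """Convert a pairwise distance to a divergence time under the same rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2.0 * rate)


# Hypothetical calibration: two taxa differ by 0.20 substitutions/site and
# are assumed to have diverged 50 Myr ago.
rate = strict_clock_rate(0.20, 50.0)

# Hypothetical target pair differing by 0.05 substitutions/site.
age = divergence_time(0.05, rate)
print(f"rate = {rate:.4f} subs/site/Myr, estimated divergence = {age:.1f} Myr")
```

Because every branch shares the one rate, a single calibration fixes the time scale of the whole tree; this is exactly the assumption that breaks down when rates vary among lineages.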
In this section I describe recently developed mathematical models and computational methods that treat all data under the equanimous gaze of Bayesian probability.
It is important to note here that both molecular sequences and morphological fossils, including in both cases the circumstances of their observation, must be considered data. It is only by treating them uniformly as data that the total evidence can be correctly synthesized into an unbiased picture. I will contrast this with previous attempts at applying Bayesian inference to the problem of dating species divergences with rocks and clocks. In many past attempts fossil data were pre-processed into prior knowledge in inaccurate and biasing ways. In general, the practice of “calibrating” and “dating” molecular phylogenies with fossil data has ignored the role of fossil observations as data. By treating fossils as data in an integrated analysis it is also possible to make novel inferences, such as dating fossils based on their morphological affinities in cases where geological dating is very uncertain.