Networks for phylogenomics
Abstract of C. Scornavacca's presentation at the Cross Disciplinary Genomics Symposium
Phylogenetic analysis is the study of the evolution of, and relationships between, different species (or taxa), generally inferred from biological sequence data. Trees are the most widely used data structures for conceptualizing, visualizing and analyzing the evolution of biological lineages. They are well suited to describing the evolutionary pathways of species because they are intuitive and easy to query and study from both a computational and a mathematical point of view. However, trees cannot readily accommodate reticulate events such as horizontal gene transfer, hybridization and recombination between lineages, hence the need for new methods to fill this gap. Phylogenetic networks provide such an alternative. C. Scornavacca's presentation gave an overview of how networks can improve phylogenomics.
A phylogenetic network is a connected graph whose terminal nodes are associated with biological species or sequences. Networks can be either explicit or implicit (Huson 2011). This distinction is important: implicit, or abstract, networks (also called “data visualization graphs”) do not always take evolutionary constraints into account, which makes them ill-suited to the study of biological phenomena, although they can still provide some insight by displaying the data in a different but meaningful way.
Another distinction (one that also applies to phylogenetic trees) is whether the network is rooted or unrooted. A rooted tree is a connected, directed acyclic graph whose root corresponds to the most ancient common ancestor of the represented species, whereas an unrooted tree only describes how species are related to each other. Networks have been used extensively in the unrooted case, notably Neighbor-net, consensus split networks and median-joining networks. When it comes to rooted trees, those algorithms are ill-suited because of the added biological constraints; moreover, they are seldom completely defined and optimized, and thus not usable as tools at large scale.
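To make the rooted case more concrete, here is a minimal sketch of a rooted network stored as a list of directed (parent, child) edges. It is only an illustration, not a method from the presentation: the function name `describe_rooted_network`, the node labels and the toy example are all assumptions. The sketch finds the root, the leaves and the reticulation nodes (nodes with two or more parents, as in a hybridization event), and checks that the graph is acyclic.

```python
from collections import defaultdict, deque


def describe_rooted_network(edges):
    """Summarise a rooted network given as directed (parent, child) edges.

    Hypothetical helper written for illustration only.
    """
    children = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        in_degree[child] += 1
        nodes.update((parent, child))

    # Root: no parent. Leaf: no child. Reticulation: two or more parents.
    roots = sorted(n for n in nodes if in_degree[n] == 0)
    leaves = sorted(n for n in nodes if not children[n])
    reticulations = sorted(n for n in nodes if in_degree[n] >= 2)

    # Kahn's algorithm: the graph is acyclic exactly when every node
    # can be removed in topological order.
    remaining = {n: in_degree[n] for n in nodes}
    queue = deque(roots)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)

    return {
        "roots": roots,
        "leaves": leaves,
        "reticulations": reticulations,
        "is_acyclic": visited == len(nodes),
    }


# Toy example (assumed data): species B descends from a hybridization
# node "h" whose two parents are the internal nodes "u" and "v".
example_edges = [
    ("root", "u"), ("root", "v"),
    ("u", "A"), ("u", "h"),
    ("v", "h"), ("v", "C"),
    ("h", "B"),
]
result = describe_rooted_network(example_edges)
print(result)
```

A tree is the special case in which the list of reticulation nodes is empty; a single node with two parents is enough to turn the structure into a network.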
Networks can also be classified by the data they are reconstructed from: sequences, clusters, distances, trees or splits.
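As an illustration of the split-based case, the sketch below tests whether splits (bipartitions of the taxon set) are pairwise compatible. Two splits A|B and C|D are compatible when at least one of the four intersections A∩C, A∩D, B∩C, B∩D is empty; a set of pairwise compatible splits fits on a single unrooted tree, whereas incompatible splits are the kind of conflicting signal that split networks are designed to display. The function names and the four-taxon example are assumptions made for this sketch.

```python
from itertools import combinations


def is_compatible(split_1, split_2):
    """Return True if two splits can be displayed on the same tree.

    Each split is a pair of disjoint sets of taxa covering the same taxon set.
    Hypothetical helper written for illustration only.
    """
    side_a1, side_b1 = split_1
    side_a2, side_b2 = split_2
    # Compatible when at least one of the four intersections is empty.
    return any(
        not (x & y)
        for x in (side_a1, side_b1)
        for y in (side_a2, side_b2)
    )


def all_pairwise_compatible(splits):
    """Return True if every pair of splits in the collection is compatible."""
    return all(is_compatible(s, t) for s, t in combinations(splits, 2))


# Toy example on four assumed taxa A, B, C, D.
ab_cd = (frozenset("AB"), frozenset("CD"))
ac_bd = (frozenset("AC"), frozenset("BD"))
a_bcd = (frozenset("A"), frozenset("BCD"))

print(is_compatible(ab_cd, a_bcd))   # tree-like pair
print(is_compatible(ab_cd, ac_bd))   # conflicting pair
print(all_pairwise_compatible([ab_cd, ac_bd, a_bcd]))
```

Here AB|CD and A|BCD can sit on the same tree, while AB|CD and AC|BD cannot, so a collection containing both of the latter needs a network rather than a tree to be displayed in full.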
In my opinion, two important aspects emerged from the presentation. The first is that the question is still open: work remains before we have a complete and relevant method for representing and analyzing phylogeny, the main challenges being the successful modelling of reticulate events, optimization of the network search space, and robustness. For this purpose, combinatorial networks have great potential and have already begun to lay the foundations of next-generation phylogenomics.
Second, the presentation indirectly showed how recent advances in sequence analysis and next-generation sequencing (NGS) have altered the way we look at evolution, and hence the way we reconstruct it. The more we discover about the molecular biology of DNA and recombination events, the more constraints and variability arise when we try to formalize them as mathematically exploitable objects.
We have to take into account not only biological relevance but also the complexity of the algorithms. Although NGS has radically improved our understanding of evolution, it has also caused the amount of data to soar, and it is crucial to keep in mind that speed matters as much as precision and efficiency if all the available information is to be exploited in reasonable time.
Other methods have also been proposed: for example, Smith (2013) introduced methods for aligning, synthesizing and analyzing rooted phylogenetic trees.