On the origin and structure of haplotype blocks

Shipilina, Daria; Pal, Arka; Stankowski, Sean; Chan, Yingguang Frank; Barton, Nicholas

doi:10.22541/au.164328881.11613382/v1

Abstract

The term "haplotype block" is commonly used in the developing field of haplotype-based inference methods. We argue that the term should be defined based on the structure of the Ancestral Recombination Graph (ARG), which contains complete information on the ancestry of a sample. We use simulated examples to demonstrate key features of the relation between haplotype blocks and ancestral structure, emphasising the stochasticity of the processes that generate them. Even the simplest cases of neutrality or of a "hard" selective sweep produce a rich structure, which is missed by commonly used statistics. We highlight a number of novel methods for inferring haplotype structure as a full ARG or as a sequence of trees. While some of these new methods are computationally efficient, they still lack features that aid exploration of haplotype blocks as we define them, which calls for the development of new methods. Understanding and applying the concept of the haplotype block will be essential to fully exploit long-read and linked-read sequencing technologies.