Generation of allele distribution models:
To generate the allele distribution model for ROHMM, we utilized the
hemizygosity of the non-pseudoautosomal region of the male X chromosome.
The hemizygosity of haploid genomes has been used by others to detect
sequencing errors (Li, 2014), and Magi et al. (2014) showed that the
B-allele frequency (BAF) distribution of the male X chromosome mimics that
of homozygous regions within autosomes. We therefore hypothesized that the
allele distribution of the male X chromosome non-pseudoautosomal region
could be used to infer long runs of homozygosity with a two-state hidden
Markov model (HMM). BAF
distributions from the X chromosome non-pseudoautosomal region were
collected from 25 whole-genome sequencing samples (~30x coverage) from
phase 3 of the 1000 Genomes Project (1000G), using all valid biallelic SNP
positions included in the latest version of the gnomAD dataset (v2.1.1;
Karczewski et al., 2020). Using this collection, we compared the BAF
distributions of the male and female X chromosome non-pseudoautosomal
regions against the genome-wide BAF distribution and against the BAF
distribution of homozygous regions identified with PLINK from 1000G
Omni2.5 array data. The male X chromosome BAF distribution correlated
strongly with that of homozygous regions, whereas the female X chromosome
distribution was highly concordant with the genome-wide BAF distribution
(Figure 1).
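The distribution comparison can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: BAF is computed per SNP site as alt / (ref + alt) from read counts, each distribution is summarized as a 10-bin histogram over [0, 1], and histograms are compared with a Pearson correlation. The function names (`baf`, `baf_histogram`, `pearson`), the binning and the choice of correlation measure are illustrative assumptions rather than ROHMM's code, and the read counts are toy values, not 1000G data.

```python
import math
from typing import List, Optional, Sequence, Tuple


def baf(ref_count: int, alt_count: int) -> Optional[float]:
    """B-allele frequency: fraction of reads supporting the alternate allele (None if no coverage)."""
    depth = ref_count + alt_count
    return alt_count / depth if depth > 0 else None


def baf_histogram(read_counts: Sequence[Tuple[int, int]], n_bins: int = 10) -> List[float]:
    """Normalized BAF histogram over [0, 1]; sites without coverage are skipped."""
    counts = [0] * n_bins
    for ref, alt in read_counts:
        b = baf(ref, alt)
        if b is None:
            continue
        counts[min(int(b * n_bins), n_bins - 1)] += 1  # BAF == 1.0 goes into the last bin
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length vectors (here: two BAF histograms)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy (ref, alt) read counts at biallelic SNP sites; illustrative only, not study data.
male_x = [(30, 0), (0, 28), (29, 1), (1, 31), (0, 30), (32, 0)]
female_x = [(15, 14), (30, 0), (13, 17), (0, 29), (16, 15), (14, 14)]
genome = [(14, 16), (31, 0), (15, 15), (0, 30), (17, 13), (12, 16)]
autosomal_roh = [(28, 0), (0, 31), (30, 2), (0, 27), (29, 0), (2, 30)]

h_male, h_female, h_genome, h_roh = (
    baf_histogram(s) for s in (male_x, female_x, genome, autosomal_roh)
)

r_male_roh = pearson(h_male, h_roh)
r_female_genome = pearson(h_female, h_genome)
r_male_genome = pearson(h_male, h_genome)
print(f"male X vs autosomal ROH: {r_male_roh:.2f}")       # 1.00
print(f"female X vs genome-wide: {r_female_genome:.2f}")  # 0.87
print(f"male X vs genome-wide:   {r_male_genome:.2f}")    # 0.22
```

In this toy setup the male X sites pile up near BAF 0 and 1, like the autosomal ROH sites, while the female X sites share the genome-wide peak near 0.5, which is the qualitative pattern the comparison looks for.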
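The idea of using a male-X-like BAF distribution inside a two-state HMM can be illustrated with Viterbi decoding. In this hedged sketch, one state (ROH) emits BAFs according to a male-X-like histogram and the other according to a genome-wide-like histogram. The helper names (`bin_index`, `emission_probs`, `viterbi_roh`), the 10 bins, the pseudocount of 1, the switch probability of 0.01 and the histogram counts are all hypothetical choices for illustration; ROHMM's actual emission model, transition parameters and decoding may differ.

```python
import math
from typing import List, Sequence


def bin_index(b: float, n_bins: int) -> int:
    """Map a BAF in [0, 1] to a histogram bin (BAF == 1.0 goes into the last bin)."""
    return min(int(b * n_bins), n_bins - 1)


def emission_probs(counts: Sequence[int], pseudocount: float = 1.0) -> List[float]:
    """Turn BAF histogram counts into emission probabilities; the pseudocount avoids log(0)."""
    total = sum(counts) + pseudocount * len(counts)
    return [(c + pseudocount) / total for c in counts]


def viterbi_roh(bafs: Sequence[float], emit_roh: Sequence[float],
                emit_other: Sequence[float], p_switch: float = 0.01,
                p_start_roh: float = 0.5) -> List[bool]:
    """Most likely state path of a 2-state HMM; True marks sites called as ROH."""
    n_bins = len(emit_roh)
    # State 0 = ROH (homozygous), state 1 = non-ROH; all arithmetic in log space.
    log_emit = [[math.log(p) for p in emit_roh], [math.log(p) for p in emit_other]]
    log_stay, log_switch = math.log(1 - p_switch), math.log(p_switch)
    first = bin_index(bafs[0], n_bins)
    score = [math.log(p_start_roh) + log_emit[0][first],
             math.log(1 - p_start_roh) + log_emit[1][first]]
    back = []  # back[t - 1][s] = best predecessor of state s at site t
    for b in bafs[1:]:
        k = bin_index(b, n_bins)
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[prev] + (log_stay if prev == s else log_switch) for prev in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            new_score.append(cands[best] + log_emit[s][k])
            ptr.append(best)
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [s == 0 for s in path]


# Illustrative 10-bin BAF histograms (not study data): a male-X-like U shape for the
# ROH state and a genome-wide-like shape with a heterozygous peak near 0.5.
emit_roh = emission_probs([45, 2, 1, 0, 1, 1, 0, 1, 2, 45])
emit_other = emission_probs([20, 3, 5, 8, 14, 14, 8, 5, 3, 20])

het_block = [0.5, 0.45, 0.55, 0.48]
hom_block = [0.0, 1.0, 0.02, 0.97]
bafs = het_block * 2 + hom_block * 5 + het_block * 2
calls = viterbi_roh(bafs, emit_roh, emit_other)
print(calls)  # 8 x False, then 20 x True, then 8 x False
```

Lowering `p_switch` raises the cost of entering and leaving the ROH state, so only longer homozygous-looking stretches are called; with the values above, the 20 homozygous-like sites outweigh the two switch penalties.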