Generation of allele distribution models:
To generate the allele distribution model for ROHMM, we exploited the hemizygosity of the non-pseudoautosomal region of the male X chromosome. Hemizygosity of haploid genomes has been used by others to detect sequencing errors (Li, 2014), and the BAF distribution of the male X chromosome has been shown to mimic that of homozygous regions within autosomes (Magi et al., 2014). We hypothesized that the allele distribution of the male X non-pseudoautosomal region could serve as a model of the homozygous state, so that long runs of homozygosity could be inferred with a 2-state Hidden Markov Model. BAF distributions of the X chromosome non-pseudoautosomal regions were collected from 25 whole-genome samples (~30x coverage) from the 1000G project phase 3, using all valid biallelic SNP positions included in the gnomAD 2.1.1 dataset (Karczewski et al., 2020). With this collection, we compared the male and female non-pseudoautosomal regions of X against both the whole-genome BAF distribution and the BAF distribution of homozygous regions, which were determined from 1000G Omni2.5 array data using PLINK. The comparison showed that the BAF distribution of the male X chromosome correlated strongly with that of homozygous regions, whereas the BAF distribution of the female X chromosome was in high concordance with that of the whole genome (Figure 1).
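The comparison described above can be illustrated with a minimal sketch. It is not the ROHMM implementation: it assumes that BAF at a site is the fraction of reads supporting the alternate allele, that distributions are summarized as normalized histograms, and that similarity is measured with a Pearson correlation between histograms. The function names (baf, baf_histogram, pearson), the bin count, and the toy read depths are hypothetical and chosen only for illustration.

```python
from math import sqrt


def baf(ref_depth, alt_depth):
    """BAF at one site, assumed here to be alt reads / (ref + alt reads)."""
    total = ref_depth + alt_depth
    if total == 0:
        return None  # no coverage: the site is skipped
    return alt_depth / total


def baf_histogram(sites, n_bins=10):
    """Normalized BAF histogram over an iterable of (ref_depth, alt_depth)."""
    counts = [0] * n_bins
    used = 0
    for ref_depth, alt_depth in sites:
        value = baf(ref_depth, alt_depth)
        if value is None:
            continue
        # A BAF of exactly 1.0 is placed in the last bin.
        idx = min(int(value * n_bins), n_bins - 1)
        counts[idx] += 1
        used += 1
    if used == 0:
        raise ValueError("no covered sites")
    return [c / used for c in counts]


def pearson(x, y):
    """Pearson correlation between two equally binned histograms."""
    if len(x) != len(y) or not x:
        raise ValueError("histograms must be non-empty and equally binned")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant histogram")
    return cov / sqrt(var_x * var_y)


# Toy (ref_depth, alt_depth) pairs, invented for illustration only.
male_x = [(30, 0), (0, 28), (29, 1), (1, 31), (0, 30)]        # hemizygous
female_x = [(15, 15), (14, 16), (30, 0), (0, 30), (16, 13)]   # includes hets
autosomal_roh = [(31, 0), (0, 29), (1, 30), (30, 1), (0, 27)]  # homozygous

if __name__ == "__main__":
    roh_hist = baf_histogram(autosomal_roh)
    print("male X vs ROH:  ", pearson(baf_histogram(male_x), roh_hist))
    print("female X vs ROH:", pearson(baf_histogram(female_x), roh_hist))
```

In this toy setting the male X histogram has mass only near BAF 0 and 1, as expected for hemizygous sites, while the female X histogram also has mass near 0.5 from heterozygous sites. Real data would require depth and quality filtering that the sketch omits.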