Introduction:
Runs of homozygosity (ROH) are long genomic stretches of homozygous genotypes that arise mainly from consanguinity or inbreeding, although they have also been observed in outbred populations (Cavalli-Sforza & Bodmer, 1999; Gibson, Morton, & Collins, 2006). ROHs carry much of the information related to recessive traits and help clinicians and researchers establish genotype-phenotype associations in disease and population genetics (Bittles & Black, 2010; Ceballos, Joshi, Clark, Ramsay, & Wilson, 2018). Advances in next generation sequencing (NGS) and its wide availability have further accelerated the discovery of gene-disease associations and made homozygosity mapping from massively parallel sequencing data preferable to classical, laborious STR mapping methods (Alsalem, Halees, Anazi, Alshamekh, & Alkuraya, 2013; Ceballos, Joshi, et al., 2018; Chahrour et al., 2012; Pippucci et al., 2013; Walsh et al., 2010).
Here we propose a strategy to estimate homozygous segments from error-prone, high-density genotyping data, especially data from whole genome and whole exome sequencing. Our algorithm uses a modified hidden Markov model (HMM) approach. ROHMM’s dynamic HMM uses the observable pattern of hemizygosity in male X chromosomes as a model for homozygosity along the genome, and female X chromosomes as a model for heterozygous segments. Allelic distances are also incorporated into the dynamic HMM algorithm, as in BioHMM and H3M2, where the latter uses the former’s exact algorithm (Magi et al., 2014; Marioni, Thorne, & Tavaré, 2006). We compare the feature set of ROHMM with those of its natural competitors, H3M2, bcftools roh and PLINK, in Table 1. The ROHMM software includes many enhancements that eliminate the need for separate tools to filter and select the data that best represent the sample set. ROHMM also gives users more freedom to set their own estimator parameters than the other tools. ROHMM is written purely in Java and is available from its GitHub repository as source code and compiled binaries.
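To illustrate how inter-marker distance can enter an HMM of this kind, a minimal two-state Viterbi sketch in Java follows. It is not ROHMM's implementation. The class name `RohHmmSketch`, the method names, the fixed heterozygosity rates used as emission probabilities, and the exponential form of the switch probability are all assumptions made for illustration; the sketch also does not model the X-chromosome-derived emissions described above.

```java
// Illustrative sketch only: a two-state HMM (ROH / non-ROH) whose transition
// probability depends on the distance between adjacent markers.
// All names and parameter values here are hypothetical.
class RohHmmSketch {
    static final int NON_ROH = 0;
    static final int ROH = 1;

    // Assumed form: probability of a state change over `distance` bases for a
    // symmetric two-state continuous-time Markov chain with rate `ratePerBase`.
    // It is 0 at distance 0 and approaches 0.5 for very distant markers.
    static double switchProbability(long distance, double ratePerBase) {
        return 0.5 * (1.0 - Math.exp(-2.0 * ratePerBase * distance));
    }

    // Returns, for each marker, whether the most likely state path is in ROH.
    // `pos` must be sorted in increasing order; `het[i]` is true for a
    // heterozygous genotype call at marker i.
    static boolean[] viterbi(long[] pos, boolean[] het, double hetInRoh,
                             double hetOutside, double ratePerBase) {
        int n = pos.length;
        if (n == 0) {
            return new boolean[0];
        }
        double[] pHet = {hetOutside, hetInRoh};   // indexed by state
        double[][] score = new double[n][2];      // best log-probability so far
        int[][] back = new int[n][2];             // best previous state

        for (int s = 0; s < 2; s++) {
            // Uniform prior over the two states at the first marker.
            score[0][s] = Math.log(0.5) + Math.log(het[0] ? pHet[s] : 1.0 - pHet[s]);
        }
        for (int i = 1; i < n; i++) {
            double pSwitch = switchProbability(pos[i] - pos[i - 1], ratePerBase);
            for (int s = 0; s < 2; s++) {
                double stay = score[i - 1][s] + Math.log(1.0 - pSwitch);
                double move = score[i - 1][1 - s] + Math.log(pSwitch);
                back[i][s] = stay >= move ? s : 1 - s;
                double emit = Math.log(het[i] ? pHet[s] : 1.0 - pHet[s]);
                score[i][s] = Math.max(stay, move) + emit;
            }
        }
        // Trace back from the better final state.
        int state = score[n - 1][ROH] > score[n - 1][NON_ROH] ? ROH : NON_ROH;
        boolean[] inRoh = new boolean[n];
        for (int i = n - 1; i >= 0; i--) {
            inRoh[i] = (state == ROH);
            state = back[i][state];
        }
        return inRoh;
    }

    public static void main(String[] args) {
        // Toy data: 35 markers 1 kb apart, heterozygous at both ends and
        // once in the middle (mimicking a genotyping error).
        long[] pos = new long[35];
        boolean[] het = new boolean[35];
        for (int i = 0; i < pos.length; i++) {
            pos[i] = 1000L * (i + 1);
        }
        het[0] = het[1] = het[17] = het[33] = het[34] = true;
        boolean[] roh = viterbi(pos, het, 0.01, 0.5, 1e-5);
        StringBuilder sb = new StringBuilder();
        for (boolean r : roh) {
            sb.append(r ? 'R' : '-');
        }
        System.out.println(sb);
    }
}
```

Because the switch probability grows with distance, closely spaced markers strongly favor staying in the same state, so an isolated heterozygous call inside a long homozygous stretch can be absorbed as a probable genotyping error rather than splitting the segment.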