Introduction:
Runs of homozygosity (ROH) are long genomic stretches of consecutive
homozygous genotypes. They arise mainly from consanguinity or inbreeding,
although they have also been observed in outbred populations (Cavalli-Sforza &
Bodmer, 1999; Gibson, Morton, & Collins, 2006). ROHs are known to
carry much of the information related to recessive traits, which helps
clinicians and researchers establish genotype-phenotype
associations in disease and population genetics (Bittles &
Black, 2010; Ceballos, Joshi, Clark, Ramsay, & Wilson, 2018).
Advances in next-generation sequencing (NGS) and its growing
accessibility have further accelerated the discovery of gene-disease
associations and made homozygosity mapping from massively parallel
sequencing data preferable to the classical, laborious STR mapping
methods (Alsalem, Halees, Anazi, Alshamekh, & Alkuraya, 2013; Ceballos,
Joshi, et al., 2018; Chahrour et al., 2012; Pippucci et al., 2013;
Walsh et al., 2010).
Here we propose a strategy to estimate homozygous segments from
error-prone, high-density genotyping data, especially from whole-genome and
whole-exome sequencing. Our algorithm uses a modified hidden Markov model
(HMM) approach. ROHMM's dynamic HMM uses an
observable pattern of hemizygosity in male X chromosomes as a model for
homozygosity along the genome and female X chromosomes as a model for
heterozygous segments. Allelic distances were also incorporated into the
dynamic HMM algorithm, as in BioHMM and H3M2, where the
latter uses the former’s exact algorithm (Magi et al., 2014; Marioni,
Thorne, & Tavaré, 2006). Table 1 compares the feature set of ROHMM with
those of its natural competitors: H3M2, bcftools roh and PLINK.
The ROHMM software includes many enhancements that remove the need for
separate tools to filter and select the data that best represent the
sample set. ROHMM also lets users set their preferred estimator
parameters more freely than the other available tools. ROHMM is written
entirely in Java and is available from the GitHub repository as source
code and compiled binaries.
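
To make the idea of a distance-aware HMM concrete, the following is a minimal sketch of a two-state (ROH / non-ROH) Viterbi decoder in which the probability of switching state grows with the physical distance between adjacent markers. It is not ROHMM's actual implementation. The class name RohViterbiSketch, the exponential form of switchProb, the distance scale, and the emission probabilities are all illustrative assumptions; in particular, the fixed emission values stand in for the X-chromosome-derived models described above.

```java
import java.util.Arrays;

/** Illustrative two-state Viterbi decoder; hypothetical, not ROHMM's code. */
final class RohViterbiSketch {
    static final int NON_ROH = 0;
    static final int ROH = 1;

    /** Assumed form: switching probability grows with distance and saturates at 0.5. */
    static double switchProb(long distBp, double scaleBp) {
        double p = 0.5 * (1.0 - Math.exp(-distBp / scaleBp));
        return Math.max(p, 1e-12); // avoid log(0) for co-located markers
    }

    /** Log-probability of observing a het (or hom) call given P(het) in a state. */
    static double logEmit(boolean het, double pHet) {
        return Math.log(het ? pHet : 1.0 - pHet);
    }

    /**
     * @param isHet   true where the genotype call is heterozygous
     * @param pos     marker positions in bp, ascending, on one chromosome
     * @param pHet    pHet[state] = assumed probability of a het call in that state
     * @param scaleBp assumed distance scale controlling how fast switching saturates
     * @return most likely state (NON_ROH or ROH) for each marker
     */
    static int[] decode(boolean[] isHet, long[] pos, double[] pHet, double scaleBp) {
        int n = isHet.length;
        if (n == 0) return new int[0];
        double[][] score = new double[n][2]; // best log-probability ending in each state
        int[][] back = new int[n][2];        // predecessor state on that best path
        for (int s = 0; s < 2; s++) {
            score[0][s] = Math.log(0.5) + logEmit(isHet[0], pHet[s]);
        }
        for (int i = 1; i < n; i++) {
            // Transition probabilities are recomputed per marker pair from their distance.
            double sw = switchProb(pos[i] - pos[i - 1], scaleBp);
            double logSwitch = Math.log(sw);
            double logStay = Math.log(1.0 - sw);
            for (int s = 0; s < 2; s++) {
                double stay = score[i - 1][s] + logStay;
                double move = score[i - 1][1 - s] + logSwitch;
                back[i][s] = stay >= move ? s : 1 - s;
                score[i][s] = Math.max(stay, move) + logEmit(isHet[i], pHet[s]);
            }
        }
        // Trace back from the better final state.
        int[] path = new int[n];
        path[n - 1] = score[n - 1][ROH] >= score[n - 1][NON_ROH] ? ROH : NON_ROH;
        for (int i = n - 1; i > 0; i--) {
            path[i - 1] = back[i][path[i]];
        }
        return path;
    }

    public static void main(String[] args) {
        // Toy data: 40 homozygous calls followed by 10 heterozygous calls, 1 kb apart.
        int n = 50;
        boolean[] isHet = new boolean[n];
        long[] pos = new long[n];
        for (int i = 0; i < n; i++) {
            isHet[i] = i >= 40;
            pos[i] = 1_000L * (i + 1);
        }
        double[] pHet = {0.30, 0.01}; // indexed by NON_ROH, ROH; illustrative values
        int[] states = decode(isHet, pos, pHet, 100_000.0);
        System.out.println(Arrays.toString(states));
    }
}
```

Because the transition probabilities are recomputed for every pair of adjacent markers, closely spaced markers strongly favour staying in the current state, while widely separated markers carry little information about each other. The small non-zero het probability in the ROH state is what lets such a model tolerate occasional genotyping errors inside a homozygous segment.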