Specific sequence motifs in deletion flanking sequences
From previous publications (Abeysinghe et al., 2003; Ball et al., 2005;
Chuzhanova et al., 2009), we collected a total of 78 sequence motifs
(Table S4) that have been reported to occur in the vicinity of
deletion/rearrangement breakpoints and are thought to play a role in the
breakage and rejoining of DNA molecules. Briefly, Abeysinghe et
al. (2003) listed 36 sequence motifs known to
be associated with site-specific recombination, mutation, and DNA
cleavage. In their later study, Ball et al. (2005)
collected an additional 24 sequence motifs thought to be involved in
site-specific recombination and putative deletion/insertion hotspots.
Finally, Chuzhanova et al. (2009) reported 18
further motifs associated with deletions and recombination. We computed
the frequency of each motif in the 1-kb sequences flanking the
pathogenic deletions from the HGMD-deletion dataset and in the control0
dataset using the R package Biostrings (Gentleman & DebRoy, 2019). We
used the simulated deletions to determine whether the number of
occurrences of each motif in the vicinity of each breakpoint was higher
than expected by computing an “experience hit” (eH-value), i.e., the
number of times the motif count in the vicinity of the simulated
breakpoints of the control dataset exceeded the motif count in the
vicinity of the pathogenic deletion breakpoints, divided by 100.
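
The motif counting and the eH-value described above can be illustrated with a minimal Python sketch. It is not the Biostrings-based pipeline used in the study: the function names `count_motif` and `eh_value` are hypothetical, and the sketch assumes that motifs may contain IUPAC ambiguity codes, that overlapping occurrences are counted, and that only the given strand is scanned. The eH-value is computed as the fraction of simulated counts that strictly exceed the observed count, which equals division by 100 when 100 simulated deletions are compared with each pathogenic deletion.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes
# (assumption: motifs may contain ambiguity codes).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def count_motif(motif, sequence):
    """Count occurrences of a motif in a flanking sequence, overlaps included."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A zero-width lookahead matches at every start position,
    # so overlapping occurrences are all counted.
    return len(re.findall(f"(?={pattern})", sequence.upper()))

def eh_value(observed_count, simulated_counts):
    """Fraction of simulated counts strictly greater than the observed count."""
    exceed = sum(1 for c in simulated_counts if c > observed_count)
    return exceed / len(simulated_counts)
```

Under these assumptions, a small eH-value means that few simulated breakpoints carry more copies of the motif than the pathogenic breakpoint does, i.e., the motif is more frequent near the pathogenic breakpoint than expected.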
The relationship between deletion length and motif frequency was then
explored by calculating the average motif frequency for each deletion
length.
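
As a rough illustration of this last step, the following Python sketch groups motif counts by deletion length and averages them. The function name `mean_count_by_length` and the input layout (pairs of deletion length and motif count) are assumptions made for the example, not details taken from the study.

```python
from collections import defaultdict

def mean_count_by_length(records):
    """Average motif count per deletion length.

    records: iterable of (deletion_length, motif_count) pairs
    (hypothetical input layout).
    """
    totals = defaultdict(lambda: [0, 0])  # length -> [sum of counts, n deletions]
    for length, count in records:
        totals[length][0] += count
        totals[length][1] += 1
    # Return the means ordered by deletion length.
    return {length: s / n for length, (s, n) in sorted(totals.items())}
```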