Microhomology analysis for deletions and control1 data
To ascertain microhomologies, we used MHcut, which searches for
homologous sequences within the flanking sequences of deletion variants.
Of the 15,453 deletions with a minimum size of 3 bp, 40% (6,195) were
flanked by microhomologies of at least 3 bp, which is significantly
higher than the corresponding probability (7.3%\(\pm\)0.2%) from
control1 (t-test P-value <2.2E-6). Among the shorter
deletions, 59.4% of 1 bp deletions were flanked by microhomologies of at
least 1 bp (control1 28.2%\(\pm\)0.2%), and 71.3% of 2 bp deletions were
flanked by microhomologies of at least 2 bp
(control1 8.7%\(\pm\)0.1%), implicating microhomologies as a common,
enriched feature of pathogenic deletion breakpoints. When
we divided the pathogenic deletions in the HGMD dataset into two groups
using a 30 bp length cutoff, we found that 42% of deletions shorter than
30 bp had flanking microhomologies, compared with 29% of longer
deletions. A Chi-square test indicated that short deletions (length
<30 bp) were significantly enriched (P-value < 2.2E-16) for
microhomologies compared with longer deletions. However, there was no
significant correlation between the frequency of microhomologies and
deletion length.
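
To make the notion of a flanking microhomology concrete, the following is a minimal Python sketch of one way such a check could be implemented. It is not MHcut's implementation. The function names, the 0-based coordinates, and the choice to report the longer of the two breakpoint configurations are assumptions made for illustration. The length threshold follows the text above: at least 1 bp for 1 bp deletions, 2 bp for 2 bp deletions, and 3 bp for longer deletions.

```python
def flank_microhomology(ref: str, start: int, length: int) -> int:
    """Longest exact microhomology (bp) at the breakpoints of a deletion.

    ref    : reference sequence containing the deletion (hypothetical input)
    start  : 0-based index of the first deleted base (assumed convention)
    length : number of deleted bases
    """
    end = start + length
    deleted = ref[start:end]
    left_flank = ref[:start]
    right_flank = ref[end:]

    # Configuration 1: 5' end of the deleted sequence vs. the 3' flank,
    # extending base by base away from the breakpoint.
    fwd = 0
    while (fwd < length and fwd < len(right_flank)
           and deleted[fwd] == right_flank[fwd]):
        fwd += 1

    # Configuration 2: 3' end of the deleted sequence vs. the 5' flank,
    # extending base by base in the opposite direction.
    rev = 0
    while (rev < length and rev < len(left_flank)
           and deleted[-1 - rev] == left_flank[-1 - rev]):
        rev += 1

    # Assumption: report the stronger of the two configurations.
    return max(fwd, rev)


def has_required_microhomology(ref: str, start: int, length: int) -> bool:
    """Apply the size-dependent threshold described in the text:
    >=1 bp for 1 bp deletions, >=2 bp for 2 bp deletions, >=3 bp otherwise."""
    threshold = min(length, 3)
    return flank_microhomology(ref, start, length) >= threshold
```

For example, deleting the 6 bp `ACGTTT` at position 2 of the toy sequence `GGACGTTTACGTCC` leaves `ACGTCC` as the 3' flank, so `flank_microhomology("GGACGTTTACGTCC", 2, 6)` returns 4 and the deletion counts as flanked by a microhomology of at least 3 bp.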