Figures
Figure 1. Repeat length distribution in all 1-kb bins centered
at the breakpoints of the HGMD-deletion data. “DR”, “GQ”, “IR”,
“MR”, “STR”, and “Z” denote direct repeats, G-quadruplex-forming
repeats, inverted repeats, mirror repeats, short tandem repeats, and
Z-DNA, respectively.
Figure 2. Frequency of non-B DNA-forming repeats near
the breakpoints of the HGMD-deletion dataset. The X-axis represents the
position relative to the breakpoint and the Y-axis the repeat frequency.
Panels A-F show the frequencies of direct repeats (DR), inverted repeats
(IR), mirror repeats (MR), G-quadruplex-forming repeats (GQ), short tandem
repeats (STR), and Z-DNA sequences, respectively. Frequency here refers to
the proportion of sequences with a repeat at each position.
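The frequency definition used in Figure 2 (the proportion of sequences carrying a repeat at each position) can be illustrated with a minimal sketch. It assumes each breakpoint-centered window comes with a list of repeat intervals given as half-open `(start, end)` offsets into that window. The function name `positional_repeat_frequency` and the input layout are hypothetical and are not taken from the original analysis.

```python
def positional_repeat_frequency(repeat_intervals_per_seq, window_len):
    """Return, for each position in the window, the fraction of sequences
    that have at least one repeat covering that position.

    repeat_intervals_per_seq: one list of (start, end) half-open intervals
    per sequence (hypothetical input format, assumed for illustration).
    """
    counts = [0] * window_len
    for intervals in repeat_intervals_per_seq:
        # Mark covered positions once per sequence so that overlapping
        # repeats in the same sequence are not double-counted.
        covered = [False] * window_len
        for start, end in intervals:
            for pos in range(max(start, 0), min(end, window_len)):
                covered[pos] = True
        for pos, is_covered in enumerate(covered):
            if is_covered:
                counts[pos] += 1
    n = len(repeat_intervals_per_seq)
    if n == 0:
        return [0.0] * window_len
    return [c / n for c in counts]


# Three toy windows of length 4; the third has no repeats.
freqs = positional_repeat_frequency([[(0, 2)], [(1, 3)], []], 4)
```

Plotting `freqs` against position relative to the breakpoint would give one curve of the kind shown in each panel.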
Figure 3. Relationship between deletion length and average
non-B DNA-forming repeat frequency. A. The relationship between deletion
length and average repeat frequency within the 1-kb bins centered at
breakpoints. B. Correlations were observed between deletion length and the
average repeat frequency for each 10-bp bin of deletion lengths. C.
Significant correlations were observed between deletion length and repeat
frequency in the 1-kb sequences centered at breakpoints under different
cut-offs, for deletions of length ≤9 bp, ≤27 bp, and ≤30 bp, respectively.
Figure 4. Repeat frequency near the breakpoints of
deletions of different lengths. A-D are the average frequencies of direct
repeats (DR), G-quadruplex-forming repeats (GQ), short tandem repeats (STR),
and inverted repeats (IR), respectively.
Figure 5. GC content in the vicinity of breakpoints of
deletions and the relationship between deletion length and GC content.
A. GC content in the vicinity of all the pathogenic deletion breakpoints
and the simulated data. B. Relationship between deletion length and GC
content. When deletion length was less than 38 bp, it was significantly
correlated with GC content (PCC = 0.71 and P-value = 7.3E-7).
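The two quantities named in this legend, GC content and the Pearson correlation coefficient (PCC), can be sketched as follows. This is an illustrative sketch only: the helper names `gc_content` and `pearson` are hypothetical, the toy inputs are not data from the study, and the significance test behind the reported P-value is not reproduced here.

```python
import math


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Toy example: one flanking sequence per deletion length (made-up values).
lengths = [1, 2, 3, 4]
flanks = ["ATATATAT", "ATGTATAT", "ATGCATAT", "GTGCATAT"]
pcc = pearson(lengths, [gc_content(s) for s in flanks])
```

In an analysis like the one in panel B, `lengths` would be restricted to the deletion-length range of interest before computing the PCC.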
Figure 6. Sequence motifs around the breakpoints of deletions.
A. eH-values for the difference between frequencies of motif occurrence
in 10-bp bins centered at breakpoints of the deletion data and the
simulated data; 16 motifs occurred more frequently
(eH-value < 0.01) in 10-bp bins centered at the pathogenic deletion
breakpoints than in 10-bp bins centered at the breakpoints of the
control dataset, which includes simulated breakpoints. B.
Relationship between deletion length and average motif frequency; each
point represents the average motif frequency in the vicinity
of deletions of a given length.
Figure 7. The Pearson Correlation Coefficient (PCC) and PR
scores for motif frequency, GC content, or repeat frequency against
deletion length. A. Distribution of PCC against deletion length. The PCC
values represent the correlations between deletion length and motif
frequency, GC content, or repeat frequency. B. Relationship between
deletion length and PR score.