Non-B DNA-forming repeat motifs associated with deletions
We wished to ascertain whether short and long deletions were associated
with different types of repeat motifs. Six types of non-B DNA-forming
repeats (DR, GQ, IR, MR, Z-DNA, and STR) were investigated in this
study. For each type of repeat, we obtained the 10 most frequent
sequences occurring in the vicinity of the breakpoints of deletions
>30 bp and of deletions \(\leq\)30 bp (Figure S3).
Interestingly, most repeat motifs occurring in the vicinity of short
deletions differed from those occurring in the vicinity of long
deletions (Figure S3). For DR (Figure S3A and B), all of the top 10
repeat motifs associated with deletions >30 bp were single-nucleotide
repeats, whereas only one of the top 10 DR motifs associated with
deletions \(\leq\)30 bp was a single-nucleotide repeat. Similarly, for
MR, six of the top 10 motifs associated with deletions >30 bp were
single-nucleotide repeats (all of them poly-A), whereas only three
single-nucleotide repeats were found among the top 10 motifs associated
with deletions \(\leq\)30 bp (Figure S3E and F). Thus, single-nucleotide
repeats (poly-A, poly-T, poly-C, or poly-G) may be preferentially
located around the breakpoints of deletions >30 bp. By contrast, seven
of the top 10 STR motifs are shared between long and short deletions
(Figure S3I and J), and the sequence preference of Z-DNA repeats
associated with long deletions is similar to that associated with short
deletions (Figure S3K and L).
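Two comparisons are drawn from Figure S3. The first counts the single-nucleotide (homopolymer) motifs in each top-10 list. The second counts the motifs shared between the long-deletion and short-deletion lists. Both can be expressed in a few lines of Python. The helper names are hypothetical, and the sketch compares motifs as exact strings. This section does not state whether reverse complements, such as poly-A and poly-T, were merged.

```python
def is_single_nucleotide_repeat(motif: str) -> bool:
    """True if the motif is a homopolymer run of one base, e.g. 'AAAAAA'."""
    m = motif.upper()
    return len(m) >= 2 and len(set(m)) == 1 and m[0] in "ACGT"

def count_single_nucleotide(top: list[str]) -> int:
    """Number of homopolymer motifs in a top-k motif list."""
    return sum(is_single_nucleotide_repeat(m) for m in top)

def shared_motifs(long_top: list[str], short_top: list[str]) -> set[str]:
    """Motifs that appear in both the long- and short-deletion lists."""
    return set(long_top) & set(short_top)

# Usage with made-up top-k lists (illustrative only).
long_top = ["AAAAAAAA", "TTTTTTT", "ACACAC"]
short_top = ["ACACAC", "GAGAGA", "CCCCCC"]
n_long = count_single_nucleotide(long_top)    # 2
n_short = count_single_nucleotide(short_top)  # 1
common = shared_motifs(long_top, short_top)   # {"ACACAC"}
```

Exact-string matching keeps the sketch simple. If strand-equivalent motifs should be treated as one, each motif could first be mapped to a canonical form, such as the lexicographically smaller of the sequence and its reverse complement.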
This may be because, for STR and Z-DNA repeats, the length threshold
that best separates short deletions from long deletions does not lie
near 30 bp (Figure S2). The frequency of Z-DNA repeats was not found to
correlate with deletion length. However, when the deletions were divided
into two groups according to length, a peak in Z-DNA frequency was
observed at the breakpoints of long deletions (>20 bp) but not at the
breakpoints of short deletions (\(\leq\)20 bp) (Figure S4F). Thus, if
Z-DNA frequency is used to define gross deletions, 20 bp may be the
appropriate cut-off.
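One way to formalise this search for a cut-off is to split the deletions at each candidate threshold. For each split, we then ask whether Z-DNA motifs are enriched at the breakpoints of one group but not the other. The Python sketch below assumes hypothetical input: for each deletion, its length and the offsets of Z-DNA motifs relative to the breakpoint. It defines a "peak" as the ratio of motif density in the two bins flanking the breakpoint to the mean density of all other bins. This peak criterion, the ±100 bp window and the 10 bp bin width are illustrative assumptions, not the method behind Figure S4F.

```python
from statistics import mean

# Hypothetical input: (deletion_length_bp, [Z-DNA motif offsets in bp relative
# to the breakpoint; negative values are upstream]).
Deletion = tuple[int, list[int]]

def offset_histogram(offsets: list[int], window: int = 100, bin_size: int = 10) -> list[int]:
    """Count motif offsets in equal-width bins covering [-window, window)."""
    counts = [0] * (2 * window // bin_size)
    for off in offsets:
        if -window <= off < window:
            counts[(off + window) // bin_size] += 1
    return counts

def breakpoint_peak_ratio(offsets: list[int], window: int = 100, bin_size: int = 10) -> float:
    """Mean count of the two bins adjacent to the breakpoint divided by the
    mean count of all other bins (values well above 1 suggest a peak)."""
    counts = offset_histogram(offsets, window, bin_size)
    mid = len(counts) // 2  # bins mid-1 and mid lie on either side of offset 0
    centre = (counts[mid - 1] + counts[mid]) / 2
    background = mean(counts[:mid - 1] + counts[mid + 1:])
    if background == 0:
        return float("inf") if centre > 0 else 0.0
    return centre / background

def peak_ratios_by_cutoff(deletions: list[Deletion],
                          cutoffs: list[int]) -> dict[int, tuple[float, float]]:
    """For each candidate cut-off, return (long-deletion ratio, short-deletion ratio)."""
    result = {}
    for cutoff in cutoffs:
        long_offs = [o for length, offs in deletions if length > cutoff for o in offs]
        short_offs = [o for length, offs in deletions if length <= cutoff for o in offs]
        result[cutoff] = (breakpoint_peak_ratio(long_offs), breakpoint_peak_ratio(short_offs))
    return result

# Usage with made-up offsets (illustrative only, not data from this study).
deletions = [
    (50, [-3, 2, 5, -8, 1, 60, -70]),   # long deletion: motifs cluster at the breakpoint
    (10, [-90, -40, 15, 55, 85, -25]),  # short deletion: motifs spread across the window
]
ratios = peak_ratios_by_cutoff(deletions, [20, 30])
```

With real data, the pattern described above for Z-DNA would correspond to a cut-off where the long-deletion ratio is well above 1 while the short-deletion ratio is not.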