Non-B DNA-forming repeat motifs associated with deletions
We wished to ascertain whether short deletions and long deletions were associated with different types of repeat motif. Six types of non-B DNA-forming repeat (DR, GQ, IR, MR, Z-DNA, and STR) were investigated in this study. For each type of repeat, we obtained the 10 most frequent sequences occurring in the vicinity of the breakpoints of deletions with length >30 bp or \(\leq\)30 bp (Figure S3). Interestingly, most repeat motifs occurring in the vicinity of short deletions differed from those occurring in the vicinity of long deletions (Figure S3). For DR (Figure S3A and B), all of the top 10 repeat motifs in deletions >30 bp were single-nucleotide repeats, whereas in deletions \(\leq\)30 bp only one of the top 10 DR motifs was a single-nucleotide repeat. Similarly, for MR, we observed six single-nucleotide repeat motifs (all of them poly-A repeats) among the deletions >30 bp, whereas only three single-nucleotide repeats were found among the deletions \(\leq\)30 bp (Figure S3E and F). Thus, there may be a preference for single-nucleotide repeats (poly-A, poly-T, poly-C, or poly-G) around the breakpoints of deletions >30 bp. By contrast, seven of the top 10 STR motifs were shared between the long deletions and the short deletions (Figure S3I and J). We also noted that the sequence preference of Z-DNA repeats in long deletions was similar to that associated with short deletions (Figure S3K and L). The underlying reason may be that, for STR and Z-DNA repeats, the cut-off that partitions deletions into short and long groups does not lie around 30 bp (Figure S2). Frequencies of Z-DNA repeats were not found to correlate with deletion length. However, when the deletions were divided into two groups according to length, a Z-DNA frequency peak was observed at the breakpoints of long deletions (>20 bp) but not at the breakpoints of short deletions (≤20 bp) (Figure S4F).
Thus, if the frequency of Z-DNA repeats is used to define gross deletions, 20 bp may be an appropriate cut-off.
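
The top-10 comparison described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the input format (pairs of deletion length and motif sequence), the function names `top_motifs` and `is_single_nucleotide`, and the toy records in the usage note are all hypothetical, and a deletion of exactly the cut-off length is assigned to the short group, matching the >30 bp versus \(\leq\)30 bp partition.

```python
from collections import Counter


def top_motifs(records, cutoff=30, k=10):
    """Rank repeat motifs separately for short and long deletions.

    records: iterable of (deletion_length_bp, motif_sequence) pairs
             (a hypothetical input format chosen for illustration).
    cutoff:  deletions with length > cutoff are "long", the rest "short".
    k:       number of most frequent motifs to keep per group.
    Returns (top_short, top_long, shared), where shared is the set of
    motifs present in both top-k lists.
    """
    short_counts, long_counts = Counter(), Counter()
    for length, motif in records:
        if length > cutoff:
            long_counts[motif] += 1
        else:
            short_counts[motif] += 1
    top_short = [motif for motif, _ in short_counts.most_common(k)]
    top_long = [motif for motif, _ in long_counts.most_common(k)]
    shared = set(top_short) & set(top_long)
    return top_short, top_long, shared


def is_single_nucleotide(motif):
    """True if the motif is a run of one base (poly-A, poly-T, poly-C, poly-G)."""
    return len(motif) > 0 and len(set(motif)) == 1
```

For example, with invented records `[(50, "AAAAAA"), (45, "AAAAAA"), (10, "ACAC")]`, calling `top_motifs(records, cutoff=30, k=10)` places `"AAAAAA"` in the long-deletion list and `"ACAC"` in the short-deletion list, with no shared motifs. Setting `cutoff=20` gives the alternative partition suggested by the Z-DNA frequencies.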