Bioinformatic analysis of deletion and duplication breakpoints
DNA intervals of 10 bp to 50 bp centered on the 5′ and 3′ breakpoints of each deletion and duplication were screened in silico for microhomologies, repetitive elements, non-B DNA, secondary structures and recombinogenic DNA motifs. These elements form a heterogeneous group of sequences of different lengths (3-18 bp) that may stimulate double-strand breaks (DSBs), triggering erroneous DNA repair or DNA replication that leads to non-allelic recombination. The study used the Human Reference Genome (GRCh38) [NC_000023.11: 31641233-32372273, downloaded 5-Sep-2018 from the NCBI website, www.ncbi.nlm.nih.gov/] and followed a recently reported strategy (Abelleyro et al., 2020). Only the DSB-stimulating motifs that showed significant expected values (E-values < 0.05) in the random-point analysis of that study were considered (Abelleyro et al., 2020). Bioinformatic analysis was performed mainly with the SeqBuilder and MegAlign programs [Lasergene, DNASTAR], the ClustalW algorithm [www.ebi.ac.uk/Tools/msa/clustalw2/] and the BLAST algorithm [blast.ncbi.nlm.nih.gov/Blast.cgi]. RepeatMasker and Dfam [www.dfam.org/] were used to identify repetitive elements. Non-B DNA sequences were identified with the non-B DNA motif search tool (nBMST) [nonb-abcc.ncifcrf.gov/apps/nBMST/default/] and confirmed with RepeatAround [portugene.com/repeataround.html] and QGRS Mapper [bioinformatics.ramapo.edu/QGRS/analyze.php]. Secondary structures were modelled with mfold [unafold.rna.albany.edu/?q=mfold]. Finally, the recombinogenic motifs screened with SeqBuilder [Lasergene, DNASTAR] included Scaffold Attachment Region (SAR) motifs, Ig heavy chain switch motifs and the hexanucleotide motifs targeted by the endonuclease/reverse transcriptase of mammalian retroposons (Jurka motifs) (Jurka, 1997).
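The window extraction and motif screening described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, which relied on the programs listed above. The function names, the toy sequence, the 0-based breakpoint convention and the single placeholder motif are all assumptions made for illustration. The motif set actually screened should be taken from the cited references. The sketch also assumes that the 10 bp to 50 bp intervals refer to total window length, so the flank on each side is half of that.

```python
import re


def window(seq, breakpoint, flank):
    """Return up to 2*flank bases centred on a 0-based breakpoint.

    Assumed convention: the breakpoint lies between seq[breakpoint - 1]
    and seq[breakpoint]. Windows are clipped at the sequence ends.
    """
    start = max(0, breakpoint - flank)
    end = min(len(seq), breakpoint + flank)
    return seq[start:end]


def find_motifs(win, motifs):
    """List (name, offset) for every forward-strand motif occurrence.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    hits = []
    for name, pattern in motifs.items():
        for m in re.finditer(f"(?=({pattern}))", win):
            hits.append((name, m.start()))
    return hits


def microhomology(seq, bp5, bp3):
    """Return the sequence shared by both breakpoints of a deletion.

    The deletion is assumed to remove seq[bp5:bp3]. Identity is extended
    forwards (bases following each breakpoint) and backwards (bases
    preceding each breakpoint) without crossing the other breakpoint.
    """
    fwd = 0
    while (bp5 + fwd < bp3 and bp3 + fwd < len(seq)
           and seq[bp5 + fwd] == seq[bp3 + fwd]):
        fwd += 1
    back = 0
    while (bp5 - 1 - back >= 0 and bp3 - 1 - back >= bp5 + fwd
           and seq[bp5 - 1 - back] == seq[bp3 - 1 - back]):
        back += 1
    return seq[bp5 - back:bp5 + fwd]


# Toy sequence (not genomic data): an "ACG" repeat flanks the deleted segment.
toy_seq = "GATTC" + "ACG" + "TTTTAAAAGG" + "ACG" + "CTAGA"
bp5, bp3 = 5, 18  # hypothetical 5' and 3' breakpoints (0-based)

# Placeholder motif table; replace with the motifs from the cited references.
example_motifs = {"example_motif": "TTAAAA"}

if __name__ == "__main__":
    print("microhomology:", microhomology(toy_seq, bp5, bp3))
    for label, bp in (("5' breakpoint", bp5), ("3' breakpoint", bp3)):
        win = window(toy_seq, bp, flank=10)
        print(label, win, find_motifs(win, example_motifs))
```

For real data, the same functions could be applied to the reference interval loaded as a plain string, with breakpoint coordinates converted to offsets within that interval. A reverse-strand scan would also be needed for motifs that are not palindromic, which this sketch omits.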