Cis-regulatory module (CRM) prediction
To avoid spuriously high scores for regions containing one or more copies of a repeated pattern, tandem repeats in each genome were first masked with Tandem Repeats Finder (TRF, v4.07b) (Benson 1999). SCRMshaw (Kazemian and Halfon 2019) was then run on the masked genomes using the Drosophila training data and the parameters ‘--thitw 2000 --wlen 200 --wshift 100 --gff --imm --hexmcd --pac --genome --traindirlst --outdir’. Finally, the top-ranked hits were selected with the Generate_top_N_SCRMhits.pl script, then sorted and merged with BEDTools (Quinlan and Hall 2010). BEDTools was also used for the subsequent comparison of SCRMshaw predictions with transposable elements (TEs).
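The sorting, merging, and TE-overlap steps are standard genomic interval operations. The sketch below re-implements them in plain Python only to make the logic explicit; it is not BEDTools, and the exact BEDTools subcommands and options used above are not stated, so `bedtools sort`, `bedtools merge` (default distance 0) and `bedtools intersect -u` are assumed here as the closest equivalents. The function names and example coordinates are hypothetical. Intervals follow the BED convention of 0-based, half-open coordinates.

```python
from typing import List, Tuple

# A BED-style interval: (chrom, start, end), 0-based and half-open.
Interval = Tuple[str, int, int]


def sort_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort by chromosome, then start, then end (analogous to `bedtools sort`)."""
    return sorted(intervals, key=lambda iv: (iv[0], iv[1], iv[2]))


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals on the same chromosome.

    Assumed to mirror `bedtools merge` with its default distance of 0,
    which also joins intervals whose end touches the next start.
    """
    merged: List[Interval] = []
    for chrom, start, end in sort_intervals(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def crms_overlapping_tes(crms: List[Interval], tes: List[Interval]) -> List[Interval]:
    """Report each CRM once if it overlaps any TE (like `bedtools intersect -u`)."""
    return [c for c in crms if any(overlaps(c, t) for t in tes)]


if __name__ == "__main__":
    # Hypothetical SCRMshaw hits on two scaffolds.
    hits = [("scaffold_1", 500, 700), ("scaffold_1", 100, 300),
            ("scaffold_1", 200, 400), ("scaffold_2", 50, 250)]
    crms = merge_intervals(hits)
    # Hypothetical TE annotations.
    tes = [("scaffold_1", 390, 450), ("scaffold_2", 250, 300)]
    print(crms)
    print(crms_overlapping_tes(crms, tes))
```

In this example the three scaffold_1 hits collapse into two CRMs, and only the first of them overlaps a TE; the scaffold_2 CRM ends exactly where its TE begins, so under half-open coordinates the two do not share a base. A production pipeline would use BEDTools itself, which handles large files far more efficiently than this quadratic overlap check.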