cis-regulatory module (CRM) prediction
To avoid spuriously high scores in regions containing one or more
occurrences of a repeated pattern, Tandem Repeats Finder (TRF, v4.07b)
(Benson 1999) was used to mask tandem repeats in the genome.
SCRMshaw (Kazemian and Halfon 2019) was then run on the masked genomes
with the Drosophila training data and the parameters ‘--thitw 2000
--wlen 200 --wshift 100 --gff --imm --hexmcd --pac --genome
--traindirlst --outdir’. Finally, the required hits were selected using
the Generate_top_N_SCRMhits.pl script, then sorted and merged with
BEDTools (Quinlan and Hall 2010). The subsequent comparison of SCRMshaw
predictions with TEs was also carried out with BEDTools.
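
The sort, merge, and overlap steps described above can be illustrated with a minimal pure-Python sketch. It is not the BEDTools invocation used in this work (the exact commands are not given here). It only mirrors what a default sort-then-merge and a simple overlap test do on BED-style intervals. The function names and the example coordinates are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

# (chrom, start, end) in 0-based, half-open BED-style coordinates
Interval = Tuple[str, int, int]


def sort_and_merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals, then merge those that overlap or are book-ended."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlapping(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the intervals in `a` that share at least 1 bp with any interval in `b`."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in b:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits: List[Interval] = []
    for chrom, start, end in a:
        if any(start < b_end and b_start < end
               for b_start, b_end in by_chrom.get(chrom, [])):
            hits.append((chrom, start, end))
    return hits


# Hypothetical CRM hit windows (made-up coordinates, not real predictions)
crm_hits = [("chr2L", 300, 500), ("chr2L", 100, 300),
            ("chr2L", 200, 400), ("chr3R", 50, 250)]
# Hypothetical TE annotations (made-up coordinates)
tes = [("chr2L", 450, 700), ("chr3R", 250, 400)]

merged_hits = sort_and_merge(crm_hits)
te_overlapping_hits = overlapping(merged_hits, tes)
print(merged_hits)
print(te_overlapping_hits)
```

In this sketch, adjacent and overlapping windows on the same chromosome collapse into a single predicted region, and a region that merely touches a TE boundary without sharing a base is not counted as overlapping. Any minimum-overlap threshold used in the actual comparison would need to be applied separately.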