2.6 Comparison of results from molecular tests of selection
Comparisons of genome-wide targets of positive selection across species
used two approaches. First, we identified genes within any of the
detected sweep regions by intersecting the outlier sweep locations
with gene annotations using bedtools intersect, v2.27.1 (Quinlan &
Hall, 2010), for each species genome (Yang et al., 2021). The protein
sequences of all genes that overlapped, anywhere from start to stop, any sweep
region interval were then retrieved for each species. We then assessed the
overlap of the sets of proteins identified for the three species using
OrthoVenn2 (Xu et al., 2019), which clusters proteins based upon
sequence similarity. This approach provides a means of identifying any
targets of selection that might be shared across the three species,
without requiring that the same gene region be targeted in each species
(because this analysis allows independent members of a given gene family
to be targeted). In order to assess whether any species appeared to have a
higher proportion of immune genes among the putative targets of positive
selection, we also included the 96 candidate immune genes previously
identified from G. calmariensis in the OrthoVenn2 analysis.
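The logic of this first approach can be illustrated with a minimal Python sketch. It is not the pipeline used here (which relied on bedtools and a web-based clustering tool), and all function names, coordinates, and cluster identifiers below are hypothetical. The sketch assumes BED-style half-open coordinates and assumes that proteins have already been assigned to similarity clusters, so that the cross-species comparison reduces to a set intersection.

```python
from collections import defaultdict


def genes_overlapping_sweeps(genes, sweeps):
    """Return names of genes whose span overlaps at least one sweep interval.

    genes:  iterable of (chrom, start, end, name), half-open coordinates
    sweeps: iterable of (chrom, start, end), same convention
    (Hypothetical helper; a stand-in for an interval-intersection tool.)
    """
    sweeps_by_chrom = defaultdict(list)
    for chrom, start, end in sweeps:
        sweeps_by_chrom[chrom].append((start, end))

    hits = []
    for chrom, g_start, g_end, name in genes:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(g_start < s_end and s_start < g_end
               for s_start, s_end in sweeps_by_chrom.get(chrom, [])):
            hits.append(name)
    return hits


def clusters_shared_by_all(clusters_by_species):
    """Return cluster IDs present in every species' candidate set.

    clusters_by_species: dict mapping species name -> iterable of cluster IDs.
    Sharing a cluster does not require the same gene to be a target, only
    some member of the same protein family.
    """
    sets = [set(ids) for ids in clusters_by_species.values()]
    return set.intersection(*sets) if sets else set()


# Toy data (invented for illustration only).
genes = [
    ("chr1", 100, 500, "geneA"),
    ("chr1", 900, 1200, "geneB"),
    ("chr2", 100, 500, "geneC"),
]
sweeps = [("chr1", 450, 950), ("chr2", 600, 800)]
print(genes_overlapping_sweeps(genes, sweeps))

clusters = {
    "species1": ["c1", "c2", "c3"],
    "species2": ["c2", "c3", "c4"],
    "species3": ["c3", "c2", "c9"],
}
print(sorted(clusters_shared_by_all(clusters)))
```

In this toy example geneA and geneB each overlap the chr1 sweep interval, whereas geneC lies outside the chr2 interval; clusters c2 and c3 are the only ones represented in all three species.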
Second, we repeated the analysis above, but extended the candidate gene
region to include 5 kb on either side of the gene body (i.e. 5 kb before
the start and after the stop), in order to detect any putative signatures of
positive selection associated with the regulatory regions of a given
gene. We chose this flanking window (10 kb in total) as the likely target of regulatory evolution
based on information from other insect groups (Ghavi-Helm et al., 2014;
Lewis & Reed, 2019).
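The extension of each candidate region can be sketched as follows. This is an illustrative example only: the function name is hypothetical, and the clamping of the extended interval to the chromosome boundaries is an assumption, since the handling of genes near scaffold ends is not specified above.

```python
def extend_gene_region(start, end, flank=5_000, chrom_length=None):
    """Extend a gene interval by `flank` bp on each side.

    Coordinates are half-open and 0-based (BED-style). The result is clamped
    to 0 and, if given, to `chrom_length` (an assumption of this sketch).
    """
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    new_start = max(0, start - flank)
    new_end = end + flank
    if chrom_length is not None:
        new_end = min(chrom_length, new_end)
    return new_start, new_end


# A gene in the middle of a scaffold gains 5 kb on each side.
print(extend_gene_region(12_000, 15_000))
# A gene near the scaffold start is truncated at position 0.
print(extend_gene_region(2_000, 4_000))
```

The extended intervals can then be intersected with the sweep regions in the same way as the unextended gene bodies.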