2.4.1 Population genomic analyses of Pool-seq data
PoPoolation v1.2.2 (Kofler, Orozco-terWengel, et al., 2011) was used to estimate nucleotide diversity (π ; Tajima, 1983), Watterson’s theta (Ɵ ; Watterson, 1975) and Tajima’s D (T D; Tajima, 1989) separately for each pool. Estimates of π andT D from Pool-seq data are sensitive to sequencing errors and variation in coverage (Kofler, Orozco-terWengel, et al., 2011). Therefore, we subsampled mpileup files per pool to uniform depths that were chosen based on the mode of the read depth histogram for each pool (using a minimum depth of 0.5*mode and a maximum depth of mode+0.5*mode for subsampling; see Kurland et al., 2019). Subsampling was done without replacement using the ‘subsample-pileup.pl’ script implemented in PoPoolation v1.2.2 (Kofler, Orozco-terWengel, et al., 2011). The ‘variance-sliding.pl’ script of PoPoolation v1.2.2 (Kofler, Orozco-terWengel, et al., 2011) was used to estimate π(including invariant sites) and T D. Estimations were done for non-overlapping 5 kilo base pair (kb) windows across the assembly, with a minor allele count of 2 for a SNP to be called and applying the same depth thresholds as for the subsampling. Finally, only windows covered to ≥80% with data within the depth thresholds were used. We estimated π and T D using only the sites retained in both pools. All retained sites with 20X to 150X depth of coverage were used to estimateƟ with the same script and parameters as described for πand T D but without subsampling.
PoPoolation2 v1201 (Kofler, Pandey, et al., 2011) was used to calculate F ST(Nei, 1973) between the two pools using the ‘fst-sliding.pl’ script. A minor allele count of 3 was applied for SNP calling. We only retained variant positions with 20X to 150X depth of coverage, which is supposed to remove paralogous regions (inflated coverage) and regions represented by a small number of individuals (Kofler, Pandey, et al., 2011; their supporting information 1). We estimated F STbetween the demes per 1 bp site, as well as for larger, non-overlapping windows of 100 bp, 1, 5, 10, and 20 kb sizes to minimize stochastic errors linked to small window sizes (Kofler, Pandey, et al., 2011). Windows were only included if the fraction of the window covered with data was ≥0.8. In addition to these genome-wide F STcalculations, the gene annotation was transferred onto the synchronized file using ‘create‐genewise‐sync.pl’. F ST was calculated per gene using the ‘fst-sliding.pl’ script with the same settings mentioned above, with a larger window size (1,000,000 bp) than the length of the largest gene present in the genome, following the PoPoolation authors’ recommendation. For subsequent analyses we focused on windows of 1 bp and 5 kb for estimating diversity, divergence, and for performing outlier analyses. Genome-wide divergence patterns were visualized with a Manhattan plot created from F STestimates using the R package qqman (Turner, 2014). Confidence intervals (95%) of statistical parameters were calculated in R-studio v3.5.1 (R Core Team, 2018).