2.4.1 Population genomic analyses of Pool-seq data
PoPoolation v1.2.2 (Kofler,
Orozco-terWengel, et al., 2011) was used to estimate nucleotide
diversity (π ; Tajima,
1983), Watterson’s theta (Ɵ ;
Watterson, 1975) and Tajima’s D
(T D; Tajima, 1989)
separately for each pool. Estimates of π andT D from Pool-seq data are sensitive to sequencing
errors and variation in coverage (Kofler,
Orozco-terWengel, et al., 2011). Therefore, we subsampled mpileup files
per pool to uniform depths that were chosen based on the mode of the
read depth histogram for each pool (using
a minimum depth of 0.5*mode and a maximum depth of mode+0.5*mode for
subsampling; see Kurland et al., 2019). Subsampling was done without
replacement using the ‘subsample-pileup.pl’ script implemented in
PoPoolation v1.2.2 (Kofler,
Orozco-terWengel, et al., 2011). The ‘variance-sliding.pl’ script of
PoPoolation v1.2.2 (Kofler,
Orozco-terWengel, et al., 2011) was used to estimate π(including invariant sites) and T D. Estimations
were done for non-overlapping 5 kilo base pair (kb) windows across the
assembly, with a minor allele count of 2 for a SNP to be called and
applying the same depth thresholds as for the subsampling. Finally, only
windows covered to ≥80% with data within the depth thresholds were
used. We estimated π and T D using only the
sites retained in both pools. All
retained sites with 20X to 150X depth of coverage were used to estimateƟ with the same script and parameters as described for πand T D but without subsampling.
PoPoolation2 v1201 (Kofler, Pandey, et
al., 2011) was used to calculate F ST(Nei, 1973) between the two pools using
the ‘fst-sliding.pl’ script. A minor allele count of 3 was applied for
SNP calling. We only retained variant positions with 20X to 150X depth
of coverage, which is supposed to remove paralogous regions (inflated
coverage) and regions represented by a small number of individuals
(Kofler, Pandey, et al., 2011; their
supporting information 1). We estimated F STbetween the demes per 1 bp site, as well as for larger, non-overlapping
windows of 100 bp, 1, 5, 10, and 20 kb sizes to minimize stochastic
errors linked to small window sizes
(Kofler, Pandey, et al., 2011). Windows
were only included if the fraction of the window covered with data was
≥0.8. In addition to these genome-wide F STcalculations, the gene annotation was transferred onto the synchronized
file using ‘create‐genewise‐sync.pl’. F ST was
calculated per gene using the ‘fst-sliding.pl’ script with the same
settings mentioned above, with a larger window size (1,000,000 bp) than
the length of the largest gene present in the genome, following the
PoPoolation authors’ recommendation. For subsequent analyses we focused
on windows of 1 bp and 5 kb for estimating diversity, divergence, and
for performing outlier analyses. Genome-wide divergence patterns were
visualized with a Manhattan plot created from F STestimates using the R package qqman
(Turner, 2014). Confidence intervals
(95%) of statistical parameters were calculated in R-studio v3.5.1
(R Core Team, 2018).