SNP discovery
Raw sequence reads (Haenel et al. 2019b, 2021) were parsed by library (pool or individual) and aligned to the third-generation stickleback reference genome assembly (Glazer et al. 2015) with Novoalign (Version 4.0, http://www.novocraft.com/products/novoalign/; alignment settings are provided in the Supplementary Codes). From the alignments, we derived nucleotide counts (pileups) for all genome-wide positions using the pileup function from the Rsamtools R package (Morgan et al. 2017; unless specified otherwise, all analyses were implemented in the R language; R Development Core Team, 2019). Single-nucleotide polymorphisms (SNPs) were then ascertained in two ways. For an initial exploration of population structure among our marine and freshwater samples, we used the pileup data derived from indSeq. Genomic positions qualified as SNPs if the minor allele frequency (MAF) was at least 0.04 across the 24 marine individuals (thus excluding positions that appear variable only because of sequencing error); if the cumulative read depth across the marine fish did not exceed 1000x (thus effectively eliminating repeated genomic elements); if all 44 stickleback individuals displayed at least 1x read depth (thus excluding positions with missing data); and if the physical distance to the nearest SNP was at least 100 bp (thus ruling out SNP clusters caused by micro-indels). This stringent quality filtering yielded our ‘indSeq SNPs’, comprising 1.65 million markers across the 447 Mb stickleback genome. Analyses based on an alternative SNP panel (1.61 million SNPs), obtained by applying the MAF and cumulative read depth thresholds to the 20 freshwater instead of the 24 marine individuals, consistently produced similar results (details not reported).
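The four indSeq filters can be illustrated with a minimal sketch operating on per-individual nucleotide counts, such as those obtained from a pileup. This is not the authors' R code. Several assumptions are ours: counts are hypothetical dictionaries of A/C/G/T read counts per individual; the MAF is computed from read counts pooled across the marine individuals and defined as the second most common base divided by total depth; and the 100 bp spacing filter is applied to candidate positions on a single chromosome that already passed the other filters, with both members of a too-close pair discarded.

```python
BASES = ("A", "C", "G", "T")


def pooled_counts(ind_counts, idx):
    """Sum base counts over the individuals listed in idx (e.g. the marine fish)."""
    return {b: sum(ind_counts[i].get(b, 0) for i in idx) for b in BASES}


def minor_allele_freq(totals):
    """Assumed MAF definition: second most common base / total read depth."""
    depth = sum(totals.values())
    if depth == 0:
        return 0.0
    ranked = sorted(totals.values(), reverse=True)
    return ranked[1] / depth


def passes_site_filters(ind_counts, marine_idx, min_maf=0.04,
                        max_marine_depth=1000, min_ind_depth=1):
    """Apply the MAF, cumulative-depth and missing-data filters to one position."""
    marine = pooled_counts(ind_counts, marine_idx)
    if minor_allele_freq(marine) < min_maf:
        return False  # likely invariant or sequencing error only
    if sum(marine.values()) > max_marine_depth:
        return False  # excessive depth suggests a repeated element
    # Every individual (marine and freshwater) must be covered by >= 1 read.
    return all(sum(c.get(b, 0) for b in BASES) >= min_ind_depth
               for c in ind_counts)


def thin_by_spacing(positions, min_dist=100):
    """Keep positions whose nearest neighbouring candidate is >= min_dist bp away."""
    pos = sorted(positions)
    kept = []
    for k, p in enumerate(pos):
        gaps = []
        if k > 0:
            gaps.append(p - pos[k - 1])
        if k + 1 < len(pos):
            gaps.append(pos[k + 1] - p)
        if all(g >= min_dist for g in gaps):
            kept.append(p)
    return kept


def indseq_snps(sites, marine_idx, **filter_kwargs):
    """sites: hypothetical dict {position: [per-individual count dicts]} for one chromosome."""
    candidates = [p for p, counts in sites.items()
                  if passes_site_filters(counts, marine_idx, **filter_kwargs)]
    return thin_by_spacing(candidates)
```

For example, with individuals 0 and 1 treated as marine, a polymorphic, fully covered position is retained unless another candidate lies within 100 bp of it, in which case both positions are dropped as a putative micro-indel cluster.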
For the discovery of genetic variation important to acidic adaptation and the subsequent exploration of SGV, SNPs were ascertained based on the poolSeq data from the acidic and basic fish. Here, we required a combined read depth between 100x and 500x and a MAF of at least 0.25 across the two pools, as well as a read depth of at least 50x within each pool. The 1.5 million ‘poolSeq SNPs’ passing these filters were then genotyped separately in each freshwater and marine population pool.
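The poolSeq ascertainment can likewise be sketched in a few lines. Again this is an illustration under our own assumptions, not the published pipeline: the combined MAF is computed from read counts summed over the acidic and basic pools; the two alleles of a SNP are taken as the two most common bases in that combined count; and "genotyping" a pool is approximated as estimating the minor-allele frequency from that pool's reads of the two ascertained alleles. All function names are hypothetical.

```python
BASES = ("A", "C", "G", "T")


def depth(counts):
    """Total read depth of a count dict."""
    return sum(counts.get(b, 0) for b in BASES)


def poolseq_snp_alleles(acidic, basic, min_total=100, max_total=500,
                        min_maf=0.25, min_pool=50):
    """Return (major, minor) if the site passes the poolSeq filters, else None."""
    if depth(acidic) < min_pool or depth(basic) < min_pool:
        return None  # each pool needs at least 50x
    total = {b: acidic.get(b, 0) + basic.get(b, 0) for b in BASES}
    d = depth(total)
    if not (min_total <= d <= max_total):
        return None  # combined depth must lie within 100x to 500x
    ranked = sorted(BASES, key=lambda b: total[b], reverse=True)
    major, minor = ranked[0], ranked[1]
    if total[minor] / d < min_maf:
        return None  # combined MAF below 0.25
    return major, minor


def pool_minor_freq(pool, alleles):
    """Estimate the minor-allele frequency in any population pool at an ascertained SNP."""
    major, minor = alleles
    n = pool.get(major, 0) + pool.get(minor, 0)
    return pool.get(minor, 0) / n if n else None
```

Once a site has been ascertained in the acidic and basic pools, pool_minor_freq can be applied to every freshwater and marine pool separately, which mirrors the per-pool genotyping described above.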