SNP discovery
Raw sequence reads (Haenel et al. 2019b, 2021) were parsed by library
(pool or individual) and aligned to the third-generation stickleback
reference genome assembly (Glazer et al. 2015) by using Novoalign
(version 4.0, http://www.novocraft.com/products/novoalign/;
alignment settings provided in the Supplementary Codes). From the
alignments, we derived nucleotide counts (pileups) for all genome-wide
positions by using the pileup function from the Rsamtools R package
(Morgan et al. 2017; unless specified otherwise, all analyses
were implemented with the R language; R Development Core Team, 2019).
Single-nucleotide polymorphisms (SNPs) were then ascertained in two
ways: for an initial exploration of population structure among our
marine and freshwater samples, we used the pileup data derived from
indSeq. Genomic positions qualified as SNPs if the minor allele
frequency (MAF) was at least 0.04 across the 24 marine individuals (thus
excluding positions appearing variable due to sequencing error only); if
cumulative read depth across the marine fish was no greater than 1000x
(thus effectively eliminating repeated genomic elements); if all 44
stickleback individuals displayed at least 1x read depth (thus excluding
positions with missing data); and if the physical distance to the
nearest SNP was at least 100 bp (thus ruling out SNP clusters caused by
micro-indels). This stringent quality filtering resulted in our ‘indSeq
SNPs’ including 1.65 million markers across the 447 Mb stickleback
genome. Analyses based on an alternative SNP panel (1.61 million SNPs)
obtained by applying the MAF and cumulative read depth threshold to the
20 freshwater instead of the marine individuals consistently produced
similar results (details not reported).
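The four indSeq criteria above can be illustrated with a minimal Python sketch. It is not the code used in the study (the analyses were run in R; see the Supplementary Codes). The function names, the per-individual dictionaries of nucleotide counts, the read-count-based calculation of the MAF, and the greedy left-to-right thinning used to enforce the 100 bp spacing are all assumptions made for illustration.

```python
from collections import Counter


def passes_indseq_filters(marine, freshwater, min_maf=0.04,
                          max_marine_depth=1000, min_ind_depth=1):
    """Apply the per-position indSeq criteria (hypothetical helper).

    marine, freshwater: lists with one dict per individual,
    mapping a nucleotide to its read count at this position.
    """
    # Every individual must be covered (no missing data).
    for counts in marine + freshwater:
        if sum(counts.values()) < min_ind_depth:
            return False

    # Pool the read counts across the marine individuals.
    pooled = Counter()
    for counts in marine:
        pooled.update(counts)
    depth = sum(pooled.values())

    # Excessive cumulative depth suggests a repeated element.
    if depth == 0 or depth > max_marine_depth:
        return False

    # Assumption: the MAF is the read-count frequency of the second
    # most common nucleotide among the pooled marine reads.
    ranked = sorted(pooled.values(), reverse=True)
    minor = ranked[1] if len(ranked) > 1 else 0
    return minor / depth >= min_maf


def thin_by_distance(positions, min_dist=100):
    """Keep positions at least min_dist bp from the previously kept one.

    Assumption: greedy left-to-right thinning within one chromosome.
    """
    kept = []
    for pos in sorted(positions):
        if not kept or pos - kept[-1] >= min_dist:
            kept.append(pos)
    return kept
```

In this sketch, positions passing `passes_indseq_filters` on a chromosome would then be handed to `thin_by_distance` to enforce the minimum spacing. Swapping the `marine` and `freshwater` arguments corresponds to the alternative panel ascertained in the freshwater individuals.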
For the discovery of genetic variation important to acidic adaptation
and the subsequent exploration of SGV, SNPs were ascertained based on
the poolSeq data from the acidic and basic fish. We here required a read
depth between 100x and 500x and a MAF of at least 0.25 across the two
pools combined, and a read depth of at least 50x within each pool. The
1.5 million ‘poolSeq SNPs’ passing these filters were genotyped in all
freshwater and marine population pools separately.
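As an illustration of the poolSeq criteria, the following minimal Python sketch checks a single genomic position. It is not the code used in the study. The function name, the dictionary representation of the nucleotide counts, the inclusive treatment of the depth bounds, and the read-count-based MAF are assumptions.

```python
from collections import Counter


def passes_poolseq_filters(acidic, basic, min_total=100, max_total=500,
                           min_pool=50, min_maf=0.25):
    """Apply the poolSeq criteria at one position (hypothetical helper).

    acidic, basic: dicts mapping a nucleotide to its read count
    in the acidic and the basic pool, respectively.
    """
    depth_acidic = sum(acidic.values())
    depth_basic = sum(basic.values())

    # Each pool must be sufficiently covered on its own.
    if depth_acidic < min_pool or depth_basic < min_pool:
        return False

    # Combined depth must fall within the allowed window
    # (bounds treated as inclusive, an assumption).
    total = depth_acidic + depth_basic
    if not (min_total <= total <= max_total):
        return False

    # Assumption: the MAF is the read-count frequency of the second
    # most common nucleotide across the two pools combined.
    combined = Counter(acidic)
    combined.update(basic)
    ranked = sorted(combined.values(), reverse=True)
    minor = ranked[1] if len(ranked) > 1 else 0
    return minor / total >= min_maf
```

Because the MAF is calculated across the two pools combined, a position nearly fixed for alternative alleles in the acidic and basic pools passes the 0.25 threshold, whereas a variant that is rare in both pools does not.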