Stickleback samples, DNA library preparation and sequencing
A precondition for our analysis of SGV in marine stickleback was the
initial identification of genetic polymorphisms important to acidic
adaptation. For this, we considered five acidic and five basic lakes
from North Uist from which individual DNA was already available (Haenel
et al. 2019) (Figure 1b, Table S1). We refer to the latter habitat type
as ‘basic’ for terminological consistency with our previous work, but
emphasize that the fish inhabiting these lakes represent the standard
freshwater stickleback ecomorph widespread across G. aculeatus'
range. We chose 20 individuals from each of these freshwater populations
at random and combined their DNA to equal molarity without
PCR-enrichment into either an acidic or a basic pool of 100 individuals
each. The goal of this pooling (and the subsequent pooled sequencing,
hereafter poolSeq) was to obtain relatively precise allele frequency
estimates in acidic versus basic stickleback in general, while ignoring
allele frequencies within each specific population. To nevertheless have
access to individual genotypes and haplotype information, we
additionally chose two individuals from each acidic and basic population
at random for individual sequencing (indSeq).
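The equimolar pooling step described above can be illustrated with a minimal sketch. It assumes that equal molarity is approximated by equal DNA mass per individual (reasonable for genomic DNA from a single species) and that each extract has a measured concentration. The concentrations, the target mass and the function name pooling_volumes are hypothetical and are not taken from the study.

```python
def pooling_volumes(concentrations_ng_per_ul, mass_per_individual_ng):
    """Return the volume (in uL) of each extract that contributes the same DNA mass.

    Assumption: equal mass stands in for equal molarity, because all
    samples are genomic DNA from the same species.
    """
    volumes = []
    for conc in concentrations_ng_per_ul:
        if conc <= 0:
            raise ValueError("concentration must be positive")
        volumes.append(mass_per_individual_ng / conc)
    return volumes


# Hypothetical concentrations (ng/uL) for four individuals; not real data.
example_concentrations = [50.0, 25.0, 100.0, 40.0]

# Hypothetical target of 100 ng per individual in the pool.
example_volumes = pooling_volumes(example_concentrations, 100.0)

for conc, vol in zip(example_concentrations, example_volumes):
    print(f"{conc:6.1f} ng/uL -> pipette {vol:.2f} uL")
```

Because every individual contributes the same amount of DNA, each is expected to be represented equally among the reads of the pool, which is what allows read counts to serve as allele frequency estimates.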
To allow exploring the extent to which adaptive genetic variation
discovered in freshwater fish is present as SGV in marine stickleback,
we focused on samples from six locations across the Atlantic Ocean:
North Uist (NU), Ireland (IR), The Netherlands (NL), Germany (DE),
Iceland (IS) and Eastern Canada (CA) (Figure 1b, Table S1; note that
North Uist subsumes two nearby marine sample sites, ARDH and OBSM). From
each of these marine locations, we aimed for a sample size of around 25
individuals. Except for North Uist, from which marine individual-level
whole-genome sequence data were already available (Haenel et al. 2019),
individual DNA was extracted using the Quick-DNA™ Miniprep Plus Kit
(Zymo Research, Irvine, CA, USA). For the estimation of population
allele frequencies via poolSeq, individual DNA was then combined to
equal molarity without PCR-enrichment within each of the five new
locations. In addition, four individuals from each of these locations
were chosen at random for indSeq (Table S1).
The 47 DNA libraries in total (7 pools and 40 individuals) were sequenced
with 150 base pair paired-end reads on an S4 flow cell of an Illumina NovaSeq
6000 instrument, yielding a genome-wide median read depth per base pair
of 85x on average across the pools, and of 16x across the individuals
(details given in Table S1).
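As a cross-check of the library counts, the sampling design described in this section can be tallied as follows. This is only a bookkeeping sketch: the variable names are illustrative, and the numbers are those stated in the text.

```python
# Freshwater design: five acidic and five basic lakes on North Uist.
acidic_lakes = 5
basic_lakes = 5
individuals_pooled_per_lake = 20
indseq_per_lake = 2

# Marine design: five newly sequenced locations (IR, NL, DE, IS, CA);
# North Uist marine data were already available and add no new libraries.
new_marine_locations = 5
indseq_per_marine_location = 4

# One pool per freshwater habitat type, one pool per new marine location.
individuals_per_freshwater_pool = acidic_lakes * individuals_pooled_per_lake
pool_libraries = 2 + new_marine_locations

individual_libraries = (
    (acidic_lakes + basic_lakes) * indseq_per_lake
    + new_marine_locations * indseq_per_marine_location
)

total_libraries = pool_libraries + individual_libraries

print(individuals_per_freshwater_pool, pool_libraries,
      individual_libraries, total_libraries)
```

The tally reproduces the 100 individuals per freshwater pool and the 7 pools, 40 individuals and 47 libraries reported above.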