Processing of Genomic Data
Genome-wide single nucleotide polymorphisms (SNPs) were called from the
GBS data using Stacks v1.48 (Catchen, Hohenlohe, Bassham, Amores, &
Cresko, 2013). To generate a large set of reference loci (ustacks), a
minimum depth of three reads was used to construct stacks of loci and
two mismatches were allowed between two stacks (alleles). To build the
catalog of loci among individuals (cstacks), two mismatches were allowed
between loci. The loci were then filtered using the following criteria:
loci were removed if (1) present in fewer than five populations, or (2)
present in fewer than 30% of the individuals in a population. Before
calling genotypes, further criteria were used to remove loci. First,
loci at which more than 75% of individuals had a confounded match
(multiple loci matching a single catalog locus within an individual), or
with a log-likelihood below -8.0, were removed. Second, SNPs with an error rate
higher than 0.1 were dropped, with the error rate determined by the
three replicated individuals. This resulted in a semi-filtered
dataset comprising 311 individuals and 21,238 SNPs. To
reduce any potential confounding effects from linkage disequilibrium or
selection among SNPs, we further filtered this dataset such that only
the SNP with the least missing data was retained for each GBS locus, and
SNPs that deviated from Hardy-Weinberg equilibrium at the population
level (p-value threshold of 0.001) were removed. This fully filtered
dataset comprised 311 individuals and 12,498 SNPs, and was used for
downstream analyses that require independent loci.
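
The filtering criteria described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, which relied on Stacks. All function names and the genotype encoding (0, 1 or 2 copies of the alternate allele, None for a missing call) are hypothetical. The sketch assumes that a locus counts as present in a population when at least 30% of that population's individuals are genotyped, that the error rate is the fraction of discordant calls between replicate pairs, and that Hardy-Weinberg deviation is assessed with a one-degree-of-freedom chi-square test. The source does not specify which test was applied.

```python
import math
from collections import Counter

MISSING = None  # hypothetical marker for an uncalled genotype


def locus_passes_presence(calls_by_pop, min_pops=5, min_frac=0.30):
    """Keep a locus genotyped in >= min_frac of individuals in >= min_pops populations.

    calls_by_pop maps population name -> list of genotypes (MISSING if uncalled).
    """
    pops_present = 0
    for genotypes in calls_by_pop.values():
        called = sum(g is not MISSING for g in genotypes)
        if genotypes and called / len(genotypes) >= min_frac:
            pops_present += 1
    return pops_present >= min_pops


def replicate_error_rate(replicate_pairs):
    """Fraction of discordant calls among replicate pairs where both were called.

    Returns 0.0 when no pair can be compared (an assumption of this sketch).
    """
    compared = mismatched = 0
    for a, b in replicate_pairs:
        if a is MISSING or b is MISSING:
            continue
        compared += 1
        mismatched += a != b
    return mismatched / compared if compared else 0.0


def least_missing_snp(snps):
    """Return the id of the SNP with the fewest missing genotypes at one locus.

    snps maps SNP id -> list of genotypes; ties go to the first SNP in dict order.
    """
    return min(snps, key=lambda s: sum(g is MISSING for g in snps[s]))


def hwe_chi2_pvalue(genotypes):
    """Chi-square (1 df) p-value for HWE at a biallelic SNP in one population."""
    counts = Counter(g for g in genotypes if g is not MISSING)
    n = sum(counts.values())
    if n == 0:
        return 1.0  # nothing to test
    p = (2 * counts[2] + counts[1]) / (2 * n)  # alternate allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic in this population
    expected = {0: n * q * q, 1: 2 * n * p * q, 2: n * p * p}
    chi2 = sum((counts[k] - e) ** 2 / e for k, e in expected.items())
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))
```

In this sketch a SNP would be dropped when `replicate_error_rate(...) > 0.1`, or when `hwe_chi2_pvalue(...) < 0.001` in a population. For small population samples an exact test may be preferable to the chi-square approximation.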