3.1 | Sequencing quality, SNP loci and genetic diversity
Over 817 million 70 bp reads were generated, with an average of
8,171,812 reads per individual after de-multiplexing and length
trimming. Individuals Ec641 and Ec642 from the Volga River V01 sampling
locality were removed due to low sequencing coverage. The coverage for
all remaining individuals ranged from 10.61x to 20.53x. A total of
6,555,128 loci were retained for genotyping, resulting in 1,405,998
variant sites among all individuals.
The combined river dataset, which incorporated all ten sampling
locations in both the Volga and Meramec rivers, produced a mean of
30,236 variant SNP loci per locality (Table 1A). The overall average
number of private alleles detected among SNP loci across the sampled
localities was 1,673. Localities within the Meramec River exhibited a
higher average number of private alleles, 2,191, in comparison to the
Volga River localities, 1,154 (Table 1A). The Meramec River localities
also exhibited greater genetic diversity (H_O and π) than
the Volga River localities. Localities V01 and MO3 had more private
alleles than the other localities within their respective river systems
(Table 1A). However, the genetic diversity estimates for both localities
were similar to the values observed in the other localities within each
system.
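The relationship between the per-river and overall private-allele averages can be checked with a short calculation. The sketch below assumes that each river contributes five localities (as stated for the river-specific datasets) and uses only the rounded means quoted in the text. The variable names are illustrative.

```python
# Per-river mean private-allele counts as quoted in the text (Table 1A).
meramec_mean = 2191
volga_mean = 1154

# Assumption: five localities per river, as stated for the
# river-specific datasets.
n_meramec = 5
n_volga = 5

# Locality-weighted mean across all ten localities.
overall_mean = (meramec_mean * n_meramec + volga_mean * n_volga) / (
    n_meramec + n_volga
)
print(overall_mean)
```

The result, 1,672.5, agrees with the reported overall average of 1,673 to within the rounding of the per-river means.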
Within the Volga-only dataset, an average of 16,410 variant SNP loci
were identified among the five sampled localities with an average of
1,626 private alleles (Table 1B). The genetic diversity estimates for
the localities based on this dataset were much higher than those
observed in the combined river dataset. Locality V01 again had a
substantially higher number of private alleles, with a total of 3,496
(Table 1B). As in the combined dataset, however, the diversity estimates
for V01 were similar to the values observed in the other Volga River
localities (Table 1B).
The Meramec-only dataset had an average of 27,061 SNP loci across the
five localities, with an average of 2,424 private alleles (Table 1C).
The diversity estimates in this dataset were higher than those observed
in the combined river dataset, although the difference was smaller than
that observed for the Volga River. Again, locality MO3 had the greatest
number of private alleles, but its estimates of genetic diversity were
similar to those of the other Meramec River localities.
There was no evidence of inbreeding in any of the datasets (Table 1).
The variation in the number of SNP loci, private allele counts and
genetic diversity estimates between the combined and independent river
datasets is attributed to the ‘-r 0.80’ flag of the STACKS populations
program. This flag restricted inclusion to loci present in at least 80%
of individuals, and the loci retained therefore depended on the
localities specified when each dataset was generated.
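A toy illustration may help show why the retained loci depend on which localities are specified. The sketch below is not the STACKS implementation. It assumes that the 80% threshold is evaluated within each specified locality and that a locus must pass in every specified locality, which is a simplification. The locus names, individual names and the `retained_loci` helper are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical data: for each locus, the individuals in which it was genotyped.
genotyped: Dict[str, Set[str]] = {
    "locus_A": {"v1", "v2", "v3", "v4", "v5", "m1", "m2", "m3", "m4", "m5"},
    "locus_B": {"v1", "v2", "v3", "v4", "v5"},              # Volga only
    "locus_C": {"v1", "v2", "m1", "m2", "m3", "m4", "m5"},  # mostly Meramec
}

# Hypothetical localities, one per river for brevity.
localities: Dict[str, List[str]] = {
    "V": ["v1", "v2", "v3", "v4", "v5"],
    "M": ["m1", "m2", "m3", "m4", "m5"],
}


def retained_loci(
    genotyped: Dict[str, Set[str]],
    localities: Dict[str, List[str]],
    selected: List[str],
    r: float = 0.80,
) -> List[str]:
    """Keep loci genotyped in at least a fraction r of the individuals
    of every selected locality (simplifying assumption, see lead-in)."""
    kept = []
    for locus, individuals in genotyped.items():
        passes = all(
            len(individuals & set(localities[name])) / len(localities[name]) >= r
            for name in selected
        )
        if passes:
            kept.append(locus)
    return sorted(kept)


combined = retained_loci(genotyped, localities, ["V", "M"])
volga_only = retained_loci(genotyped, localities, ["V"])
meramec_only = retained_loci(genotyped, localities, ["M"])

print(combined)      # ['locus_A']
print(volga_only)    # ['locus_A', 'locus_B']
print(meramec_only)  # ['locus_A', 'locus_C']
```

In this toy case, loci that are well represented in only one river pass the filter for that river's dataset but fail it for the combined dataset, so the same threshold yields different locus sets for different locality selections.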