3.1 | Sequencing quality, SNP loci and genetic diversity
Over 817 million 70 bp reads were generated, with an average of 8,171,812 reads per individual after de-multiplexing and length trimming. Individuals Ec641 and Ec642 from the Volga River V01 sampling locality were removed due to low sequencing coverage. The coverage for all remaining individuals ranged from 10.61x to 20.53x. A total of 6,555,128 loci were retained for genotyping, resulting in 1,405,998 variant sites among all individuals.
The combined river dataset, which incorporated all ten sampling localities across the Volga and Meramec rivers, produced a mean of 30,236 variant SNP loci per locality (Table 1A). The mean number of private alleles per locality was 1,673. Meramec River localities had more private alleles on average (2,191) than Volga River localities (1,154; Table 1A). The Meramec River localities also exhibited greater genetic diversity (HO and Π) than the Volga River localities. Localities V01 and MO3 had more private alleles than the other localities within their respective river systems (Table 1A). However, the genetic diversity estimates for both localities were similar to the values observed in the other localities within each system.
Within the Volga-only dataset, an average of 16,410 variant SNP loci was identified among the five sampled localities, with an average of 1,626 private alleles (Table 1B). The genetic diversity estimates for the localities based on this dataset were much higher than those observed in the combined river dataset. Locality V01 again had a substantially higher number of private alleles, with a total of 3,496 (Table 1B), but its diversity estimates were again similar to the values observed in the other Volga River localities (Table 1B). The Meramec-only dataset had an average of 27,061 SNP loci across the five localities, with an average of 2,424 private alleles (Table 1C). The diversity estimates in this dataset were also higher than those observed in the combined river dataset, although the increase was smaller than that observed for the Volga River. Locality MO3 again had the greatest number of private alleles, but its estimates of genetic diversity were similar to those of the other Meramec River localities.
There was no evidence of inbreeding in any of the datasets (Table 1). The variation in the number of SNP loci, private allele counts and genetic diversity estimates between the combined and independent river datasets is attributed to the STACKS populations ‘-r 0.80’ flag. This flag retained a locus only when it was present in at least 80% of individuals, so the set of retained loci depended on the localities specified when each dataset was generated.
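The dependence of locus retention on the individuals included can be illustrated with a minimal sketch. This is not the STACKS implementation: the function `retained_loci`, the individual labels and the three toy loci are hypothetical, and the threshold is applied across all supplied individuals as a simplification of the filter described above.

```python
def retained_loci(calls, individuals, min_fraction=0.80):
    """Return the loci genotyped in at least min_fraction of the given individuals.

    calls: dict mapping a locus name to the set of individuals genotyped at it.
    Simplified illustration only; the threshold is applied to the whole set
    of supplied individuals.
    """
    individuals = set(individuals)
    kept = set()
    for locus, genotyped in calls.items():
        present = len(genotyped & individuals)
        if present >= min_fraction * len(individuals):
            kept.add(locus)
    return kept


# Hypothetical toy data: five individuals per river.
volga = ["v1", "v2", "v3", "v4", "v5"]
meramec = ["m1", "m2", "m3", "m4", "m5"]

calls = {
    "locus_a": set(volga + meramec),  # genotyped in every individual
    "locus_b": set(volga),            # genotyped only in Volga individuals
    "locus_c": set(meramec),          # genotyped only in Meramec individuals
}

combined_kept = retained_loci(calls, volga + meramec)
volga_kept = retained_loci(calls, volga)
meramec_kept = retained_loci(calls, meramec)

print(sorted(combined_kept))
print(sorted(volga_kept))
print(sorted(meramec_kept))
```

In this toy case, the river-specific loci fall below the threshold in the combined set (5 of 10 individuals) but pass it in the corresponding single-river set (5 of 5), which mirrors how locus counts can differ between the combined and independent datasets.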