2.7 Estimation of RFLP marker quality
To estimate the reliability of individual markers, we calculated
discovery and false discovery rates based on allele frequencies. We
first calculated allele frequencies by lake (lake comparisons) or by
sympatric species (species comparisons). From these allele frequencies,
we calculated expected genotype frequencies for each population
individually, assuming Hardy–Weinberg equilibrium (Table S2). To
evaluate the quality of the markers, we used these genotype frequencies
to calculate the probability that an individual with a particular
genotype originates from the focal population rather than from any
other population (without specifying which, and assuming a 50:50 prior
chance that the individual is from the focal population or not)
(Table S2).
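
The calculation described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the biallelic genotype labels (AA, AB, BB) and the equal weighting of the non-focal populations are assumptions made for the example.

```python
def hwe_genotype_freqs(p):
    """Expected Hardy-Weinberg genotype frequencies for allele A at frequency p."""
    q = 1.0 - p
    return {"AA": p * p, "AB": 2.0 * p * q, "BB": q * q}


def prob_focal(genotype, p_focal, p_others, prior=0.5):
    """Probability that an individual with `genotype` is from the focal population.

    p_focal  : frequency of allele A in the focal population
    p_others : frequencies of allele A in the non-focal populations
               (assumption: these are pooled with equal weight)
    prior    : prior chance of focal origin (0.5 corresponds to 50:50)
    """
    f_in = hwe_genotype_freqs(p_focal)[genotype]
    f_out = sum(hwe_genotype_freqs(p)[genotype] for p in p_others) / len(p_others)
    denom = prior * f_in + (1.0 - prior) * f_out
    # The genotype is not expected in any population: probability undefined.
    if denom == 0.0:
        return float("nan")
    return prior * f_in / denom
```

With a 50:50 prior, the result reduces to the focal genotype frequency divided by the sum of the focal and pooled non-focal genotype frequencies, so a genotype that is equally frequent inside and outside the focal population yields 0.5.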
We used a bootstrap approach and randomly drew one million genotypes
(i.e. individuals) from the ingroup (target population) and one million
from the outgroups (again with an equal chance for each outgroup
population/sympatric species to be picked), based on their relative
genotype frequencies. We then calculated how often individuals of a
particular population/sympatric species would have been assigned
correctly (correctly assigned), how often an ingroup individual would
have been assigned to an outgroup (false negative) and how often an
outgroup individual would have been assigned to the ingroup (false
positive) (Fig. 3, Table S3). The same
approach was then applied to our PCR-RFLP data (Table S3): false
negatives were ingroup individuals that were incorrectly assigned as
outgroup individuals, and false positives were outgroup individuals that
were incorrectly assigned as ingroup individuals. The proportion of
correctly assigned individuals was calculated as the mean of the
percentage of correctly assigned ingroup individuals and the percentage
of correctly assigned outgroup individuals. Because some analyses were
imbalanced, with different numbers of ingroup and outgroup individuals,
this unweighted mean keeps the estimates comparable to those based on
the bootstrapping dataset.
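
The resampling procedure and the balanced proportion of correct assignments can be sketched as below. This is an illustrative sketch under stated assumptions rather than the original analysis code: the assignment rule (a genotype is called "ingroup" when it is more frequent in the ingroup than on average in the outgroups) is hypothetical, as are the function names, the biallelic genotype coding and the smaller default number of draws.

```python
import random


def hwe(p):
    """Hardy-Weinberg genotype frequencies (AA, AB, BB) for allele frequency p."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)


def balanced_correct(n_in_correct, n_in, n_out_correct, n_out):
    """Unweighted mean of the ingroup and outgroup proportions assigned correctly."""
    return 0.5 * (n_in_correct / n_in + n_out_correct / n_out)


def simulate(p_in, p_outs, n=100_000, seed=1):
    """Draw n ingroup and n outgroup genotypes and tally assignment errors."""
    rng = random.Random(seed)
    f_in = hwe(p_in)
    f_outs = [hwe(p) for p in p_outs]
    f_out_mean = [sum(f[g] for f in f_outs) / len(f_outs) for g in range(3)]

    # Hypothetical assignment rule (assumption, not stated in the text):
    # call a genotype "ingroup" if it is more frequent in the ingroup
    # than on average across the outgroups.
    call_in = [f_in[g] > f_out_mean[g] for g in range(3)]

    # Ingroup draws: a false negative is an ingroup genotype called "outgroup".
    in_draws = rng.choices(range(3), weights=f_in, k=n)
    false_neg = sum(not call_in[g] for g in in_draws)

    # Outgroup draws: pick an outgroup population with equal chance,
    # then a genotype from that population's expected frequencies.
    false_pos = 0
    for _ in range(n):
        f = rng.choice(f_outs)
        g = rng.choices(range(3), weights=f)[0]
        false_pos += call_in[g]

    return {
        "correct": balanced_correct(n - false_neg, n, n - false_pos, n),
        "false_negative": false_neg / n,
        "false_positive": false_pos / n,
    }
```

The same `balanced_correct` function applies to observed counts with unequal group sizes; for example, 9 of 10 ingroup and 40 of 80 outgroup individuals assigned correctly gives a mean of 0.9 and 0.5, i.e. 0.7, rather than the pooled 49/90.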