4. Discussion
Despite the increase in large-scale genomic data, PCR-RFLPs are still
widely used as diagnostic markers for the detection and species
assignment of parasites (Pegg et al., 2016), disease-causing pathogens
(Kato et al., 2019), microbiota (Baffoni et al., 2013), toxic
dinoflagellates (Lozano-Duque, Richlen, Smith, Anderson, & Erdner,
2018) as well as animals using different tissue samples (Larraín,
González, Pérez, & Araneda, 2019), scat samples (Mukherjee, Cn, Home,
& Ramakrishnan, 2010) or environmental DNA (eDNA) (Clusa, Ardura,
Fernández, Roca, & García-Vázquez, 2017). Once markers are identified,
PCR-RFLP genotyping is fast, cheap, and reliable, but designing the
markers is usually time-consuming, especially if many species and
populations are being compared and/or highly differentiated markers are
difficult to find. Here, we introduce a streamlined workflow to identify
PCR-RFLPs from whole genome re-sequencing data (GB-RFLPs). We note that
the same approach could be applied to RAD-seq, exome sequencing, or
other forms of targeted genomic data (Fig. 2).
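A core step in any such workflow is checking whether a candidate SNP creates or destroys a restriction enzyme recognition site, because only those SNPs produce different fragment patterns after digestion. The following Python sketch illustrates that check under simplifying assumptions. The function names, the toy flanking sequence, and the choice of the EcoRI site GAATTC are illustrative assumptions rather than a description of the actual pipeline, and degenerate recognition sites and indels are ignored.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def site_positions(seq: str, site: str) -> set:
    """Start positions of a recognition site on either strand of seq."""
    seq = seq.upper()
    # Non-palindromic sites must also be searched as their reverse complement.
    motifs = {site.upper(), reverse_complement(site)}
    return {
        i
        for motif in motifs
        for i in range(len(seq) - len(motif) + 1)
        if seq.startswith(motif, i)
    }


def snp_alters_site(flank: str, snp_index: int, alt: str, site: str) -> bool:
    """True if swapping in the alternative allele adds or removes a site.

    flank     : reference sequence around the SNP (hypothetical input)
    snp_index : 0-based position of the SNP within flank
    alt       : alternative allele (single base)
    site      : recognition sequence, e.g. "GAATTC" for EcoRI
    """
    alt_seq = flank[:snp_index] + alt + flank[snp_index + 1:]
    return site_positions(flank, site) != site_positions(alt_seq, site)


# Toy example: the reference carries one EcoRI site; an A>G change at index 5
# destroys it, so the two alleles would yield different digestion patterns.
flank = "ACGTGAATTCTTAG"
print(snp_alters_site(flank, 5, "G", "GAATTC"))
```

A SNP that passes this check would still need suitable primer sites and fragment sizes that can be resolved on a gel, which this sketch does not evaluate.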
Our study yielded promising results for diverged populations from
different lakes without ongoing gene flow, as represented by all seven
Nicaraguan crater lakes. While populations could be assigned with more
than 90% accuracy to two crater lakes (Apoyeque and As. Managua), our
markers even yielded 100% assignment accuracy for populations of the
remaining five crater lakes (Table S4). Results were less clear for
populations with ongoing gene flow and/or large population sizes, in
particular the Great Lakes Nicaragua and Managua, for which
population-specific markers performed poorly (between 62 and 86%
assignment accuracy, Table S4). This was not unexpected: many alleles
are shared between the great lakes, and chances are high that an allele
found in one of the great lakes is also found in at least one of the
crater lakes that were colonized from this older source population.
Therefore, although we could assign individual samples using
whole-genome (Kautt et al., 2020) or RAD-seq data (Kautt et al., 2018),
single- or double-marker approaches are not sufficient to unambiguously
differentiate between the Lake Managua and Lake Nicaragua Midas cichlid
populations. A similar problem can be observed for the species-specific
markers for the species of Crater Lakes Apoyo and Xiloá. Here too,
species clearly form pronounced clusters using whole-genome (Kautt et
al., 2020) or RAD-seq marker sets (Kautt et al., 2016). Yet,
particularly in the sympatric scenario, where speciation occurred within
the last 5,000 years (Kautt et al., 2020) and in at least one case gene
flow persists (Kautt et al., 2020; Kautt et al., 2018), there might be a
strong ascertainment bias when focusing on single SNPs, as has been
intensively discussed for SNP datasets from humans (Clark, Hubisz,
Bustamante, Williamson, & Nielsen, 2005). In line with this caveat,
species-specific markers, with a few exceptions (A. chancho and
A. viridis), performed less reliably (12/14 markers have <90% correct
assignments; Table S4). Interestingly, the
genetic markers for the great lake species that show extremely low
genetic differentiation (FST~0.02) perform quite well
(87% and 81% correctly assigned), particularly when combined (100%
correctly assigned) (Fig. S4). This can be explained by the different
approach that was taken here. We designed these markers based on our
prior knowledge of the genomic basis of the species-defining trait of
A. labiatus: hypertrophied, thick lips. As the trait and the underlying
associated SNPs (lip size variation is linked to only a single locus in
most populations; Kautt et al., 2020) are almost alternatively fixed
between these species, the markers seem to be most powerful for reliably
assigning species. While signals of gene flow between A. labiatus and
A. citrinellus can be detected in most of the genome, this is not true
for the lip locus on chromosome 8, where the genetic markers are also
located.
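The ascertainment caveat raised above can be made concrete with a short calculation. The Python sketch below assumes random sampling of independent chromosomes from a diploid population, which is an idealization. The function name and the example numbers are hypothetical and are not values from this study. It returns the probability that an allele present at a given population frequency goes entirely unobserved in the sequenced individuals, so that a SNP appears fixed when it is not.

```python
def prob_allele_unobserved(freq: float, n_individuals: int) -> float:
    """Probability that an allele at population frequency `freq` is absent
    from a random sample of `n_individuals` diploid individuals.

    Assumes each of the 2 * n sampled chromosomes is an independent draw.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie between 0 and 1")
    if n_individuals < 0:
        raise ValueError("n_individuals must be non-negative")
    return (1.0 - freq) ** (2 * n_individuals)


# Hypothetical example: an allele segregating at 10% in one population,
# evaluated for a few possible numbers of sequenced individuals.
for n in (5, 10, 20):
    print(n, round(prob_allele_unobserved(0.10, n), 3))
```

Under these assumptions, an allele at 10% frequency is missed with a probability of roughly 0.12 when ten individuals are sequenced, which illustrates how a marker can look diagnostic in a small genomic sample yet fail in a larger validation set.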
Based on our results, we conclude that designing markers from
whole-genome data is a powerful approach for distinguishing clearly
differentiated species or populations, and for the rare cases in which
loci with high local differentiation can be used as markers.
For populations with ongoing gene flow or instances where the population
constitutes the source population (both apply to Great Lakes
Nicaragua and Managua), the single/double-GB-RFLP marker approach
performs poorly, likely because the genomic samples that we used for
the design of the markers give only an estimate of the ‘true’
population allele frequencies (i.e., markers that seem perfect based on
our limited genomic data cannot, in reality, unambiguously
differentiate populations). The same is true for sympatric
species (Crater Lakes Apoyo and Xiloá) without localized differentiation
(as opposed to the differentiation found between A. labiatus and
A. citrinellus). To make reliable species identification
possible, multi-marker assays might be necessary in some instances.
These would likely not require the complete set of markers found via
RAD-seq or WGS analyses but could be applied with a selected set of
markers. Here, one approach would be to use those SNPs that load most
heavily on the first principal components for Crater Lakes Apoyo and
Xiloá (based on Kautt et al., 2020; Kautt et al., 2016, 2018), thereby
giving the most power to distinguish the sympatric species. Such highly
cost-effective targeted multi-SNP genotyping panels have been used, for
example, with 217 SNPs to assign salmon to particular populations
(Aykanat, Lindqvist, Pritchard, & Primmer, 2016) and might also be an
excellent approach for the Midas cichlid system. Lastly, this set
of RFLPs is now available as a resource for conservation purposes, for
example to identify individual samples at fish markets, but also for
cohort and mark-recapture studies. This study therefore also presents a
workflow for using genomic resources to generate applicable
low-budget approaches for species assignment. It thereby
introduces a new methodological approach for such efforts, as the
implementation of approaches that can help with ‘real-world conservation
issues’ often fails, as previously discussed (Shafer et al., 2015).
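The idea of choosing panel SNPs by their loadings on the first principal component can be sketched as follows. This is a minimal illustration and not the analysis used in the cited studies: it assumes genotypes coded as 0, 1, or 2 copies of one allele with no missing data, it uses a plain power iteration instead of a dedicated PCA library, and the toy matrix and function names are hypothetical.

```python
import math


def pc1_loadings(genotypes, n_iter=200):
    """Approximate SNP loadings on the first principal component.

    genotypes : list of rows (individuals), each a list of 0/1/2 calls
    Returns one loading per SNP (unit-length vector, sign arbitrary).
    """
    n_ind, n_snp = len(genotypes), len(genotypes[0])
    # Center each SNP column on its mean genotype.
    means = [sum(row[j] for row in genotypes) / n_ind for j in range(n_snp)]
    centered = [[row[j] - means[j] for j in range(n_snp)] for row in genotypes]
    # Cross-product matrix X^T X (proportional to the covariance matrix).
    cov = [
        [sum(r[a] * r[b] for r in centered) for b in range(n_snp)]
        for a in range(n_snp)
    ]
    # Power iteration; a start vector exactly orthogonal to PC1 would fail,
    # which this sketch does not guard against.
    vec = [1.0] * n_snp
    for _ in range(n_iter):
        new = [sum(cov[a][b] * vec[b] for b in range(n_snp)) for a in range(n_snp)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:
            return [0.0] * n_snp  # no variation in the data
        vec = [v / norm for v in new]
    return vec


def top_loading_snps(genotypes, n_top):
    """Indices of the n_top SNPs with the largest absolute PC1 loadings."""
    loadings = pc1_loadings(genotypes)
    ranked = sorted(range(len(loadings)), key=lambda j: -abs(loadings[j]))
    return ranked[:n_top]


# Toy data: three individuals per species, four SNPs. SNP 0 is alternatively
# fixed, SNP 1 differs partially, and SNPs 2 and 3 are invariant.
toy = [
    [0, 0, 1, 2], [0, 1, 1, 2], [0, 0, 1, 2],  # species 1
    [2, 2, 1, 2], [2, 1, 1, 2], [2, 2, 1, 2],  # species 2
]
print(top_loading_snps(toy, 2))
```

In practice, the selected SNPs would still be subject to the sampling caveat discussed above and would need validation in independent individuals before being used in a genotyping panel.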
In summary, we tested a set of 36 PCR-RFLP loci that we designed based
on whole genome re-sequencing data to genetically assign Midas cichlid
species and populations. While our analyses reveal limitations for the
assignment of species and populations with ongoing gene flow and/or
extremely recent divergence, PCR-RFLPs designed from genomic data
(GB-RFLPs) have great benefits when populations with robust genome-wide
(allopatric populations) or local differentiation (A. citrinellus and
A. labiatus) have to be identified.