Phylogeographic analysis
ADMIXTURE results for 2 to 8 clusters [Fig. S1] reveal the genetic
structure of the 12 sampled sites. With [K = 2], sites from the Rio
Negro and the Rio Solimões are strongly differentiated. Only CAT and
JAR, two sites located downstream of the Rio Negro, show weak admixture
with Rio Negro sites. With [K = 3], the two upstream Rio Negro sites
(BAR and NEG) are differentiated but still show some admixture with
downstream sites (CEM and ANA). Notably, BRA does not share posterior
membership probability with BAR and NEG, even though these sites are
geographically close. With [K = 4], the sites located far upstream on
the Rio Solimões (SOL and TEF) are differentiated from the other Rio
Solimões sites. PIR, however, a site even farther upstream on the Rio
Solimões, is not differentiated from the downstream lakes. Each
subsequent increase of K up to eight differentiated one additional
site: [K = 5] separates MAN, [K = 6] BRA, [K = 7] PIR and [K = 8] JAR.
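Site-level memberships such as those discussed above are commonly obtained by averaging each sample's ADMIXTURE ancestry proportions (the Q matrix: one row per sample, one column per cluster) within its sampling site. The Python sketch below illustrates this under stated assumptions: the Q matrix is already loaded as an array, sample-to-site labels are available as a list, and the function names and toy values are hypothetical rather than part of the original pipeline.

```python
import numpy as np

def site_memberships(q, sites):
    """Average per-sample ancestry proportions (rows of an ADMIXTURE Q matrix)
    within each sampling site. Returns {site: mean membership vector}."""
    q = np.asarray(q, dtype=float)
    sites = np.asarray(sites)
    result = {}
    for site in np.unique(sites):
        # Each Q row sums to 1, so the site-level mean also sums to 1.
        result[str(site)] = q[sites == site].mean(axis=0)
    return result

def dominant_cluster(memberships):
    """Index of the cluster with the highest mean membership at each site."""
    return {site: int(np.argmax(vec)) for site, vec in memberships.items()}

# Hypothetical toy Q matrix for K = 2: two samples from NEG, one from SOL.
q = [[0.9, 0.1],
     [0.8, 0.2],
     [0.1, 0.9]]
sites = ["NEG", "NEG", "SOL"]
m = site_memberships(q, sites)
print(dominant_cluster(m))  # {'NEG': 0, 'SOL': 1}
```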
According to the ADMIXTURE cross-validation errors [Fig. S2], the
optimal number of clusters for our genetic data is three. Nevertheless,
this cross-validation error (0.20966) is close to the one obtained with
four clusters (0.20997). The “find.cluster” function from Adegenet led
to a similar result, since the decrease in its goodness-of-fit (BIC)
values slowed at the fourth cluster [Fig. S3]. Additionally, the
posterior membership probability plots [Fig. S1] stopped forming
biologically meaningful clusters after [K = 4], separating only one
sampling site at a time with each added cluster. For this reason, we
completed the phylogeographic analysis assuming that our full SNP
dataset is represented by four genetic clusters.
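How close the two best cross-validation errors are can be quantified directly from the two values quoted above (ADMIXTURE prints a CV error for each K when run with its `--cv` option). In this minimal sketch, the helper `near_optimal_k` and its 0.5 % relative tolerance are hypothetical choices for illustration, not a criterion used in the study.

```python
# Cross-validation errors quoted in the text (Fig. S2); other K values omitted.
cv_error = {3: 0.20966, 4: 0.20997}

def near_optimal_k(cv_error, rel_tol=0.005):
    """Return every K whose CV error lies within rel_tol (relative) of the minimum.
    rel_tol = 0.5 % is a hypothetical tolerance chosen for illustration."""
    best = min(cv_error.values())
    return sorted(k for k, e in cv_error.items() if (e - best) / best <= rel_tol)

best_k = min(cv_error, key=cv_error.get)  # K with the lowest CV error
relative_gap = (cv_error[4] - cv_error[3]) / cv_error[best_k]

print(best_k)                    # 3
print(f"{relative_gap:.2%}")     # 0.15%
print(near_optimal_k(cv_error))  # [3, 4]
```

A relative difference of about 0.15 % makes K = 3 and K = 4 practically indistinguishable on this criterion alone, which is why the BIC curve and the membership plots were also considered.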
According to the pie chart map [Fig. 2], some geographically close
sites show strong genetic differentiation from each other. For
instance, there is a genetic gap, i.e., a genetic distance
disproportionate to the river course distance separating the sites, at
the confluence of the Rio Negro and Rio Solimões. There is another
similar genetic gap between BRA and NEG. These genetic gaps are
detectable in every neutral population structure analysis that we
produced: the pie chart map [Fig. 2], the high pairwise Fst values
[Fig. 3] and the admixture barplots [Fig. S1]. Both gaps separate sites
that are a short river course distance apart but are isolated by a
strong downstream water flow and experience drastically different
environmental conditions, as shown by the physicochemical properties
recorded at these sites [Table 1]. However, BRA (white water) is more
closely related to ANA (black water) than NEG (black water) is
[Fig. 2 and 3], despite BRA and NEG being at similar river course
distances from ANA. This migration pattern, with preferential
downstream migration from white water to black water, is the inverse
of that observed at the Rio Negro-Solimões confluence, where white
water sites are more closely related to other white water sites.
Similarly, ANA and CEM, two Rio Negro sites, share posterior
memberships with downstream sites (JAR and CAT) [Fig. 2], even though
they come from drastically different environments.
We detected a third genetic gap at the confluence of the Rio Tefé and
the Rio Solimões. Indeed, the genetic distance between TEF and PIR is
disproportionately large compared with the small geographic distance
separating the sites. Additionally, SOL (white water) is more closely
related to TEF (black water) than to PIR (white water). This result is
visible in the pie chart map [Fig. 2], where PIR shares a common
posterior membership with other downstream Rio Solimões sites, while
TEF and SOL cluster apart. It also appears in the pairwise Fst heatmap
[Fig. 3], where SOL and TEF are almost identical. However, according to
the pairwise Fst heatmap [Fig. 3], TEF and SOL are not much more
genetically distant from the other downstream Rio Solimões sites than
PIR is.
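The text does not state which Fst estimator produced the heatmap in Fig. 3. As an assumption for illustration, the sketch below uses Hudson's estimator combined across SNPs as a ratio of averages, computed from per-site allele frequencies; constant per-site sample sizes (no missing data) are also assumed. For nearly identical sites the estimate can be slightly negative because of the sample-size correction.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Hudson Fst between two sites, combined across SNPs as a ratio of averages.
    p1, p2: alternate-allele frequencies per SNP in each site.
    n1, n2: number of sampled gene copies per site (2 x diploid individuals)."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Between-site squared difference, corrected for within-site sampling variance.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.mean() / den.mean()

def pairwise_fst(freqs, sizes):
    """Symmetric site x site Fst matrix (zero diagonal), e.g. for a heatmap."""
    k = len(freqs)
    out = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            out[a, b] = out[b, a] = hudson_fst(freqs[a], freqs[b], sizes[a], sizes[b])
    return out
```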
The multiple regression on distance matrices (MRM) analysis detected a
significant association between the pairwise Fst matrix and both the
river course distance (p-value = 0.021) and the connectivity (p-value =
0.001) matrices. The relation between genetic distance and the water
type similarity matrix was not significant (p-value = 0.571). When both
the river course distance and the connectivity matrices are used, the
resulting linear model explains 59.23 % of the variation in the
dependent matrix.
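The MRM result above (one model with river course distance and connectivity as predictors, reporting the variance explained and permutation p-values) can be sketched as an ordinary least-squares regression on the vectorized lower triangles of the distance matrices. The text does not name the implementation; the function below is a hypothetical, simplified illustration that only tests the overall R^2 by jointly permuting the rows and columns of the response matrix, whereas full MRM implementations also report per-predictor p-values.

```python
import numpy as np

def lower_triangle(m):
    """Vectorize the off-diagonal lower triangle of a square distance matrix."""
    m = np.asarray(m, dtype=float)
    i, j = np.tril_indices(m.shape[0], k=-1)
    return m[i, j]

def mrm(response, predictors, n_perm=999, seed=0):
    """Multiple regression on distance matrices: returns (R^2, permutation p-value)."""
    rng = np.random.default_rng(seed)
    response = np.asarray(response, dtype=float)
    n_pairs = len(lower_triangle(response))
    # Design matrix: intercept plus one column per unfolded predictor matrix.
    X = np.column_stack([np.ones(n_pairs)] + [lower_triangle(p) for p in predictors])

    def r_squared(resp):
        y = lower_triangle(resp)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        centered = y - y.mean()
        return 1.0 - (resid @ resid) / (centered @ centered)

    r2_obs = r_squared(response)
    n = response.shape[0]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)  # relabel sites, keeping the matrix symmetric
        if r_squared(response[np.ix_(perm, perm)]) >= r2_obs:
            exceed += 1
    return r2_obs, (exceed + 1) / (n_perm + 1)
```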
According to the simple Mantel tests between genetic distance and each
predictor matrix [Fig. S4], pairwise genetic distances between sites
are moderately correlated with pairwise river course distances
(correlation coefficient of 0.54, p-value = 0.004). Likewise, there is
a strong correlation between pairwise genetic distances and downstream
water flow connectivity (correlation coefficient of 0.71, p-value =
0.001), and a non-significant correlation between genetic distances
and the water type similarity matrix (correlation coefficient of 0.25,
p-value > 0.05).
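Each simple Mantel test correlates one distance matrix with another across all site pairs (12 × 11 / 2 = 66 pairs for 12 sites) and assesses significance by permuting site labels. A minimal Python sketch follows, assuming Pearson correlation and a one-sided test; the permutation count and seed are arbitrary choices. With 999 permutations the smallest attainable p-value is 1/1000 = 0.001.

```python
import numpy as np

def mantel(a, b, n_perm=999, seed=0):
    """Simple Mantel test: Pearson correlation between the off-diagonal upper
    triangles of two square matrices, with a one-sided permutation p-value
    obtained by jointly permuting the rows and columns of the first matrix."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    i, j = np.triu_indices(n, k=1)
    r_obs = np.corrcoef(a[i, j], b[i, j])[0, 1]
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        r_perm = np.corrcoef(a[np.ix_(perm, perm)][i, j], b[i, j])[0, 1]
        if r_perm >= r_obs:
            exceed += 1
    # The observed value counts as one of the permutations.
    return r_obs, (exceed + 1) / (n_perm + 1)
```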
Environmental Association Study
As seen in the biplot of the five selected physicochemical parameters
[Fig. 4], water physicochemical characteristics differentiate the two
water types.
Black water sites were characterized by higher DOC and Al concentrations
and lower pH, while white water sites had higher amounts of silicate in
suspension, as well as higher conductivity and Chl a concentration
[Table 1 and Fig. 4].
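Fig. 4 is described only as a biplot; as an assumption, the sketch below treats it as a PCA of the five standardized parameters. Standardizing matters here because the variables are on unrelated scales (e.g., pH units versus DOC or Al concentrations), so an unscaled PCA would be dominated by whichever variable has the largest numeric variance. The function name is hypothetical.

```python
import numpy as np

def pca_biplot_coords(env, n_axes=2):
    """PCA on standardized variables (equivalent to a correlation-matrix PCA).
    env: sites x variables array. Returns site scores (biplot points),
    variable loadings (biplot arrows) and the proportion of variance per axis."""
    env = np.asarray(env, dtype=float)
    z = (env - env.mean(axis=0)) / env.std(axis=0, ddof=1)  # z-score each variable
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = u[:, :n_axes] * s[:n_axes]   # site coordinates
    loadings = vt[:n_axes].T              # one row per variable
    return scores, loadings, explained[:n_axes]
```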
All six RDA axes were significant (p-value < 0.05) and were used to
detect associations between genotypes and environmental predictors. The
corrected (adjusted) proportion of variance explained by the
environmental predictors in the redundancy analysis is 4.93 %.
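With six predictors (aluminum, productivity, conductivity, DOC, silicate and water type), an RDA has at most six constrained axes, which appears to match the six significant axes reported. Assuming the "corrected" variance reported above is an adjusted R^2, the Python sketch below shows how it can be obtained: regress the centered genotype matrix on the centered predictors, take the fitted variance as a share of the total, then apply the adjustment 1 - (1 - R^2)(n - 1)/(n - m - 1). It assumes complete genotypes and full-rank predictors; the function name is hypothetical.

```python
import numpy as np

def rda_r2(genotypes, predictors):
    """Redundancy analysis variance partitioning.
    genotypes: samples x SNPs (e.g. 0/1/2 allele counts, no missing values).
    predictors: samples x environmental variables (full column rank).
    Returns (R^2, adjusted R^2, variances of the constrained axes)."""
    Y = np.asarray(genotypes, dtype=float)
    X = np.asarray(predictors, dtype=float)
    Y = Y - Y.mean(axis=0)                        # center each SNP
    X = X - X.mean(axis=0)                        # center each predictor
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # multivariate linear regression
    fitted = X @ beta
    r2 = np.sum(fitted**2) / np.sum(Y**2)         # constrained / total variance
    n, m = X.shape
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - m - 1)
    # Constrained (RDA) axes are the principal axes of the fitted values.
    s = np.linalg.svd(fitted, compute_uv=False)[:m]
    return r2, r2_adj, s**2 / (n - 1)
```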
The distribution of samples in the RDA space defined by the explanatory
variables was unrelated to their genetic clusters [Fig. S6]. A total of
584 unique SNPs were associated with the environmental predictors in
the RDA. Of these, 45 were associated with aluminum concentration, 29
with productivity, 74 with conductivity, 44 with DOC concentration, 357
with silicate concentration and 35 directly with water type. For LFMM2,
a total of 367 unique SNPs had a significant p-value after Bonferroni
correction. Of these, 13 were associated with aluminum concentration,
215 with productivity, 107 with conductivity, 4 with DOC concentration,
117 with silicate concentration and 24 directly with water type (a SNP
can be associated with more than one predictor, so these counts sum to
more than the number of unique SNPs). For BayPass2, the neutral genetic
structure estimated by the program [Fig. S8] is consistent with the Fst
heatmap [Fig. 3]. A total of 307 unique SNPs had an eBPis greater than
1.5 and were considered putatively under selection. Of these, 178 were
associated with aluminum concentration, 63 with productivity, 60 with
conductivity, 5 with DOC concentration, 21 with silicate concentration
and 15 directly with water type. Among all candidate SNPs, 172 were
detected by at least two methods and retained for the following
analyses [Fig. 5].
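The candidate-selection logic described above (Bonferroni-corrected LFMM2 p-values, a BayPass eBPis cutoff of 1.5, and retention of SNPs found by at least two methods) can be sketched with plain set operations. Assumptions: alpha = 0.05 for the Bonferroni step, with the correction applied over whatever set of p-values is passed in (the text does not say whether it was per predictor or across all SNP × predictor tests); the RDA candidates are passed in as a ready-made set because the RDA outlier threshold is not given here; and, as in the BayPass documentation, eBPis is taken to be an empirical Bayesian p-value on a -log10 scale, so 1.5 corresponds to p ≈ 0.03. Function names are hypothetical.

```python
import numpy as np

def bonferroni_hits(pvalues, snp_ids, alpha=0.05):
    """SNPs whose p-value passes the Bonferroni threshold alpha / number of tests."""
    pvalues = np.asarray(pvalues, dtype=float)
    threshold = alpha / pvalues.size
    return {s for s, p in zip(snp_ids, pvalues) if p < threshold}

def ebpis_hits(ebpis, snp_ids, cutoff=1.5):
    """SNPs whose BayPass eBPis exceeds the cutoff used in the text (1.5)."""
    return {s for s, e in zip(snp_ids, ebpis) if e > cutoff}

def consensus(candidate_sets, min_methods=2):
    """SNPs reported by at least `min_methods` of the candidate sets."""
    counts = {}
    for hits in candidate_sets:
        for snp in hits:
            counts[snp] = counts.get(snp, 0) + 1
    return {snp for snp, c in counts.items() if c >= min_methods}
```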
However, the 172 SNPs selected by our EAS do not structure the samples
according to their water type. According to the PCA using these
water-type-associated SNPs [Fig. 6], samples cluster according to their
watershed of origin [Fig. 6B] rather than their water type [Fig. 6C].
Samples from the two main Amazonian watersheds are well differentiated
along PC1, which explains 26.56 % of the variation in the genetic
matrix. Additionally, BRA (white water) clusters with the black water
sites from the Rio Negro (i.e., ANA, CEM, NEG and BAR). In contrast,
TEF and SOL (black and white water sites, respectively) appear isolated
from the other Rio Solimões sites, which is consistent with our
previous results [Fig. 2 and 3]. Compared with a PCA using the full set
of 41,268 SNPs [Fig. S10], the general clusters remain the same. The
only major differences are the clustering of SOL and TEF with the other
sites from the Solimões watershed and the greater dispersion of the Rio
Negro sites along PC2. Again, differences in water type between sites
do not appear to be the main structuring factor in the data.
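The percentage of variation retained by a principal component (26.56 % for PC1 above) is that component's share of the total variance of the genotype matrix. The Python sketch below computes it for a samples × SNPs matrix coded as 0/1/2 allele counts. How missing genotypes were handled and whether SNPs were scaled are not stated in the text, so mean imputation and centering without scaling are assumptions; the function name is hypothetical.

```python
import numpy as np

def genotype_pca(genotypes, n_axes=2):
    """PCA of a samples x SNPs genotype matrix (0/1/2 allele counts, NaN = missing).
    Missing values are replaced by the SNP mean, then each SNP is centered.
    Returns sample scores and the percentage of variance explained per PC."""
    G = np.asarray(genotypes, dtype=float)
    col_mean = np.nanmean(G, axis=0)
    G = np.where(np.isnan(G), col_mean, G)  # mean-impute missing genotypes
    G = G - G.mean(axis=0)                  # center each SNP
    u, s, _ = np.linalg.svd(G, full_matrices=False)
    pct = 100 * s**2 / np.sum(s**2)         # percent of total variance per PC
    return u[:, :n_axes] * s[:n_axes], pct[:n_axes]
```

Running the same function on the 172 candidate SNPs and on the full 41,268-SNP matrix gives directly comparable score plots, since only the input columns differ.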