3.1 | Complex admixture scenarios cross-validation with RF-ABC
We trained the RF-ABC model-choice algorithm using 1,000 trees, which
was sufficient to ensure convergence of the model-choice prior error
rates (Supplementary Figure S3). Based on this training, the complete
out-of-bag cross-validation matrix showed that the nine competing
scenarios of complex historical admixture could be reasonably well
distinguished despite the high level of nestedness of the scenarios
considered here (Figure 2). Indeed, considering each of the 90,000
simulations in turn as an out-of-bag pseudo-observed target dataset, we
obtained an out-of-bag prior error rate of 32.41%, compared with a prior
probability of 88.89% (i.e., 8/9) of erroneously selecting a scenario at
random. Furthermore, the posterior probabilities of identifying the
correct scenario ranged from 55.17% (compared with a prior probability
of 11.11% for each competing scenario) for the scenario with two pulses
from both the African and European sources (Afr2P-Eur2P), to 77.71% for
the scenario with monotonically decreasing recurring admixture from both
sources (AfrDE-EurDE).
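The figures above are summaries of the out-of-bag confusion matrix: the prior error rate is the fraction of out-of-bag pseudo-observed datasets assigned to a wrong scenario, to be compared with the 1 - 1/9 error of a random choice among nine equiprobable scenarios, and the per-scenario correct-assignment rates are read, we assume, from the row-normalized diagonal. A minimal Python sketch of that bookkeeping, assuming the out-of-bag predictions have already been tallied into a matrix (rows = true scenario, columns = predicted scenario); the function names and the toy counts are hypothetical and this is not the RF-ABC software used for the analysis.

```python
from typing import List, Sequence, Tuple


def oob_summary(confusion: Sequence[Sequence[int]]) -> Tuple[float, List[float]]:
    """Summarise an out-of-bag confusion matrix.

    confusion[i][j] counts out-of-bag pseudo-observed datasets simulated
    under scenario i and assigned to scenario j. Returns the overall prior
    error rate and, for each scenario, the fraction correctly assigned.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    per_scenario = [row[i] / sum(row) for i, row in enumerate(confusion)]
    return 1 - correct / total, per_scenario


def random_choice_error(n_scenarios: int) -> float:
    """Error rate when picking one of n equiprobable scenarios at random."""
    return 1 - 1 / n_scenarios


# Hypothetical 3-scenario counts, for illustration only (not study results).
toy = [
    [70, 20, 10],
    [15, 60, 25],
    [5, 10, 85],
]
error, correct_rates = oob_summary(toy)
print(f"prior error rate: {error:.4f}")  # 0.2833
print(f"correct-assignment rates: {correct_rates}")
print(f"random-choice error, 9 scenarios: {random_choice_error(9):.4f}")  # 0.8889
```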
Importantly, for a given true admixture scenario, the probability of
choosing any one alternative (wrong) scenario, averaged across the eight
alternative scenarios, was 4.05% on average, ranging from 2.79% for the
AfrDE-EurDE scenario to 5.60% for the Afr2P-Eur2P scenario
(Figure 2). This shows that our approach did not systematically favor
any particular competing scenario when it chose a wrong scenario
instead of the true one. Furthermore, note that AfrDE-EurDE scenarios
were rarely confused (3.8%) with the other recurring admixture
scenarios containing at least one increasing recurring admixture process
(AfrIN-EurDE, AfrDE-EurIN, AfrIN-EurIN), which indicates strong a priori
discriminatory power of RF-ABC model choice, even among complex
recurring admixture scenarios.
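These averages follow directly from the correct-assignment rates: if the true scenario is recovered with probability p, the remaining 1 - p is shared among the eight alternatives, so the mean probability of picking any single one of them is (1 - p)/8. The short Python check below reproduces the values quoted above from the rates given in the preceding paragraph; the function name is hypothetical, and the last line assumes the 90,000 simulations are split equally among the nine scenarios, so that the mean over scenarios equals the overall prior error rate divided by 8.

```python
def mean_wrong_choice(p_correct: float, n_scenarios: int = 9) -> float:
    """Probability of choosing any one wrong scenario, averaged over the
    n_scenarios - 1 alternatives, given the probability of a correct choice."""
    return (1 - p_correct) / (n_scenarios - 1)


# Correct-assignment rates quoted in the text.
for name, p in [("Afr2P-Eur2P", 0.5517), ("AfrDE-EurDE", 0.7771)]:
    print(f"{name}: {100 * mean_wrong_choice(p):.2f}%")  # 5.60% and 2.79%

# Assuming equal numbers of simulations per scenario, the mean over true
# scenarios is the overall prior error rate (32.41%) spread over 8 alternatives.
print(f"all scenarios: {100 * mean_wrong_choice(1 - 0.3241):.2f}%")  # 4.05%
```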
In cross-validation analyses of groups of scenarios
(Estoup et al., 2018), scenarios with monotonically increasing or
decreasing recurring admixture (AfrDE-EurDE, AfrDE-EurIN, AfrIN-EurDE,
AfrIN-EurIN) could be well distinguished from scenarios in which at
least one source contributes two possible pulses after the founding
event (Afr2P-Eur2P, Afr2P-EurDE, Afr2P-EurIN, AfrDE-Eur2P, AfrIN-Eur2P).
Indeed, we found an out-of-bag prior error rate of 13.85%, and posterior
cross-validation probabilities of identifying the correct group of
scenarios of 86.08% and 86.23%, respectively, for the two groups.
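One way to carry out such a grouped analysis is to relabel every simulation with the group of its scenario before training and cross-validating the classifier, which turns the nine-class problem into a two-class one. A minimal Python sketch of the relabelling step, using the scenario names listed above; the group keys, function names and the balanced reference table of 10,000 simulations per scenario are assumptions for illustration. Note that the two groups are unequal in size (four versus five scenarios), so with equal scenario priors their frequencies in the reference table are not 50/50.

```python
from collections import Counter
from typing import Dict, List

# Groups of scenarios as defined in the text (9 scenarios in total).
GROUPS: Dict[str, List[str]] = {
    "recurring": ["AfrDE-EurDE", "AfrDE-EurIN", "AfrIN-EurDE", "AfrIN-EurIN"],
    "two-pulse": ["Afr2P-Eur2P", "Afr2P-EurDE", "Afr2P-EurIN",
                  "AfrDE-Eur2P", "AfrIN-Eur2P"],
}
SCENARIO_TO_GROUP = {s: g for g, members in GROUPS.items() for s in members}


def to_group_labels(scenario_labels: List[str]) -> List[str]:
    """Replace each simulation's scenario label by its group label."""
    return [SCENARIO_TO_GROUP[s] for s in scenario_labels]


def group_frequencies(group_labels: List[str]) -> Dict[str, float]:
    """Fraction of reference-table simulations falling in each group."""
    counts = Counter(group_labels)
    n = len(group_labels)
    return {g: c / n for g, c in counts.items()}


# Hypothetical balanced reference table: 10,000 simulations per scenario.
labels = [s for members in GROUPS.values() for s in members for _ in range(10_000)]
freqs = group_frequencies(to_group_labels(labels))
print(len(SCENARIO_TO_GROUP), round(freqs["recurring"], 4), round(freqs["two-pulse"], 4))
# 9 0.4444 0.5556
```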
A detailed investigation of the cross-validation results shows that
RF-ABC model-choice errors occur mainly in regions of parameter space
where scenarios are highly nested and, in fact, biologically close
(Figure 2). As expected, model choice increasingly mistakes
AfrDE-EurDE scenarios for scenarios containing two admixture pulses
(Afr2P-Eur2P, Afr2P-EurIN, AfrIN-Eur2P) as the values of $u_{\mathrm{Afr}}$
and $u_{\mathrm{Eur}}$ get closer to 0, regardless of introgression rate
values (Supplementary Figure S5A). Intuitively, the closer these
parameter values are to 0, the more sharply peaked the decrease in
recurring admixture, which increases model-choice confusion with
pulse-like scenarios. By contrast, $u$ values closer to 0.5 correspond to
admixture decreasing linearly over time, which is rarely confused with
pulse-like scenarios. Furthermore, as expected and regardless of
introgression rate values, model choice increasingly confuses
Afr2P-Eur2P scenarios with scenarios of recurring increasing admixture
from at least one source (AfrIN-EurIN, AfrDE-EurIN, AfrIN-EurDE) as the
second admixture pulse from Europe or Africa becomes more recent
(Supplementary Figure S5B).
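The exact functional form of the recurring-admixture processes is not restated in this section, so the Python function below is a hypothetical rational parameterization, chosen only because it shares the two properties described above: it is a straight line when u = 0.5 and becomes sharply peaked near the start of the admixture period as u approaches 0. The function name, the rescaled time x and the formula u(1 - x)/(u + (1 - 2u)x) are all assumptions for illustration, not the model used in the study.

```python
def decreasing_profile(x: float, u: float) -> float:
    """HYPOTHETICAL shape of a decreasing recurring-admixture process.

    x: rescaled time in [0, 1] (0 = first generation of recurring
       admixture, 1 = last generation).
    u: shape parameter in (0, 0.5]; u = 0.5 gives a straight line from
       1 to 0, and u -> 0 concentrates the intensity near x = 0.
    Returns the relative admixture intensity, from 1 at x = 0 to 0 at x = 1.
    """
    if not 0 < u <= 0.5:
        raise ValueError("u must lie in (0, 0.5]")
    return u * (1 - x) / (u + (1 - 2 * u) * x)


times = [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]
for u in (0.5, 0.1, 0.01):
    profile = [round(decreasing_profile(x, u), 3) for x in times]
    print(f"u = {u}: {profile}")
```

With u = 0.5 the intensity falls evenly over time, whereas with u = 0.01 it has already dropped below 10% of its initial value by x = 0.1, so most of the admixture happens in a short early burst that resembles a pulse.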
Most importantly, the power of RF-ABC model choice to discriminate a
priori among complex admixture processes was not strongly affected by
the number of markers considered. Indeed, with 50,000 and 10,000 SNPs
instead of 100,000, we found out-of-bag prior error rates of 33.53% and
37.93%, respectively (compared with 32.41%), together with a very
similar distribution of correct and mistaken predictions among
scenarios (Supplementary Figure S6A-B). Finally, dividing by five the
sample sizes of population H and of each source population increased,
as expected, the cross-validation error rate (48.39%). Nevertheless, all
scenarios were still correctly identified three to six times more often
than expected a priori (11.11%), and the distribution of erroneous
predictions remained similar to that obtained with the full samples
(Supplementary Figure S6C). Altogether, these results show that RF-ABC
model choice can be successfully used to distinguish highly complex
admixture models even when substantially fewer genetic markers and
sampled individuals are considered.
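To make the "three to six times" comparison concrete: with nine equiprobable scenarios the prior probability of picking the true one is 1/9, about 11.11%, so three to six times that corresponds to correct-assignment rates of roughly 33% to 67%. The Python check below, with hypothetical names, recomputes these bounds and the 88.89% random-choice error against which the 48.39% reduced-sample error can be read; for comparison it also expresses the full-data rates quoted earlier (55.17% and 77.71%) as multiples of the prior. The per-scenario rates of the reduced-sample analysis are not reported in this section and are therefore not included.

```python
N_SCENARIOS = 9
PRIOR = 1 / N_SCENARIOS  # chance of picking the true scenario at random (11.11%)


def fold_over_prior(p_correct: float) -> float:
    """How many times more often the true scenario is chosen than by chance."""
    return p_correct / PRIOR


# "Three to six times" the 11.11% prior corresponds to these correct rates:
print(f"{3 * PRIOR:.2%} to {6 * PRIOR:.2%}")  # 33.33% to 66.67%

# Random-choice error to compare with the 48.39% obtained with reduced samples.
print(f"random-choice error: {1 - PRIOR:.2%}")  # 88.89%

# Full-data extremes quoted earlier in this section, for comparison.
for name, p in [("Afr2P-Eur2P", 0.5517), ("AfrDE-EurDE", 0.7771)]:
    print(f"{name}: {fold_over_prior(p):.2f} times the prior")
```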