4 | DISCUSSION
Our novel MetHis forward-in-time simulator and summary-statistics calculator, coupled with RF-ABC scenario-choice, can distinguish among highly complex admixture histories using genetic data. As expected, scenario-choice errors occur mainly in regions of the parameter space where models are highly nested (Robert, Mengersen, & Chen, 2010) and, thus, biologically similar. Furthermore, using human population data as a case-study, we found that NN-ABC provided accurate and reasonably conservative posterior estimates for numerous parameters of the winning scenario. Finally, we empirically demonstrated that the moments of the distribution of admixture fractions in the admixed population were highly informative for ABC inference, as expected theoretically (Gravel, 2012; Verdu & Rosenberg, 2011).
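As an illustration of the kind of summary statistics referred to above, the following minimal sketch computes the first four moments of a set of individual admixture fractions. It is not the MetHis implementation: the function name `admixture_moments`, the choice of these four moments, and the toy input are assumptions made for this example only.

```python
import math
import random


def admixture_moments(fractions):
    """Mean, variance, skewness and excess kurtosis of admixture fractions.

    `fractions` is a sequence of per-individual admixture proportions in [0, 1].
    Hypothetical helper for illustration; central moments are divided by n.
    """
    n = len(fractions)
    if n < 2:
        raise ValueError("need at least two individuals")
    mean = sum(fractions) / n
    m2 = sum((x - mean) ** 2 for x in fractions) / n
    m3 = sum((x - mean) ** 3 for x in fractions) / n
    m4 = sum((x - mean) ** 4 for x in fractions) / n
    if m2 == 0:
        # Skewness and kurtosis are undefined when all values are identical.
        skew = kurt = math.nan
    else:
        skew = m3 / m2 ** 1.5
        kurt = m4 / m2 ** 2 - 3.0
    return {"mean": mean, "variance": m2, "skewness": skew, "kurtosis": kurt}


if __name__ == "__main__":
    # Toy data only: 100 admixture fractions drawn from an arbitrary Beta law.
    rng = random.Random(1)
    toy = [rng.betavariate(2, 5) for _ in range(100)]
    print(admixture_moments(toy))
```

In an ABC setting, such a vector would be computed identically on the observed data and on every simulated dataset, so that the two can be compared on the same scale.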
Altogether, our results for the two recently-admixed human populations illustrate how our MetHis-ABC framework can bring fundamental new insights into the complex demographic history of admixed populations. The framework can easily be adapted, using MetHis (Supplementary Note S1), to investigate complex admixture histories when maximum-likelihood methods are intractable.
We considered nine competing scenarios all deriving from the general mechanistic admixture model of Verdu and Rosenberg (2011). While the two-source version of this model can readily be simulated with MetHis, it considers 2g − 1 model parameters (with g the duration of the admixture process), plus effective population size parameters and mutation parameters. Jointly estimating all these parameters is out of reach of ML methods, and likely also out of reach of ABC posterior-parameter estimation procedures. However, conducting ABC model-choice to disentangle major classes of relatively simplified admixture processes, followed by ABC parameter estimation under the winning model, is flexible enough to bring new insights into the evolutionary history of admixed populations, far beyond the admixture scenarios that can be explored with existing ML methods (Gravel, 2012; Hellenthal et al., 2014).
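The parameter count given above can be made concrete with a short sketch. It assumes the usual reading of the two-source model, in which each generation receives a contribution s1 from source 1, h from the admixed population itself, and the remainder from source 2, with h fixed to 0 at the founding. Under that assumption, the expected admixture fraction from source 1 follows the recursion E[H_g] = s1_{g-1} + h_{g-1} E[H_{g-1}]. The function names and the list-based parameterization are hypothetical and are not taken from MetHis.

```python
def count_parameters(g):
    """Free contribution parameters for an admixture process lasting g generations.

    One free parameter at the founding (s1, since h = 0 and s2 = 1 - s1),
    then two per subsequent generation (s1 and h, with s2 = 1 - s1 - h).
    """
    if g < 1:
        raise ValueError("g must be at least 1")
    return 1 + 2 * (g - 1)  # equals 2g - 1


def expected_admixture(s1, h):
    """Expected source-1 admixture fraction after each generation.

    s1[i] and h[i] are the contributions of source 1 and of the admixed
    population to the parents of generation i + 1 (hypothetical layout).
    """
    if len(s1) != len(h) or not s1:
        raise ValueError("s1 and h must be non-empty and of equal length")
    if h[0] != 0:
        raise ValueError("h[0] must be 0: no admixed population before founding")
    expected = []
    previous = 0.0
    for s1_g, h_g in zip(s1, h):
        if s1_g < 0 or h_g < 0 or s1_g + h_g > 1 + 1e-12:
            raise ValueError("contributions must be non-negative and sum to at most 1")
        # Parents come from source 1 (fraction 1), source 2 (fraction 0),
        # or the admixed population (expected fraction `previous`).
        previous = s1_g + h_g * previous
        expected.append(previous)
    return expected


if __name__ == "__main__":
    # Founding pulse, one later pulse from both sources, then isolation.
    print(count_parameters(3))
    print(expected_admixture([0.5, 0.1, 0.0], [0.0, 0.8, 1.0]))
```

Even this toy parameterization shows why joint estimation becomes difficult: the number of free contributions grows linearly with g, while many different contribution sequences lead to the same expected admixture fraction.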
The sample size and SNP set explored here are often out of reach in non-model species. Nevertheless, our results considering vastly reduced SNP or sample sets demonstrate that ABC can remain remarkably accurate at disentangling highly complex admixture processes with much less genetic or sample data. This is because ABC relies on the amount of information carried by summary-statistics about model parameters, rather than on the absolute amount of genetic data investigated. Therefore, the MetHis-ABC framework remains promising for reconstructing complex admixture histories, provided that the summary-statistics considered by the user are, a priori, informative about model parameters, and that they are reasonably well estimated from the observed data. Altogether, large parameter and summary-statistics spaces, lack of information from summary statistics, and scenario nestedness are well known to affect ABC performance and, thus, need to be thoroughly evaluated case by case (Csilléry, Blum, Gaggiotti, & Francois, 2010; Robert et al., 2010; Sisson et al., 2018).
To further increase the range of applicability of our MetHis-ABC framework, our software readily implements microsatellite markers together with a general stepwise mutation model (Estoup, Jarne, & Cornuet, 2002), fully parameterizable by the user (Supplementary Note S1). This will allow investigating numerous complex admixture histories, much older than the one explored here, and from non-model species. Even if prior knowledge of the founding date is lacking, MetHis users can simply set the founding of the population in a remote past and implement a second founding event with a variable date to be estimated in the ABC inference, together with later additional admixture events and other parameters of interest. Nevertheless, it is not trivial to predict how old an admixture process can be and still be successfully investigated with ABC (Buzbas & Verdu, 2018). Indeed, ancient admixture processes can leave scarcely identifiable signatures in the observed data if these are obliterated by more recent admixture events. This was theoretically expected (Buzbas & Verdu, 2018), and future studies combining ancient and modern DNA samples may bring further information for reconstructing ancient admixture histories.
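For readers unfamiliar with stepwise mutation models for microsatellites, the sketch below shows one common formulation: with probability mu an allele mutates, the number of repeats gained or lost follows a geometric distribution with parameter p, and the direction is symmetric. This is an illustrative assumption and not the MetHis code. The function name, the parameter names `mu` and `p`, and the lower bound `min_repeats` are all hypothetical.

```python
import random


def mutate_allele(repeats, mu, p, rng, min_repeats=1):
    """Apply one generation of a generalized stepwise mutation to a repeat count.

    mu: per-generation mutation probability (0 <= mu <= 1).
    p:  geometric parameter for the step size (0 < p <= 1); p = 1 gives
        the strict single-step model.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must be in [0, 1]")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if rng.random() >= mu:
        return repeats  # no mutation this generation
    # Step size k >= 1 with P(k) = p * (1 - p) ** (k - 1).
    step = 1
    while rng.random() >= p:
        step += 1
    direction = 1 if rng.random() < 0.5 else -1
    # Assumed boundary: repeat counts are not allowed below min_repeats.
    return max(min_repeats, repeats + direction * step)


if __name__ == "__main__":
    rng = random.Random(42)
    allele = 20
    for _ in range(10_000):
        allele = mutate_allele(allele, mu=1e-3, p=0.8, rng=rng)
    print(allele)
```

Setting p = 1 recovers a strict stepwise model in which every mutation adds or removes exactly one repeat, which is a convenient check when exploring the effect of the step-size parameter.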
Importantly, about two thirds of the computational cost of our study is due to summary-statistics calculation at the end of the admixture process, as is often the case in ABC. Considering much longer admixture processes than the ones investigated here will mechanically increase overall computation time, but will not increase summary-statistics calculation time. Furthermore, note that the computational cost of simulating data with MetHis does not depend excessively on the number of generations considered (within reason), nor on the absolute number of markers used, but rather on the effective population size set by the user for the admixed population.
Although MetHis readily allows considering changes of effective population size in the admixed population at each generation as a parameter of interest to ABC inference (Supplementary Note S1), we did not, for simplicity, investigate here how such changes affected our results. Future work using MetHis will specifically investigate how effective size changes may influence genetic patterns in admixed populations, a question of major interest as numerous admixed populations have experienced founding events and/or bottlenecks during their genetic history (e.g. Browning et al., 2018).
The current MetHis-ABC approach does not make use of admixture linkage-disequilibrium (LD) patterns in the admixed population, and only relies on independent SNP or microsatellite markers. Yet admixture LD has consistently proved highly informative about complex admixture histories in populations where large genomic datasets are available (Gravel, 2012; Hellenthal et al., 2014; Malinsky et al., 2018; Medina et al., 2018; Ni et al., 2019; Stryjewski & Sorenson, 2017). However, existing methods to calculate admixture LD patterns remain computationally intensive and require both dense marker-sets and accurate phasing, which is difficult under ABC, where such statistics have to be calculated for each of the numerous simulated datasets. In this context, RF-ABC (Pudlo et al., 2016; Raynal et al., 2019) and AABC (Buzbas & Rosenberg, 2015) methods substantially reduce the number of simulations required for satisfactory ABC inference. This makes both approaches promising tools for using, in the future, admixture-LD patterns to reconstruct complex admixture processes from genomic data.
Finally, future developments of the MetHis-ABC framework will focus on implementing sex-specific admixture models, as these processes are known to affect genetic diversity patterns in a specific way, and are of interest to numerous study-cases (Goldberg, Verdu, & Rosenberg, 2014). Furthermore, the MetHis forward-in-time simulator represents an ideal tool to further investigate admixture-related selection forces and admixture-specific assortative mating processes. Unlike in most coalescent-based simulators, these processes can simply be modeled by specifically parameterizing individual reproduction and survival in the simulations.