Linkage disequilibrium network complexity reduction improves power to
detect parallel evolution
Abstract
Compelling evidence for natural selection comes from studying similar
adaptations that have evolved in multiple independently colonized
populations from shared ancestral variation (parallel evolution).
However, finding genomic regions associated with fitness remains
challenging due to inadequate control of false positive rates and overly
conservative corrections for multiple testing. Using simulations
parameterized on empirical data derived from the nine- and three-spined
sticklebacks, four approaches for detecting genomic regions associated with
parallel evolution are compared using high-density genome-wide SNP data:
linear mixed model (EMMAX), latent factor mixed model (LFMM), redundancy
analysis (RDA), and BayPass. RDA and BayPass were the most conservative,
followed by EMMAX; while LFMM was the most powerful, it was also the
most prone to false positives, particularly at high levels of background
genetic differentiation and low levels of parallelism. Because some
methods were sensitive to similar biases, false positives were often
shared among them. Combining linkage disequilibrium (LD) network-based
complexity reduction with EMMAX greatly reduced the cost of
multiple-testing correction, increasing the power to detect
signatures of natural selection relative to single-locus or
window-based approaches while keeping false positive rates well controlled. This
approach can further improve our ability to distinguish false positives
caused by population demographic history (genome-wide effects) from
regions affected by non-neutral evolutionary processes that shape
LD patterns locally. The outlined approach improves our ability to
identify genomic targets of natural selection and paves the way towards
a better understanding of adaptive evolution in the wild.