Rare SVs contribute to variation among subsamples
Our results show that low-abundance, rare SVs contributed to the differences seen between sampling strategies. Even AMF communities, which were already similar, increased in overlap by 50% between strategies after low-abundance SVs (represented by <0.05% of sequences, Table 1) were removed. Microbial community distributions are often characterized by long tails of low-abundance species (Unterseher et al., 2011), and as such, the likelihood of resampling rare species in each replicate can be low. In one study, Zhou et al. (2011) randomly sampled a simulated community with an exponential distribution and observed only a 53% overlap between two samples when sampling just 1% of that community. We see even more extreme differences in overlap in this study, where initial sampling effort is also low relative to the whole microbial community.
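The effect of sampling effort on overlap can be illustrated with a minimal simulation sketch. It does not reproduce the analysis of Zhou et al. (2011). The community size, the mean abundance, the use of Jaccard similarity as the overlap measure, and all function names are assumptions made only for illustration.

```python
import random


def simulate_community(n_species, mean_abundance, rng):
    """Return a list of individuals, each labelled by its species index."""
    individuals = []
    for species in range(n_species):
        # Assumed: abundances are exponentially distributed; +1 keeps every
        # species present with at least one individual.
        count = int(rng.expovariate(1.0 / mean_abundance)) + 1
        individuals.extend([species] * count)
    return individuals


def jaccard_overlap(sample_a, sample_b):
    """Shared species divided by all species seen in either sample."""
    a, b = set(sample_a), set(sample_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_overlap(fraction, n_species=1000, mean_abundance=100,
                 n_trials=20, seed=1):
    """Average overlap between pairs of independent random subsamples."""
    rng = random.Random(seed)
    community = simulate_community(n_species, mean_abundance, rng)
    k = max(1, int(len(community) * fraction))
    overlaps = [
        # Each subsample is drawn without replacement from the same community.
        jaccard_overlap(rng.sample(community, k), rng.sample(community, k))
        for _ in range(n_trials)
    ]
    return sum(overlaps) / len(overlaps)


if __name__ == "__main__":
    for fraction in (0.01, 0.1, 0.5):
        print(f"{fraction:.0%} sampled: mean overlap = {mean_overlap(fraction):.2f}")
```

Under these assumptions, overlap rises with the fraction of the community sampled, because rare species in the long tail are increasingly likely to be recovered in both subsamples.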
The importance of rare microbes may vary and is easily overlooked in favor of highly abundant, and perhaps more influential, fungi or bacteria. However, because amplicon data are inherently compositional, SVs that appear to be in low abundance at the time of sampling may only be relatively so. In addition, we do not yet fully understand microbial species turnover or succession. Plant-associated microbial communities can change significantly in just a matter of months (McTee et al., 2019), or even weeks (Gao et al., 2019). Furthermore, the exact relationship between sequence number and the biomass of a species is variable (Kleiner et al., 2017), and there is little evidence, if any, that sequence number is directly proportional to a species’ impact in an ecosystem. Some microbes may be more metabolically active than others, despite being present in smaller quantities (Jousset et al., 2017). The recovery of the rare microbial community is arguably just as vital as the recovery of species that appear more abundant.
Bioinformatics pipelines that artificially inflate the number of SVs, especially low-abundance or rare SVs, could potentially inflate the differences we see among community subsamples. Hundreds of bioinformatics approaches have been used to analyze amplicon data, and no consensus exists on which is best. However, a recent study comparing the performance of 360 different software and parameter combinations showed that DADA2 (which we used here), with no filter other than the removal of low-quality and chimeric sequences, was best for recovering true richness and composition from a mock fungal community of 189 different strains (Pauvert et al., 2019). If anything, DADA2 can erroneously lump closely related species (Callahan et al., 2016), which would make it more conservative than other methods. Nevertheless, in an effort not to overestimate the true variation between the strategies compared in this study, we assessed the relative importance of rare taxa through the gradual removal of less abundant sequences, and we also used LULU, which is sometimes employed to reduce artifactual diversity (Frøslev et al., 2017). We also removed all SVs that could not be confidently assigned to known microbial taxa. Even with these approaches, substantial variation remained due to the inherent undersampling of the strategies compared.
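The gradual removal of low-abundance SVs described above can be sketched as follows. This is an illustrative outline rather than the pipeline used in this study. The SV counts are invented toy values, the function names are hypothetical, and overlap is assumed here to be the fraction of SVs shared between two strategies out of all SVs detected by either.

```python
def filter_rare(counts, min_fraction):
    """Keep SVs whose share of all sequences is at least min_fraction."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sv: n for sv, n in counts.items() if n / total >= min_fraction}


def shared_fraction(counts_a, counts_b):
    """SVs found by both strategies divided by SVs found by either."""
    a, b = set(counts_a), set(counts_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def overlap_by_threshold(counts_a, counts_b, thresholds):
    """Recompute overlap after filtering each strategy at each threshold."""
    return {
        t: shared_fraction(filter_rare(counts_a, t), filter_rare(counts_b, t))
        for t in thresholds
    }


# Toy sequence counts per SV for two sampling strategies (invented values).
strategy_a = {"SV1": 6000, "SV2": 3000, "SV3": 990, "SV4": 4, "SV5": 3, "SV6": 3}
strategy_b = {"SV1": 5500, "SV2": 3500, "SV3": 994, "SV7": 3, "SV8": 3}

# 0.0005 corresponds to the <0.05% of sequences cutoff mentioned in the text.
result = overlap_by_threshold(strategy_a, strategy_b, [0.0, 0.0005])
print(result)  # {0.0: 0.375, 0.0005: 1.0}
```

In this toy case the two strategies share all abundant SVs and differ only in their rare tails, so overlap rises from 0.375 to 1.0 once SVs below the cutoff are removed. Real data would not be expected to converge this cleanly.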