Rare SVs contribute to variation among subsamples
Our results show that low-abundance, rare SVs contributed to the
differences seen between sampling strategies. Even AMF communities,
which were already similar, increased in overlap by 50% between
strategies after low-abundance SVs (represented by <0.05% of
sequences, Table 1) were removed. Microbial community distributions are
often characterized by long tails of low-abundance species (Unterseher et al., 2011) and, as such, the likelihood of resampling rare
species in each replicate can be low. In one study, Zhou et al.
(2011) randomly sampled a simulated community with an exponential
distribution. They observed only a 53% overlap between two samples when
sampling just 1% of that community. We see even more extreme
differences in overlap in this study, where initial sampling effort is
also low relative to the whole microbial community.
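The resampling effect described above can be illustrated with a minimal simulation sketch. It is not a reproduction of Zhou et al. (2011): the number of species, the decay rate, the read depths, the use of sampling with replacement, and the choice of Jaccard overlap are all assumptions made for illustration, and the function name `simulate_overlap` is hypothetical.

```python
import math
import random


def simulate_overlap(n_species=1000, n_reads=10_000, decay=0.01, seed=0):
    """Draw two independent samples from one community and return their overlap.

    All default values are illustrative assumptions, not values taken
    from Zhou et al. (2011) or from this study.
    """
    rng = random.Random(seed)
    # Relative abundance decays exponentially with species rank,
    # giving a long tail of rare species.
    weights = [math.exp(-decay * rank) for rank in range(n_species)]
    species = range(n_species)
    # Sampling with replacement approximates drawing a small fraction
    # of a very large community.
    sample_a = set(rng.choices(species, weights=weights, k=n_reads))
    sample_b = set(rng.choices(species, weights=weights, k=n_reads))
    # Jaccard overlap: shared species / species seen in either sample.
    return len(sample_a & sample_b) / len(sample_a | sample_b)


if __name__ == "__main__":
    # Overlap between replicate samples at increasing sampling effort.
    for n_reads in (1_000, 10_000, 100_000):
        print(n_reads, round(simulate_overlap(n_reads=n_reads), 3))
```

Under these assumptions, the two samples are drawn from an identical community, so any dissimilarity between them arises purely from undersampling of the rare tail.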
The importance of rare microbes may vary and is easily overlooked in
favor of highly abundant, and perhaps more influential, fungi or
bacteria. However, due to the compositional nature inherent to amplicon
data, those SVs that appear to be in low abundance at the time of
sampling may only be relatively so. Also, we do not yet fully understand
microbial species turnover or succession. Plant-associated microbial
communities can change significantly in just a matter of months (McTee et al., 2019), or even weeks (Gao et al., 2019). In
addition, the exact relationship between sequence number and biomass of
a species is variable (Kleiner et al., 2017), and there is little
evidence, if any, that sequence number is in direct proportion with a
species’ impact in an ecosystem. Some microbes may be more metabolically
active than others, despite being present in smaller quantities (Jousset et al., 2017). The recovery of the rare microbial community is
arguably just as vital as the recovery of species that appear more
abundant.
Bioinformatics pipelines that artificially inflate the number of SVs,
especially low-abundance or rare SVs, could potentially inflate the
differences we see among community subsamples. Hundreds of
bioinformatics approaches have been used to analyze amplicon data, and
no consensus exists on which is best. However, a recent study comparing
the performance of 360 different software and parameter combinations
showed that DADA2 (which is what we used here), with no filter
other than the removal of low-quality and chimeric sequences, was best
for recovering true richness and composition from a mock fungal
community of 189 different strains (Pauvert et al., 2019). If
anything, DADA2 can erroneously lump closely related species (Callahan et al., 2016), which would make it more conservative than other
methods used. However, in an effort not to overestimate the true
variation between strategies compared in this study, we assessed the
relative importance of rare taxa through the gradual removal of
less abundant sequences, and we also used LULU, which is sometimes
employed to reduce artifactual diversity (Frøslev et al., 2017).
We also removed all SVs that could not be confidently assigned to known
microbial taxa. Even with these approaches, substantial variation
remained due to the inherent undersampling of the strategies compared.
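The gradual removal of less abundant SVs described above can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: the function names, the toy read counts, the set of thresholds other than 0.05%, the decision to apply the threshold within each strategy separately, and the use of Jaccard overlap are all assumptions.

```python
def retained_svs(counts, min_fraction):
    """Return the SVs whose share of all reads is at least min_fraction."""
    total = sum(counts.values())
    return {sv for sv, n in counts.items() if n > 0 and n / total >= min_fraction}


def jaccard_overlap(svs_a, svs_b):
    """Shared SVs divided by SVs present in either set (0.0 if both are empty)."""
    union = svs_a | svs_b
    return len(svs_a & svs_b) / len(union) if union else 0.0


def overlap_by_threshold(counts_a, counts_b, thresholds):
    """Compute overlap between two strategies at each abundance threshold."""
    return {
        t: jaccard_overlap(retained_svs(counts_a, t), retained_svs(counts_b, t))
        for t in thresholds
    }


if __name__ == "__main__":
    # Hypothetical read counts per SV for two sampling strategies.
    strategy_a = {"SV1": 6000, "SV2": 3990, "SV3": 6, "SV4": 4}
    strategy_b = {"SV1": 5500, "SV2": 4493, "SV5": 4, "SV6": 3}
    # 0.0005 corresponds to the 0.05% cutoff; the other values are arbitrary.
    for t, value in overlap_by_threshold(strategy_a, strategy_b, [0.0, 0.0005, 0.001]).items():
        print(f"threshold {t:.4f}: overlap {value:.2f}")
```

In this toy example the two strategies share only their abundant SVs, so overlap rises as the threshold increases; with real data, some dissimilarity would be expected to remain after filtering.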