Data processing for microbiome analysis
FASTQ files were imported and processed with QIIME 2 (v2021.11; https://qiime2.org/) following the public tutorial for the Casava 1.8 paired-end demultiplexed sample format. Sequences were denoised with the DADA2 method via qiime dada2 denoise-paired to obtain an amplicon sequence variant (ASV) table. Taxonomy was assigned to ASVs with qiime feature-classifier (classify-sklearn) against the pre-trained Silva v138 database for the V4 region (515F/806R) prepared using RESCRIPt. Non-target sequences were removed. The potential for contamination was addressed by co-sequencing DNA amplified from specimens together with two template-free controls and two extraction kit reagent controls, all processed in the same way as the specimens. ASVs were considered putative contaminants, and were removed, if their mean abundance in controls reached or exceeded 25% of their mean abundance in specimens. Finally, samples with fewer than 1000 high-quality reads were excluded from downstream analysis. On average, 8197 quality-filtered reads were generated per wild sample and 23408 per laboratory sample. In total, the wild and laboratory datasets contained 4555 ASVs, including singletons (ASVs occurring once with a count of 1).
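The contaminant rule and the read-depth cutoff described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. The function names, the nested-dictionary table layout and the toy counts are hypothetical. The sketch also assumes that "abundance" means raw read counts and that sample depth is evaluated after contaminant removal; neither is stated explicitly in the text.

```python
from statistics import mean

CONTAMINANT_RATIO = 0.25  # control mean >= 25% of specimen mean (from the text)
MIN_READS = 1000          # minimum reads per retained sample (from the text)


def flag_contaminants(table, specimen_ids, control_ids, ratio=CONTAMINANT_RATIO):
    """Return ASV ids whose mean count in controls reaches or exceeds
    `ratio` times their mean count in specimens.

    `table` maps ASV id -> {sample id -> read count} (hypothetical layout).
    Assumption: abundance is taken as raw read counts.
    """
    flagged = set()
    for asv, counts in table.items():
        control_mean = mean(counts.get(s, 0) for s in control_ids)
        specimen_mean = mean(counts.get(s, 0) for s in specimen_ids)
        if control_mean >= ratio * specimen_mean:
            flagged.add(asv)
    return flagged


def filter_table(table, specimen_ids, control_ids, min_reads=MIN_READS):
    """Drop putative contaminants, then drop specimens below `min_reads`.

    Assumption: depth is computed after contaminant removal.
    """
    contaminants = flag_contaminants(table, specimen_ids, control_ids)
    kept = {asv: c for asv, c in table.items() if asv not in contaminants}
    depth = {s: sum(c.get(s, 0) for c in kept.values()) for s in specimen_ids}
    passing = [s for s in specimen_ids if depth[s] >= min_reads]
    filtered = {asv: {s: c.get(s, 0) for s in passing} for asv, c in kept.items()}
    return filtered, contaminants, passing


# Toy data (invented for illustration only): two specimens, two controls.
table = {
    "ASV1": {"S1": 1500, "S2": 600, "NTC1": 2, "KIT1": 0},
    "ASV2": {"S1": 40, "S2": 20, "NTC1": 30, "KIT1": 10},
    "ASV3": {"S1": 300, "S2": 100, "NTC1": 0, "KIT1": 1},
}
filtered, contaminants, passing = filter_table(
    table, specimen_ids=["S1", "S2"], control_ids=["NTC1", "KIT1"]
)
```

In this toy table, ASV2 has a control mean of 20 against a specimen mean of 30, so it is flagged, and specimen S2 then falls below the 1000-read cutoff and is excluded.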