Data analysis
Raw paired-end sequences were preprocessed using the HiSeq Control Software (diet conversion) and the MiSeq Control Software (lifestyle shift). After low-quality reads were filtered out, clean amplicon reads were imported into the QIIME software package and analyzed as previously described (Caporaso et al 2010). Briefly, the 16S rRNA gene and ITS sequences were clustered at the 97% nucleotide sequence similarity level to generate representative operational taxonomic unit (OTU) sequences using the SILVA (Quast et al 2013) and UNITE (Koljalg et al 2013) reference databases for the bacterial and fungal libraries, respectively. The Chao1 richness and Shannon diversity indices were calculated in QIIME (V1.9.1) and visualized in R (V3.5.0). Principal coordinates analysis (PCoA) was conducted on the OTU compositional matrices using the Bray-Curtis (BC) distance, as implemented in QIIME with the default settings. Linear discriminant analysis effect size (LEfSe) analysis was conducted on the Galaxy platform (http://huttenhower.sph.harvard.edu/galaxy/root). Cladograms with circular representations of taxonomic compositions and phylogenetic trees were produced using GraPhlAn (Truong et al 2015).
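For readers unfamiliar with the metrics named above, the following is a minimal sketch of how Chao1, the Shannon index and the Bray-Curtis distance can be computed from per-sample OTU count vectors. It is not the QIIME code used in this study. The function names are illustrative, the bias-corrected form of Chao1 and the base-2 logarithm for the Shannon index are assumptions about the defaults, and the input vectors are assumed to contain non-negative integer counts with at least one non-zero entry.

```python
import math


def chao1(counts):
    """Bias-corrected Chao1 richness estimate for one sample (assumed form)."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    # S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def shannon(counts, base=2):
    """Shannon diversity index; the logarithm base is an assumption."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples over the same OTUs."""
    numerator = sum(abs(x - y) for x, y in zip(sample_a, sample_b))
    denominator = sum(x + y for x, y in zip(sample_a, sample_b))
    return numerator / denominator
```

A pairwise matrix of such Bray-Curtis values across all samples is the input on which PCoA operates.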
A whole genome shotgun (WGS) library composed of clone inserts of approximately 400 bp was generated for associated samples. Metagenomic sequencing of the library was performed on the Illumina HiSeq4000 platform (Illumina, Inc., San Diego, CA, USA) using 2×150 bp paired-end sequencing mode at Majorbio Bio-Pharm Technology Co., Ltd. (Shanghai, China). SeqPrep (https://github.com/jstjohn/SeqPrep) and Sickle (https://github.com/najoshi/sickle) were then used to remove reads shorter than 50 bp and reads with an average quality score < 20. The bwa aligner (http://bio-bwa.sourceforge.net/) was also used to remove reads that matched the host (Ailuropoda melanoleuca) genome sequence or the genome sequences of the common plants Malus domestica, Daucus carota, Zea mays, Oryza sativa and Glycine max (https://www.ncbi.nlm.nih.gov/genome/). The filtered and microbiota-enriched reads for each sample were then subjected to contig assembly using IDBA-UD (http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/) (Peng et al 2012). The MetaBAT2 genome binning program was used to bin the contigs of the sample assemblies (Stewart et al 2018) into metagenome-assembled genomes (MAGs). A total of 449 draft MAGs were recovered, and dRep was used to de-duplicate them (Olm et al 2017). Dereplication yielded a total of 22 high-quality bins, defined as those with completeness ≥ 70% and contamination ≤ 10% as assessed by CheckM (Parks et al 2017). The high-quality bins were retained for further analyses. Prodigal (http://compbio.ornl.gov/prodigal/) was used to predict genes within the high-quality bins (Hyatt et al 2010). Functional annotation of the predicted genes was conducted via BLASTx analysis against the Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/) and Carbohydrate-Active enZYmes (CAZy, http://www.cazy.org/) databases.
Housekeeping phylogenetic marker genes were also identified in the 22 reconstructed genomes using AMPHORA2 (https://github.com/martinwu/AMPHORA2).
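The read-level and bin-level thresholds stated above can be expressed as two simple predicates, sketched below. This only illustrates the cutoffs and is not the logic of Sickle, SeqPrep, dRep or CheckM: Sickle, for instance, trims reads with a sliding window rather than applying a single whole-read mean. The function names, the dictionary layout and the example bin values are all hypothetical.

```python
def keep_read(sequence, qualities, min_length=50, min_mean_quality=20):
    """Return True if a read passes the length and mean-quality cutoffs."""
    if len(sequence) < min_length:
        return False
    return sum(qualities) / len(qualities) >= min_mean_quality


def is_high_quality_bin(completeness, contamination,
                        min_completeness=70.0, max_contamination=10.0):
    """Return True if a bin meets the completeness/contamination cutoffs."""
    return completeness >= min_completeness and contamination <= max_contamination


# Hypothetical quality estimates for three bins (not data from this study).
example_bins = [
    {"bin_id": "bin_A", "completeness": 95.2, "contamination": 1.3},
    {"bin_id": "bin_B", "completeness": 64.0, "contamination": 2.1},
    {"bin_id": "bin_C", "completeness": 88.7, "contamination": 12.5},
]

retained = [
    b["bin_id"]
    for b in example_bins
    if is_high_quality_bin(b["completeness"], b["contamination"])
]
```

In this toy input only `bin_A` satisfies both cutoffs; `bin_B` fails on completeness and `bin_C` on contamination.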
One-way analysis of variance (ANOVA) was used to identify significant differences in alpha diversity among communities based on the amplicon sequencing data. All significance tests were two-sided, with p < 0.05 considered statistically significant. Analysis of molecular variance (AMOVA) was used to identify significant differences in GM structure among treatment groups based on BC distances, again with p < 0.05 as the significance threshold. Non-parametric factorial Kruskal-Wallis rank-sum tests were used to detect significant differences in the phylum- or genus-level taxonomic compositions between groups in the LEfSe analyses. LDA was then used to estimate the effect size of each feature using a normalized relative abundance matrix, and features with an LDA score > 4.0 were considered statistically significant.
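As an illustration of the one-way ANOVA comparison of alpha diversity described above, the sketch below computes the F statistic and its degrees of freedom from per-group index values. It is a generic textbook calculation under the usual ANOVA assumptions, not the software used in this study. The function name is illustrative, and the p value, which would be obtained from the F distribution in a statistics package, is deliberately not computed here.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

Each inner list would hold, for example, the Shannon index values of the samples in one treatment group.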