2.5 Pre-processing of reads, de novo assembly and abundance mapping
Raw sequence reads were processed to remove adapter and primer
sequences, PCR duplicates, ribosomal RNA (rRNA), host (Felis
catus) reads and poor-quality terminal regions as previously described
(Brussel et al., 2020). Briefly, rRNA reads were removed using SortMeRNA
and host reads were identified and removed by mapping to the Felis
catus genome (Brussel et al., 2020). The filtered metatranscriptomic
reads (RNA) were de novo assembled using Trinity version 2.8.5
and the filtered metagenomic reads (cDNA and DNA) were de novo
assembled using IDBA-UD version 1.1.2 (Brussel et al., 2020). The
contigs were compared to the non-redundant protein database using
Diamond version 2.0.4. Taxonomic classification of the filtered
reads was performed using KMA version 1.3.9a (Clausen et al., 2018) and
CCMetagen version 1.2.4 (Marcelino et al., 2020) by comparing the
filtered paired-end reads to the NCBI nucleotide database that contains
all NCBI sequences except those of environmental eukaryotic and
prokaryotic, unclassified and artificial origin. In CCMetagen, read
depth, specified as reads per million (RPM), was calculated and the
threshold function was disabled to allow all taxonomy levels to be
reported (Marcelino et al., 2020). Read abundance was further calculated
by mapping filtered reads to the de novo assembled contigs
observed in this dataset using Bowtie2 version 2.3.4.3. Geneious version
2020.2.5 was used to predict ORFs and annotate genomes. The extent of
index-hopping between libraries sequenced on the same lane was minimized
by comparing contigs and identifying any identical sequences. For each
such sequence, the library with the highest read abundance served as
the reference, and any library with a read abundance below 0.01% of
that value was excluded for that sequence.
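
The RPM normalisation and the 0.01% index-hopping cut-off described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the dictionary layout (library name mapped to read abundance for one shared sequence) and the example counts are all hypothetical, and the sketch assumes that a library sitting exactly at the cut-off is retained.

```python
from typing import Dict

# 0.01% of the highest read abundance, expressed as a fraction.
INDEX_HOP_FRACTION = 0.0001


def reads_per_million(mapped_reads: int, total_reads: int) -> float:
    """Normalise a read count to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads * 1_000_000 / total_reads


def filter_index_hopping(
    abundance_by_library: Dict[str, float],
    fraction: float = INDEX_HOP_FRACTION,
) -> Dict[str, float]:
    """Keep only libraries whose abundance for one shared sequence is at
    least `fraction` of the highest abundance seen on the same lane.

    Assumption: the input holds abundances for a single identical
    sequence across libraries sequenced on the same lane.
    """
    if not abundance_by_library:
        return {}
    cutoff = max(abundance_by_library.values()) * fraction
    return {
        library: abundance
        for library, abundance in abundance_by_library.items()
        if abundance >= cutoff
    }


# Hypothetical read counts for one sequence found in three libraries.
example = {"lib_A": 1_000_000, "lib_B": 50, "lib_C": 150}
retained = filter_index_hopping(example)
```

Here `lib_B` falls below 0.01% of the count in `lib_A` and is dropped, while `lib_C` is kept. Whether the comparison is made on raw read counts or on RPM values is not stated in the text, so the sketch accepts either.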