Materials and Methods

A total of 5,260,610 paired-end 76 bp Illumina MiSeq reads from an unknown \textit{Yersinia} genome were provided to us in FASTQ format. Read quality was assessed using FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) and the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). All quality metrics indicated that the data were of high quality (Figure 1). Residual Illumina adapter sequence was detected and removed using fastx_clipper.
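As an illustration of the kind of per-read metric summarised in the quality assessment, the following minimal sketch computes the mean Phred score of each read in a FASTQ file. It is not the FastQC or FASTX-Toolkit implementation; the function name mean_phred_per_read is hypothetical, and the sketch assumes four-line FASTQ records with Phred+33 quality encoding.

```python
import io


def mean_phred_per_read(handle, offset=33):
    """Return the mean Phred quality of each read in a FASTQ stream.

    Assumptions (not taken from the paper): records are exactly four
    lines long and qualities use the Phred+33 ASCII encoding.
    """
    means = []
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not qual or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record: " + header.strip())
        # Convert each quality character to its integer Phred score.
        scores = [ord(ch) - offset for ch in qual]
        means.append(sum(scores) / len(scores))
    return means


# Toy input with two invented reads, for demonstration only.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n5555\n")
print(mean_phred_per_read(example))
```

In practice the handle would be an opened FASTQ file, and the resulting list could be summarised as a histogram or a count of reads below a chosen threshold.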

Interleaved forward and reverse reads were assembled using Velvet (v1.2.09) \cite{18349386}. The VelvetOptimiser script (http://bioinformatics.net.au/software.velvetoptimiser.shtml) was used to select the k-mer length (optimal k-mer = 53) and the coverage cutoff (optimal cov_cutoff = 1.96), with N50 as the optimisation criterion. Forty contigs larger than 1 kbp were assembled, with an average length of 118,677 bp (N50 = 276,703 bp). The total length of all contigs was 4,747,089 bp and the largest contig was 563,205 bp. This is broadly consistent with known Yersinia genome sizes.
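For readers unfamiliar with the N50 statistic used here, the sketch below shows one common way to compute it from a list of contig lengths: N50 is taken as the length of the contig at which the running total, counted from the longest contig downwards, first reaches half of the assembled bases. The function name n50 and the toy lengths are illustrative assumptions, not values or code from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the cumulative sum first reaches at least
    half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length


# Toy contig lengths (invented for illustration, not the assembly above).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # prints 8
```

With real data, the lengths would be read from the contig FASTA file produced by the assembler.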

To perform a phylogenetic analysis of Yersinia species, 16S ribosomal subunit sequences were downloaded from GenBank. In total, 34 RefSeq sequences from 17 different Yersinia species were analysed (Table 1). To supplement the analysis with Y. enterocolitica representatives spanning the full spectrum of biovars (1A, 1B and 2-5), contigs from fifteen Y. enterocolitica samples, reported in Reuter et al., were downloaded from the European Nucleotide Archive \cite{24753568} (Table 2). Where not already available, 16S ribosomal subunit nucleotide sequences were extracted from assembled contigs using the RNAmmer server (v1.2) \cite{17452365}. The 16S FASTA sequences were aligned using Clustal Omega \cite{21988835,20439314,23671338}, and the alignment files were used to construct phylogenetic trees in Seaview using a parsimony model with 100 bootstrap replicates \cite{19854763}.
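The bootstrap step can be illustrated with a short sketch. Each bootstrap replicate is a pseudo-alignment built by sampling alignment columns with replacement; a tree is then inferred from each replicate and branch support is the fraction of replicates recovering that branch. The code below shows only the resampling step. It is not Seaview's implementation, and the function name bootstrap_alignment and the toy sequences are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of a multiple sequence alignment.

    `alignment` maps sequence names to aligned strings of equal length.
    Columns are sampled with replacement, and the same sampled columns
    are used for every sequence so that sites stay intact.
    """
    if not alignment:
        raise ValueError("empty alignment")
    n_cols = len(next(iter(alignment.values())))
    if any(len(seq) != n_cols for seq in alignment.values()):
        raise ValueError("aligned sequences must have equal length")
    # Draw as many column indices as the alignment has columns.
    sampled = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[i] for i in sampled)
        for name, seq in alignment.items()
    }


# Toy alignment (invented sequences, for demonstration only).
toy = {"taxon_a": "ACGTAC", "taxon_b": "ACGAAC", "taxon_c": "TCGTAA"}
rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(len(replicates), len(replicates[0]["taxon_a"]))
```

Each replicate would then be passed to the tree-building method, which is outside the scope of this sketch.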

Contigs were scaffolded against a reference Y. enterocolitica genome (GenBank: AM286415.1) using the Contiguator web application (http://combo.dbe.unifi.it/contiguator) \cite{21693004}. Scaffold and contig assemblies were annotated with gene features using two independent tools: PROKKA \cite{24642063} and the RAST server \cite{18261238,24293654}. Annotations in GenBank format were loaded into Artemis for visualisation \cite{11120685}.

PathogenFinder v1.1 \cite{24204795} and ResFinder v2.1 \cite{22782487} were used to identify and rank pathogenic genes and antibiotic resistance genes, respectively. Bacterial insertion sequences were identified using the ISfinder website (https://www-is.biotoul.fr/) \cite{16381877}.