MiniProject

Abstract

Yersinia is a diverse genus of Gram-negative Enterobacteriaceae, three species of which are well-defined human pathogens: Y. pestis, Y. pseudotuberculosis and Y. enterocolitica. Identifying a given Yersinia species and gauging its pathogenicity is important for public health monitoring. Here we used complementary bioinformatic tools to determine the species of an unknown Yersinia sample from whole-genome sequencing data, followed by genome annotation to search for known pathogenic genome features in Yersinia. Phylogenetic analysis confirmed that the query sequence belonged to the Y. enterocolitica species. The absence of sequence for the pYV plasmid and the ‘high-pathogenicity island’ is indicative of a non-pathogenic 1A strain. We also identified genes in the query strain that were not present in pathogenic Y. enterocolitica biovar 1B strains. This may be indicative of other functional differences that influence pathogenicity.

Introduction

Yersinia is a genus of Gram-negative Enterobacteriaceae. Of the characterised species of Yersinia, three have been particularly well studied because of their pathogenicity to humans: Y. pestis, Y. pseudotuberculosis and Y. enterocolitica. Y. pestis is highly pathogenic, causing a systemic disease (‘plague’) that affects multiple organ systems, including the lungs, lymph nodes and blood vessels. Conversely, Y. pseudotuberculosis and Y. enterocolitica are enteropathogens, primarily affecting the gastrointestinal system, where they can cause local inflammation, diarrhoea and fever. Furthermore, while Y. pestis is transmitted through flea bites, Y. enterocolitica and Y. pseudotuberculosis infections are primarily the result of consuming contaminated food or water. Other Yersinia species are not thought to be pathogenic to humans.

Y. enterocolitica strains are particularly diverse, spanning a spectrum of non-pathogenic (1A), mildly pathogenic (2-5) and pathogenic (1B) biovars, which can be further differentiated by serotype. Interestingly, while the pathogenic biovar 1B is primarily found in North America, the mildly pathogenic biovars are more common in Japan and Europe (Schubert 2004). Identifying the genomic features that determine the virulence of Yersinia is of major interest. Perhaps the best established is the ~70 kb pYV plasmid, which is common to all pathogenic Yersinia, including pathogenic members of Y. enterocolitica. Similarly, the yersiniabactin gene cluster, located in the ‘high-pathogenicity island’, is not evident in non-pathogenic Yersinia (Schubert 2004).

Given the diversity within the Yersinia genus, the ability to quickly identify a Yersinia species from a sample is important for public health. This has been aided by the development of (i) high-throughput sequencing of whole bacterial genomes and (ii) curated databases of genomic features that confer pathogenicity to humans, such as YersiniaBase (http://yersinia.um.edu.my/index.php/home/main). In this study we apply a range of bioinformatic tools to whole-genome sequence data from an unknown Yersinia sample, in an attempt to identify the species and to gauge its pathogenicity to humans.

Materials and Methods

A total of 5,260,610 paired-end 76 bp Illumina MiSeq reads from an unknown Yersinia genome were provided to us in FASTQ format. Read quality was assessed using a combination of FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) and the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). All quality metrics indicated that the data were of high quality (Figure 1). Residual Illumina adapter sequence was detected and removed using fastx_clipper.
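The two read-level operations described above (scoring base quality and clipping adapter sequence) can be illustrated with a minimal Python sketch. It is not the FastQC or fastx_clipper implementation and was not part of the analysis: it assumes Phred+33 quality encoding, four-line FASTQ records and exact adapter matching, and the function names and the adapter string are hypothetical placeholders chosen for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input (a blank line also stops parsing)
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a read; the Phred+33 offset is an assumption."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def clip_adapter(seq: str, qual: str, adapter: str) -> Tuple[str, str]:
    """Trim the read at the first exact adapter match (no mismatches tolerated)."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual  # adapter not found: read is returned unchanged
    return seq[:pos], qual[:pos]


# Usage on a single in-memory record; the adapter string is a placeholder.
example = io.StringIO("@read1\nACGTACGTAGATCGGAAG\n+\nIIIIIIIIIIIIIIIIII\n")
for header, seq, qual in read_fastq(example):
    seq, qual = clip_adapter(seq, qual, adapter="AGATCGGAAG")
    print(header, seq, round(mean_phred(qual), 1))
```

Dedicated tools additionally tolerate mismatches, handle partial adapters at read ends and report per-position quality distributions, so this sketch only conveys the underlying idea.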

Interleaved forward and reverse reads were assembled using Velvet (v1.2.09) (Zerbino 2008). The VelvetOptimiser script was used to select the optimal k-mer length (optimal k-mer = 53) and to determine coverage th