Introduction

Yersinia is a genus of Gram-negative Enterobacteriaceae. Of the characterised species of Yersinia, three have been particularly well studied because of their pathogenicity to humans: Y. pestis, Y. pseudotuberculosis and Y. enterocolitica. Y. pestis is highly pathogenic, causing a systemic disease (‘plague’) that affects multiple organ systems, including the lungs, lymph nodes and blood vessels. Conversely, Y. pseudotuberculosis and Y. enterocolitica are enteropathogens, primarily affecting the gastrointestinal system, where they can cause local inflammation, diarrhoea and fever. Furthermore, while Y. pestis is transmitted through flea bites, Y. enterocolitica and Y. pseudotuberculosis infections primarily result from consuming contaminated food or water. Other Yersinia species are not thought to be pathogenic to humans.

Y. enterocolitica strains are particularly diverse, spanning a spectrum of non-pathogenic (1A), mildly pathogenic (2-5) and highly pathogenic (1B) biovars, which can be further differentiated by serotype. Interestingly, while biovar 1B is primarily found in North America, the mildly pathogenic biovars are more common in Japan and Europe \cite{15493818}. Identifying the genomic features that determine the virulence of Yersinia is of major interest. Perhaps the best established is the 70 kb pYV plasmid, which is common to all pathogenic Yersinia, including pathogenic members of Y. enterocolitica. Similarly, the yersiniabactin gene cluster, located in the ‘high-pathogenicity island’, is not evident in non-pathogenic Yersinia \cite{15493818}.

Given the diversity within the Yersinia genus, the ability to quickly identify a Yersinia species from a sample is important for public health. This has been aided by the development of (i) high-throughput sequencing of whole bacterial genomes and (ii) curated databases of genomic features that confer pathogenicity to humans, such as YersiniaBase (http://yersinia.um.edu.my/index.php/home/main). In this study we apply a range of bioinformatic tools to whole-genome sequence data from an unknown Yersinia sample, in an attempt to identify the species and to gauge its pathogenicity to humans.