Transrate: Quality assessment of de-novo transcriptome assemblies

Abstract

Improvements in short-read sequencing technology, combined with rapidly decreasing prices, have enabled the use of RNA-seq to assay the transcriptomes of species whose genomes have not been sequenced. De-novo transcriptome assembly attempts to reconstruct the original transcript sequences from short reads. Such transcriptome assemblies are relied upon for gene expression studies, phylogenetic analyses, and molecular tooling. It is therefore important to ensure that assemblies are as accurate as possible, but to date there are no tools for deep quality assessment of assemblies. We present Transrate, an open source command-line program and library implemented in Ruby and C that automates deep analysis of transcriptome assembly quality. Transrate evaluates assemblies by inspecting contigs, mapping reads, and comparing against reference species, using an extensive suite of established and novel metrics. In addition, we introduce the transrate score: a novel summary statistic, based on an explicit, intuitive statistical model of the transcriptome, that captures many aspects of assembly quality. Using both published and simulated data, we demonstrate that Transrate identifies the strengths and weaknesses of different assembly strategies and enables informed optimisation of assembly pipelines.