Transrate: Quality assessment of de-novo transcriptome assemblies
Improvements in short-read sequencing technology combined with rapidly decreasing prices have enabled the use of RNA-seq to assay the transcriptome of species whose genome has not been sequenced. De-novo transcriptome assembly attempts to reconstruct the original transcript sequences from short reads. Such transcriptome assemblies are relied upon for gene expression studies, phylogenetic analyses, and molecular tooling. It is therefore important to ensure that assemblies are as accurate as possible, but to date there are no tools for deep quality assessment of assemblies. We present Transrate, an open source command-line program and library implemented in the Ruby and C languages that automates deep analysis of transcriptome assembly quality. Transrate evaluates assemblies by inspecting contigs, mapping reads, and comparing against reference species, using an extensive suite of established and novel metrics. In addition, we introduce the transrate score: a novel summary statistic, based on an explicit, intuitive statistical model of the transcriptome, that captures many aspects of assembly quality. Using both published and simulated data, we demonstrate that Transrate identifies the strengths and weaknesses of different assembly strategies and enables informed optimisation of assembly pipelines.
The use of RNA-seq for de-novo transcriptome assembly is a complex procedure, but when done well it can yield valuable, high-throughput biological insights at relatively low cost (e.g. Aubry 2014). The analytical pipeline might include inspecting reads, trimming adapters and low-quality bases, read error correction, digital normalization, assembly, and post-assembly improvements. Because the computational problems involved in these steps are hard to solve, there are many competing approaches, and because each organism has unique genomic properties, the algorithms need to be selected and tuned carefully for each experiment.