Introduction

Second-generation sequencing of RNA (RNA-seq) has proven to be a sensitive and increasingly inexpensive approach for a number of different experiments, including annotating genes in genomes, quantifying gene expression levels in a broad range of sample types, and determining differential expression between samples. As the technology improves, transcriptome profiling can be applied to ever smaller samples, allowing more powerful assays of transcriptional output. For instance, our lab has used RNA-seq on single Drosophila embryos to measure zygotic gene activation \cite{Lott:2011cc} and medium-resolution spatial patterning \cite{Combs:2013jy}. Further improvements will enable an even broader array of experiments on samples that were previously too small to assay.

Over the past few years, for example, a number of groups have published protocols for performing RNA-seq on single cells (typically mammalian cells) \cite{Tang:2009kj,Ramskold:2012gj,Sasagawa:2013jx,Hashimshony:2012ca,Islam:2011jr}. Several studies, both from the original authors of these single-cell RNA-seq protocols and from others, have assessed various aspects of the protocols, both individually and in head-to-head comparisons \cite{Bhargava:2014gf,Wu:2014eg,Marinov:2013fm}. One particularly powerful use of these approaches is to sequence individual cells from bulk tissues, revealing distinct cellular states and identities \cite{Buganim:2012hp,Treutlein:2014ec}.

However, we felt that published descriptions of single-cell and other low-volume protocols did not adequately address whether a change in the concentration of a given RNA between two samples would produce a proportional change in FPKM (or any other measure of transcriptional activity) between those samples. While every protocol has inherent biases, we were concerned that direct amplification of the mRNA would favor PCR-compatible transcripts in difficult-to-predict and potentially non-linear ways. For many published applications of single-cell RNA-seq, this is unlikely to be a critical flaw, since the clustering approaches used are moderately robust to quantitative distortions. To measure spatial and temporal activation of genes across an embryo, however, the output must be monotonic with respect to concentration, and ideally linear.
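The distinction between proportional and merely monotonic output can be made concrete with a dilution series. The sketch below is illustrative only: the read counts, gene length, and library size are hypothetical, FPKM is simplified to count reads rather than fragments, and the log-log slope is one convenient summary we assume here rather than the analysis used in this study. A slope near 1 indicates that FPKM changes in proportion to input concentration; a monotonic series with a slope well below 1 indicates compression at high concentrations.

```python
import math

def fpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Simplification (assumption): counts reads rather than fragments and
    ignores the effective-length corrections applied by real quantifiers.
    """
    return read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)

def is_monotonic_increasing(values):
    """True if each value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

def log_log_slope(concentrations, expression):
    """Ordinary least-squares slope of log(expression) vs. log(concentration).

    A slope near 1 means expression changes in proportion to concentration.
    """
    xs = [math.log(c) for c in concentrations]
    ys = [math.log(e) for e in expression]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Hypothetical dilution series: relative input concentration of one RNA.
concentrations = [1, 2, 4, 8]
# Hypothetical read counts for a 2 kb gene in libraries of 10 million mapped reads.
proportional = [fpkm(n, 2000, 10_000_000) for n in [100, 200, 400, 800]]
saturating = [fpkm(n, 2000, 10_000_000) for n in [100, 180, 300, 420]]

for label, values in [("proportional", proportional), ("saturating", saturating)]:
    print(label, is_monotonic_increasing(values),
          round(log_log_slope(concentrations, values), 2))
```

Both hypothetical series are monotonic, but only the first has a slope of 1; the second would still rank samples correctly while understating fold changes.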

While it is possible to estimate absolute numbers of cellular RNAs from an RNA-seq experiment, doing so requires spike-ins of known concentration and estimates of total cellular RNA content \cite{Mortazavi:2008jj,Lin_2012}. However, many RNA-seq experiments do not include these controls, nor are such controls strictly necessary under the reasonable, though often untested, assumption that RNA content is approximately constant. While absolute concentrations will ultimately be necessary to fully predict properties such as the noise tolerance of regulatory circuits \cite{Gregor:2007du,Gregor:2005jn}, many current modeling efforts rely only on scaled concentration measurements, often derived from in situ hybridization experiments \cite{Garcia:2013fs,Ilsley:2013fk,He:2010ix}. Given this, we did not consider it important that different protocols agree on any particular expression value for a given gene, nor are we fully convinced that the absolute expression of any particular gene can be reliably predicted in a given experiment.
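To illustrate why spike-ins are needed for absolute estimates, the following minimal sketch calibrates FPKM against spike-in RNAs of known abundance. It assumes, for illustration only, that molecule number is proportional to FPKM (a least-squares fit through the origin); the spike-in amounts, observed FPKM values, and gene FPKM are hypothetical, and published calibrations may fit the relationship differently. Converting the resulting estimate to molecules per cell would additionally require the cell number or total RNA content mentioned above, which this sketch does not model.

```python
def molecules_per_fpkm(spike_molecules, spike_fpkm):
    """Least-squares slope through the origin, assuming molecules = k * FPKM."""
    numerator = sum(m * f for m, f in zip(spike_molecules, spike_fpkm))
    denominator = sum(f * f for f in spike_fpkm)
    return numerator / denominator

# Hypothetical spike-ins: known molecules added to the sample and observed FPKM.
spike_molecules = [1_000, 10_000, 100_000]
spike_fpkm = [2.0, 20.0, 200.0]

k = molecules_per_fpkm(spike_molecules, spike_fpkm)
gene_fpkm = 50.0  # hypothetical endogenous gene
estimated_molecules = k * gene_fpkm
print(f"{k:.1f} molecules per FPKM unit; ~{estimated_molecules:.0f} molecules in the sample")
```

Without the spike-ins, k is unknown, which is why relative comparisons between samples remain possible while absolute molecule counts do not.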

To determine whether data generated from limiting samples would be suitable for our purposes, we evaluated several protocols for performing RNA-seq on extremely small samples. We also investigated a simple modification to one of the protocols that reduced sample preparation cost per library by more than twofold. Finally, we evaluated the effect of read depth on data quality. This study provides a single, consistent comparison of these diverse approaches and shows that data from all of the low-volume protocols we examined are usable in contexts similar to those of the earlier bulk approach.