ROUGH DRAFT authorea.com/6804

# Validation of methods for Low-volume RNA-seq

Abstract

Recently, a number of protocols extending RNA-sequencing to the single-cell regime have been published. However, we were concerned that the additional steps to deal with such minute quantities of input sample would introduce serious biases that would make analysis of the data using existing approaches invalid. In this study, we performed a critical evaluation of several of these low-volume RNA-seq protocols, and found that they performed slightly less well in metrics of interest to us than a more standard protocol, but with at least two orders of magnitude less sample required. We also explored a simple modification to one of these protocols that, for many samples, reduced the cost of library preparation to approximately \$20/sample.

# Introduction

Second-generation sequencing of RNA (RNA-seq) has proven to be a sensitive and increasingly inexpensive approach for a number of different experiments, including annotating genes in genomes, quantifying gene expression levels in a broad range of sample types, and determining differential expression between samples. As technology improves, transcriptome profiling has been able to be applied to smaller and smaller samples, allowing for more powerful assays to determine transcriptional output. For instance, our lab has used RNA-seq on single Drosophila embryos to measure zygotic gene activation (Lott 2011) and medium-resolution spatial patterning (Combs 2013). Further improvements will allow an even broader array of potential experiments on samples that were previously too small.

For instance, over the past few years, a number of groups have published descriptions of protocols to perform RNA-seq on single cells (typically mammalian cells) (Tang 2009, Ramsköld 2012, Sasagawa 2013, Hashimshony 2012, Islam 2011). A number of studies, both from the original authors of the single-cell RNA-seq protocols and from others, have assessed various aspects of these protocols, both individually and competitively (Bhargava 2014, Wu 2014, Marinov 2013). One particularly powerful use of these approaches is to sequence individual cells in bulk tissues, revealing different states and cellular identies (Buganim 2012, Treutlein 2014).

However, we felt that published descriptions of single-cell and other low-volume protocols did not adequately address whether a change in concentration of a given RNA between two samples would result in a proportional change in the FPKM (or any other measure of transcriptional activity) between those samples. While there are biases inherent to any protocol, we were concerned that direct amplification of the mRNA would select for PCR compatible genes in difficult to predict, and potentially non-linear ways. For many of the published applications of single cell RNA-seq, this is not likely a critical flaw, since the clustering approaches used are moderately robust to quantitative changes. However, to measure spatial and temporal activation of genes across an embryo, it is important that the output is monotonic with respect to concentration, and ideally linear.

While it is possible to estimate absolute numbers of cellular RNAs from an RNAseq experiment, doing so requires spike-ins of known concentration and estimates of total cellular RNA content (Mortazavi 2008, Lin 2012). However, many RNA-seq experiments do not do these controls, nor are such controls strictly necessary under reasonable, though often untested, assumptions of approximately constant RNA content. While ultimately absolute concentrations will be necessary to fully predict properties such as noise tolerance of the regulatory circuits (Gregor 2007, Gregor 2005), many current modeling efforts rely only on scaled concentration measurements, often derived from in situ-hybridization experiments (Garcia 2013, Ilsley 2013, He 2010). Given that, we felt it was not important that different protocols should necessarily agree on any particular expression value for a given gene, nor are we fully convinced that absolute expression of any particular gene can truly reliably be predicted in a particular experiment.

In order to convince ourselves that data generated from limiting samples would be suitable for our purposes, we evaluated several protocols for performing RNA-seq on extremely small samples. We also investigated a simple modification to one of the protocols that reduced sample preparation cost per library by more than 2-fold. Finally, we evaluated the effect of read depth on quality of the data. This study provides a single, consistent comparison of these diverse approaches, and shows that in fact all data from the low-volume protocols we examined are usable in similar contexts to the earlier bulk approach.

# Results

## Experiment 1: Evaluation of Illumina TruSeq

In our hands, the Illumina TruSeq protocol has performed extremely reliably with samples on the scale of  100ng of total RNA, the manufacturer recommended lower limit of the protocol. However, attempts to create libraries from much smaller samples yielded low complexity libraries, corresponding to as much as 30-fold PCR duplication of fragments. Anecdotally, less than 5% of libraries made with at least 90ng of total RNA yielded abnormally low concentrations, which we observed correlated with low complexity (Data not shown). To determine the lower limit of input needed to reliably produce libraries, we attempted to make libraries from 40, 50, 60, 70, and 80 ng of Drosophila total RNA, each in triplicate.

Total TruSeq cDNA library yields made with a given amount of input total RNA. Yields measured by Nanodrop of cDNA libraries resuspended in 25$$\mu L$$ of EB. The italicized samples were unusually low, and when analyzed with a Bioanalyzer, showed abnormal size distribution of cDNA fragments.
Amount Input RNA Replicate A Replicate B Replicate C
40 ng 57 ng 425 ng 672 ng
50 ng 435 ng 768 ng 755 ng
60 ng 115 ng 663 ng 668 ng
70 ng 300 ng 593 ng 653 ng
80 ng 468 ng 550 ng 840 ng

\label{table:truseqtitration}

We considered the two libraries with lower than usual concentration to be failures. While a failure rate of approximately 1 in 3 might be acceptable for some purposes, we ultimately wanted to perform RNA sequencing on precious samples, where a failure in any one of a dozen or more libraries would necessitate regenerating all of the libraries. Furthermore, due to the low sample volumes involved (less than approximately 500pg of poly-adenylated mRNA), common laboratory equipment is not able to determine the particular point in the protocol where the failures occurred.

Thus, we consider 70 ng of total RNA to be the conservative lower limit to the protocol. While this is about 30% smaller than the manufacturer suggests, it is still several orders of magnitude larger than we needed it to be. We therefore considered using other small-volume and “single-cell” RNA-seq kits, which we had less experience with and less faith in the data.

## Experiment 2: Competitive Comparison of Low-volume RNAseq protocols

We first sought to determine whether the low-volume RNAseq protocols available faithfully recapitulate linear changes in abundance of known inputs. We generated synthetic spike-ins by combining D. melanogaster and D. virilis total RNA in known, predefined proportions of 0, 5, 10, and 20% D. virilis RNA. For each of the low-volume protocols, we used 1ng of total RNA as input, whereas for the TruSeq protocol we used 100ng.

Although pre-defined mixes of spike-in controls have been developed and are commercially available (Jiang 2011), we felt it was important to ensure that a given protocol would function reproducibly with natural RNA, which almost certainly has a different distribution of 6-mers, which could conceivably affect random cDNA priming and other amplification effects. Furthermore, our spike-in sample more densely covers the approximately $$10^5$$ fold coverage typical of RNA abundances. It should be noted, however, that our sample is not directly comparable to any other standards, nor is the material of known strandedness. We assumed that the majority of each sample is from the standard annotated transcripts, but did not verify this prior to library construction and sequencing.

The different protocols had a variation in yield of libraries from between 6 fmole (approximately 3.6 trillion molecules) and 2,400 femtomoles, with the TruSeq a clear outlier at the high end of the range, and the other protocols all below 200 fmole (Table \ref{tab:protocols}). All of these quantities are sufficient to generate hundreds of millions of reads—far more than is typically required for an RNA-seq experiment. We pooled the samples, attempting equimolar fractions in the final pool; however, due to a pooling error, we generated significantly more reads than intended for the TruSeq protocol, and correspondingly fewer in the other protocols. Unless otherwise noted, we therefore sub-sampled the mapped reads to the lowest number of mapped reads in any sample in order to provide a fair comparison between protocols.

We were interested in the fold-change of each D. virilis gene across the four samples, rather than the absolute abundance of any particular gene. Therefore, after mapping and gene quantification, we normalized the abundance $$A_{ij}$$ of every gene $$i$$ across the $$j=4$$ samples by a weighted average of the quantity $$Q_j$$ of D. virilis in sample $$j$$, as show in equation \ref{eqn:norm}. Thus, within a given gene, a linear fit of $$\hat{A}_{ij}$$ vs $$Q_j$$ should have a slope of one and an intercept of zero.

$\label{eqn:norm} \hat{A}_{ij} = A_{ij} \div \frac{\sum_j Q_j A_{ij}}{\sum_j (Q_j)^2}$

We filtered the D. virilis genes for those with at least 20 mapped fragments in the sample with 20% D. virilis, then calculated an independent linear regression for each of those genes. As expected, for every protocol, the mean slope was 1 ($$t$$-test, $$p<5\times10^{-7}$$ for all protocols). Similarly, the average intercepts for all protocols was 0 ($$t$$-test, $$p<5\times10^{-7}$$ for all protocols). Also unsurprisingly, the TruSeq protocol had a noticeably higher mean correlation coefficient ($$0.98 \pm 0.02$$) than any of the other protocols ($$0.95 \pm 0.06$$, $$0.92\pm0.09$$, and $$0.95 \pm 0.06$$ for Clontech, TotalScript, and SMART-seq2, respectively). The mean correlation coefficient was statistically and practically indistinguishable between the Clontech samples and the SMART-seq2 samples ($$t$$-test $$p = .11$$, Figure \ref{fig:fithists}).

Indeed, the only major differentiator we could find between the low-volume protocols we measured was cost. For only a handful of libraries, the kit-based all inclusive model of the Clontech and TotalScript kits could be a significant benefit, allowing the purchase of only as much of the reagents as required. By contrast, the Smart-seq2 protocol requires the a la carte purchase of a number of reagents, some of which are not available or more expensive per unit for smaller quantities. Furthermore, there could potentially be a “hot dogs and buns” problem, where reagents are sold in non-integer multiples of each other, leading to leftovers. Many of these reagents are not single-purpose, however, so leftovers could in principle be repurposed in other experiments.