identify data set
come up with quiz questions to identify suitable students
depth vs. replicates
Illumina (mention ABI SOLiD, Nanopore, PacBio...)
paired-end vs. single-end
adapters, contamination, etc.
library batch effects
lane effects: sample loading, cluster amplification, sequencing reaction
Consider spiking in artificial RNA (ERCC spike-in standard) to calibrate the RNA concentrations in each sample and the measured fold-changes between the two conditions (Jiang et al. 2011; Lovén et al. 2012)
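One simple way to use spike-in counts for calibration is sketched below. It assumes that the same amount of spike-in mix was added to every sample, so that differences in total spike-in counts reflect technical differences only. The function name `spikein_scale_factors` and the input layout are hypothetical, and this is not the procedure used in the cited papers.

```python
from statistics import geometric_mean


def spikein_scale_factors(spike_counts):
    """Return one scale factor per sample from its spike-in counts.

    spike_counts maps a sample name to the list of read counts observed
    for the spike-in transcripts in that sample (hypothetical layout).
    Dividing a sample's gene counts by its factor puts all samples on a
    common scale, assuming equal spike-in input per sample.
    """
    totals = {sample: sum(counts) for sample, counts in spike_counts.items()}
    if any(total <= 0 for total in totals.values()):
        raise ValueError("every sample needs at least one spike-in read")
    # Use the geometric mean of the totals as the common reference.
    reference = geometric_mean(list(totals.values()))
    return {sample: total / reference for sample, total in totals.items()}


if __name__ == "__main__":
    example = {"ctrl_1": [40, 60], "treat_1": [150, 250]}
    for sample, factor in spikein_scale_factors(example).items():
        print(sample, round(factor, 3))
```

The geometric mean is used as the reference so that the factors are symmetric around 1; any fixed reference would give the same ratios between samples.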
Recommendations from Schurch et al. (2015) for the design of RNA-seq experiments:
At least 6 replicates per condition for all experiments.
At least 12 replicates per condition for experiments where identifying the majority of all DE genes is important.
For experiments with <12 replicates per condition, use edgeR.
For experiments with >12 replicates per condition, use DESeq.
Apply a fold-change threshold \(T\) appropriate to the number of replicates per condition, with \(0.1 \le T \le 0.5\).
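The recommendations above can be turned into a small design check, sketched here. The function names `recommend_tool` and `design_warnings` are hypothetical. The list does not say which tool to use at exactly 12 replicates, so the sketch returns `None` in that case rather than guessing.

```python
from typing import List, Optional


def recommend_tool(replicates_per_condition: int) -> Optional[str]:
    """Return the tool named in the list above for this replicate number.

    Returns None for exactly 12 replicates, which the list leaves open.
    """
    if replicates_per_condition < 1:
        raise ValueError("need at least one replicate per condition")
    if replicates_per_condition < 12:
        return "edgeR"
    if replicates_per_condition > 12:
        return "DESeq"
    return None


def design_warnings(replicates_per_condition: int,
                    want_majority_of_de_genes: bool = False) -> List[str]:
    """List the replicate recommendations that a planned design misses."""
    warnings = []
    if replicates_per_condition < 6:
        warnings.append("fewer than 6 replicates per condition")
    if want_majority_of_de_genes and replicates_per_condition < 12:
        warnings.append("fewer than 12 replicates per condition, but most "
                        "DE genes are to be identified")
    return warnings


if __name__ == "__main__":
    for n in (3, 6, 12, 20):
        print(n, recommend_tool(n), design_warnings(n, True))
```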
Gierliński et al. (2015) show that aberrant replicates can skew the entire analysis: when such replicates are included, a significant fraction of gene counts can no longer be described by the log-normal or negative binomial distributions. It is therefore important to have enough replicates to (a) identify outlier samples and (b) remove them without losing too much statistical power.
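A minimal sketch of one way to screen for aberrant replicates follows: for each replicate, compute the median Pearson correlation of its log-transformed counts with all other replicates of the same condition, and flag replicates that fall below a cutoff. The function names, the `log2(count + 1)` transform, and the cutoff of 0.9 are illustrative assumptions, not the procedure or values from the paper.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def median_correlations(counts):
    """Map each replicate to its median correlation with the others.

    counts maps a replicate name to its raw gene counts; all lists must
    use the same gene order. At least two replicates are required.
    """
    logged = {s: [math.log2(c + 1) for c in v] for s, v in counts.items()}
    return {
        s: median(pearson(v, w) for t, w in logged.items() if t != s)
        for s, v in logged.items()
    }


def flag_outliers(counts, min_median_corr=0.9):
    """Return replicates whose median correlation is below the cutoff."""
    scores = median_correlations(counts)
    return sorted(s for s, r in scores.items() if r < min_median_corr)


if __name__ == "__main__":
    # Toy counts for six genes; "rep4" is deliberately inconsistent.
    toy = {
        "rep1": [10, 200, 35, 1000, 0, 50],
        "rep2": [12, 180, 40, 950, 1, 55],
        "rep3": [9, 210, 30, 1100, 0, 45],
        "rep4": [500, 3, 800, 20, 300, 2],
    }
    print(flag_outliers(toy))
```

Using the median rather than the mean keeps a single bad replicate from dragging down the scores of the good ones, which is why at least three or four replicates per condition are needed for this kind of check to be informative.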
“Even the best tools have limited statistical power with few replicates in each condition, unless a stringent fold-change threshold is imposed” (Schurch et al., 2015). Inherent biological noise and variation in gene expression set the lower limit on the fold change that differential gene expression (DGE) analysis can detect. The more genes with low fold changes the experiment is meant to detect, the more replicates are needed to estimate biological variability reliably.
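The relationship between fold change and replicate number can be illustrated with a textbook normal-approximation sample size formula for comparing two group means, \(n = 2\,(z_{1-\alpha/2} + z_{\text{power}})^2\,\sigma^2 / \delta^2\), applied to a single gene on the log2 scale. This is only a rough sketch: it ignores multiple testing and the count-based models that edgeR and DESeq use, and the standard deviation of 0.5 in the example is an assumed value, not an estimate from data.

```python
import math
from statistics import NormalDist


def replicates_needed(log2_fold_change, sd_log2, alpha=0.05, power=0.8):
    """Approximate replicates per condition for a two-group comparison.

    log2_fold_change: effect size to detect, on the log2 scale.
    sd_log2: biological standard deviation of log2 expression
             (assumed equal in both conditions).
    """
    delta = abs(log2_fold_change)
    if delta == 0 or sd_log2 <= 0:
        raise ValueError("need a non-zero effect and a positive sd")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd_log2 / delta) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    for lfc in (1.0, 0.5, 0.25):
        print(lfc, replicates_needed(lfc, sd_log2=0.5))
```

Because the effect size enters as \(1/\delta^2\), halving the log2 fold change to be detected roughly quadruples the number of replicates required at the same noise level.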