Discussion
By focusing on one patient at a time, with each subject serving as his or her own control, single-subject analyses, including the one we propose, have the potential to uncover biomolecular mechanisms that are meaningful for decision-making in precision medicine \cite{gardeux2017genome}. However, the prohibitive cost of, and limited access to, clinical tissue from a single subject make it difficult to meet the replication requirements of conventional statistical methods. In this work, we introduce iDEG, a novel and powerful method for identifying DEGs from only two transcriptomes of a single subject (case vs. baseline). The core idea is to apply a variance-stabilizing transformation (VST), which addresses the single-subject, single-sample problem by making it possible to ``borrow strength'' across genes. Through simulation studies and the analysis of a clinical dataset, we demonstrated that iDEG identifies DEGs accurately even when gene expression counts are over-dispersed.
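To make the variance-stabilizing idea concrete, the following is a textbook delta-method sketch, not necessarily the exact transformation used by iDEG, which may differ (for example, through small-count corrections). The symbols $Y$ (a gene's count), $\mu$ (its mean), $\delta$ (a dispersion parameter), $V$, and $h$ are illustrative notation assumed here. If $Y$ has mean $\mu$ and variance $V(\mu)$, a first-order expansion gives $\mathrm{Var}\{h(Y)\} \approx h'(\mu)^{2}\,V(\mu)$, so choosing $h'(\mu) = V(\mu)^{-1/2}$ yields approximately unit variance:
\[
V(\mu) = \mu \ \mbox{(Poisson)}: \quad h(Y) = 2\sqrt{Y},
\]
\[
V(\mu) = \mu + \delta\mu^{2} \ \mbox{(negative binomial)}: \quad h(Y) = \frac{2}{\sqrt{\delta}}\,\sinh^{-1}\bigl(\sqrt{\delta Y}\bigr).
\]
Under either assumption, and for means that are not too small, the transformed counts have variance close to 1 regardless of a gene's expression level. This common scale is what allows information to be pooled across genes when only one sample per condition is available.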
While the simulations demonstrate that iDEG achieves higher precision (positive predictive value) and recall (sensitivity) than other methods, there are some caveats and potential extensions. First, iDEG strives to extract the most information from limited data; however, we need to keep in mind that no statistical inference can replace data
\cite{hansen-2011-sequen-techn}, and that replication is still preferable if the tissue is available and the associated cost is reasonable. Second, the application of iDEG is not restricted to RNA-Seq data; it is also applicable to count data in general, such as immunoprecipitated DNA
\cite{ross-innes-2012-differ-oestr} (e.g., ChIP-Seq), proteomic spectral counts \cite{johnson-2012-proteom-analy}, protein antibody arrays, or metagenomics data, that follow a Poisson or negative binomial distribution with the same parallel structure. An important extension of iDEG would be to incorporate variance-stabilizing techniques suited to high-throughput data that follow other distributions. Another valuable extension would be to incorporate external knowledge, such as a gene ontology, to define gene sets and aggregate gene-level metrics to the gene-set level \cite{li-2017-kmen,schissler-2015-dynamic,li-2017-MixEnrich}. Lastly, future single-subject experiments may study more than the two conditions of the current case-vs.-baseline design; therefore, it would be interesting to extend iDEG to identify DEGs across multiple conditions or multiple 'omics measures.