Introduction
Single-subject Analysis for RNA-Seq Data
Precision medicine, also known as personalized medicine or individualized medicine, aims to deliver “the right treatments, at the right time, every time to the right person” \cite{kaiser-2015-obama-gives}. Applying the conventional one-size-fits-all drug development approach to a heterogeneous human population has left the ten top-grossing drugs in the USA ineffective for more than 75\% of users \cite{schork-2015-person-medic}.
In contrast, precision medicine tailors the optimal treatment to individuals, and clinical trial designs are moving from cohort-based to single-subject trials \cite{schork-2015-person-medic}. The success of precision medicine hinges on identifying personal disease mechanisms \cite{topol-2014-indiv-medic} to optimize disease treatment regimens based on an individual’s biology (e.g., response to stimuli, genomic profile, and baseline risk among other factors).
Single-subject RNA sequencing (RNA-Seq) analysis considers one patient at a time, with the goal of revealing that patient's altered transcriptomic mechanisms, for example, those associated with a disease state. Compared to traditional cohort-based analysis, the major challenge of single-subject RNA-Seq analysis is estimating the variance of gene expression levels when there are no replicates for the subject, i.e., when only a single transcriptome is available per condition. Variance estimation is central to RNA-Seq analysis, as it plays a key role in identifying altered mechanisms such as differentially expressed genes (DEGs). Conventional statistical methods estimate the variance from biological or technical replicates. However, obtaining transcriptome replicates is difficult due to (i) limited tissue availability, (ii) the risks associated with invasive tissue-sampling procedures, and (iii) the general costs and inefficiencies of current technology. Because these obstacles limit access to replicates in single subjects, advancing precision medicine requires novel methods designed for single-subject transcriptome analysis.
A Motivating Study
Breast cancer is one of the most common cancers, with \(\sim\)500,000 deaths worldwide each year \cite{wild2014world}. Cohort-based analyses have yielded valuable insights for personalizing treatment by classifying breast cancers into four major subtypes \cite{sorlie-2003-repeated}. However, no two cancers are alike, as significant heterogeneity is present within each subtype \cite{koboldt-2012-compr-molec}. Furthermore, minorities are underrepresented in most clinical trials, and, therefore, knowledge derived from such trials may not be applicable to diverse populations. For example, triple negative breast cancer (TNBC) is a subtype of breast cancer that has poor prognosis and considerable heterogeneity, and that disproportionately affects women of African origin \cite{dietze-2015-tripl-negat}. In The Cancer Genome Atlas (TCGA) project, which collected RNA-Seq data on 1092 breast cancer patients, matched tumor/healthy samples were available from only two African American (AA) patients, who differed remarkably in age, tumor stage, survival, and other key features. In this case, single-subject RNA-Seq analysis would be more appropriate for discovering individual-specific DEGs and for identifying the best therapeutic options. Our study is specifically motivated by this single-subject RNA-Seq dataset downloaded from TCGA (Table \ref{table:TNBC-example-data}).