TNBC example (RNA-Seq single-subject dataset). Expression of the first ten genes, in alphabetical order, among 20,501 gene expression measurements mapped to gene symbols, for both tumor and surrounding healthy tissue collected from an African American female (subject TCGA-GI-A2C9) exhibiting TNBC. The second and third columns display the mRNA counts of her healthy and tumor samples, respectively. The last three columns - Absolute Difference, Fold Change (FC), and indicator of FC \(\geq\) 3 or \(\frac{1}{FC}\geq 3\) - illustrate the general complexity of working with count data and the caution required when developing methods for both lowly and highly expressed genes. Using a simple heuristic of FC \(\geq\) 3 or \(\frac{1}{FC}\geq 3\) to label a gene as a DEG, we see two potential extreme cases of misclassification that arise from assuming that genes expressed at different orders of magnitude behave alike. Gene A2M, for example, has an absolute difference of 17,560 but \(\frac{1}{FC}=2.5\), so the heuristic does not flag it even though it could be a prime candidate for a down-regulated DEG. Conversely, A4GNT has \(\frac{1}{FC}=5\) (FC = 0.2) and is flagged, yet it may not be a DEG, since there tends to be more noise than signal at such low levels of expression. Note that single-subject RNA-Seq analyses compare isogenic tissues of the same subject. Isogenic refers to identical genomes, as in tissues of the same subject, cell lines, or highly inbred animal models (e.g., mouse strains), while heterogenic conditions are observed between individuals with distinct genomes (e.g., most human beings).
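| Gene | Healthy | Tumor | Absolute Difference | Fold Change (FC) | FC \(\geq\) 3 or \(\frac{1}{FC}\geq 3\) |
|---|---|---|---|---|---|
| A1BG | 72 | 92 | 20 | 1.28 | 0 |
| A1CF | 0 | 1 | 1 | NaN | NA |
| A2BP1 | 2 | 0 | 2 | 0 | NA |
| A2LD1 | 71 | 127 | 56 | 1.79 | 0 |
| A2ML1 | 12 | 773 | 761 | 64.42 | 1 |
| A2M | 29385 | 11825 | 17560 | 0.4 | 0 |
| A4GALT | 891 | 871 | 20 | 0.98 | 0 |
| A4GNT | 5 | 1 | 4 | 0.2 | 1 |
| AAA1 | 0 | 0 | 0 | NaN | NA |
| AAAS | 460 | 414 | 46 | 0.9 | 0 |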
Legend. NaN: not defined. NA: not applicable.
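The derived columns of the table can be reproduced with a minimal Python sketch. The names `COUNTS`, `fold_change`, `fc_flag`, and `summarize` are illustrative and are not taken from the original analysis. Two conventions are assumptions inferred from the table values rather than stated rules: FC is taken as tumor count divided by healthy count, and the indicator is reported as NA (here `None`) whenever either count is zero.

```python
import math

# Counts copied from the table: gene -> (healthy, tumor)
COUNTS = {
    "A1BG": (72, 92),
    "A1CF": (0, 1),
    "A2BP1": (2, 0),
    "A2LD1": (71, 127),
    "A2ML1": (12, 773),
    "A2M": (29385, 11825),
    "A4GALT": (891, 871),
    "A4GNT": (5, 1),
    "AAA1": (0, 0),
    "AAAS": (460, 414),
}


def fold_change(healthy, tumor):
    """Return tumor / healthy, or NaN when the healthy count is zero."""
    if healthy == 0:
        return math.nan
    return tumor / healthy


def fc_flag(healthy, tumor, threshold=3):
    """Return 1 if FC >= threshold or 1/FC >= threshold, else 0.

    Returns None (NA) when either count is zero (assumed convention).
    The comparison is done on the counts themselves, which is equivalent
    to comparing the ratios and avoids floating-point rounding at the
    threshold boundary.
    """
    if healthy == 0 or tumor == 0:
        return None
    return int(tumor >= threshold * healthy or healthy >= threshold * tumor)


def summarize(counts, threshold=3):
    """Compute absolute difference, FC, and the heuristic flag per gene."""
    return {
        gene: {
            "abs_diff": abs(tumor - healthy),
            "fc": fold_change(healthy, tumor),
            "flag": fc_flag(healthy, tumor, threshold),
        }
        for gene, (healthy, tumor) in counts.items()
    }


if __name__ == "__main__":
    for gene, row in summarize(COUNTS).items():
        print(gene, row["abs_diff"], round(row["fc"], 2), row["flag"])
```

Under these assumptions, A2M is left unflagged despite the largest absolute difference in the table, while A4GNT is flagged on the basis of a four-count difference, which is the contrast the caption draws attention to.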