TNBC example (RNA-Seq single-subject dataset). Expression of the first ten genes, in alphabetical order, among 20,501 gene expression measurements mapped to gene symbols, for both tumor and surrounding healthy tissue collected from an African American female (subject TCGA-GI-A2C9) with TNBC. The second and third columns display the mRNA counts of her healthy and tumor samples, respectively. The last three columns (Absolute Difference; Fold Change (FC), the tumor count divided by the healthy count; and an indicator of FC \(\geq\) 3 or \(\frac{1}{FC}\geq 3\)) illustrate the general complexity of working with count data and the caution required when developing methods for both lowly and highly expressed genes. Using the simple heuristic FC \(\geq\) 3 or \(\frac{1}{FC}\geq 3\) to label a DEG, we see two potential extreme cases of misclassification, both arising from the assumption that genes expressed at different orders of magnitude behave alike. Gene A2M, for example, has an absolute difference of 17,560 and \(\frac{1}{FC}=2.5\), so it is not flagged, yet it could be a prime candidate for a down-regulated DEG. Conversely, even though A4GNT has \(\frac{1}{FC}=5\) and is flagged, it may not be a DEG, since there tends to be more noise than signal at such low levels of expression. Note that single-subject RNA-Seq analyses compare isogenic tissues of the same subject. Isogenic refers to identical genomes, as in tissues of the same subject, cell lines, or highly inbred animal models (e.g., mouse strains), while heterogenic conditions are observed between individuals with distinct genomes (e.g., most human beings).
| Gene | Healthy | Tumor | Absolute Difference | Fold Change (FC) | FC \(\geq\) 3 or \(\frac{1}{FC}\geq 3\) |
|---|---|---|---|---|---|
| A1BG | 72 | 92 | 20 | 1.28 | 0 |
| A1CF | 0 | 1 | 1 | NaN | NA |
| A2BP1 | 2 | 0 | 2 | 0 | NA |
| A2LD1 | 71 | 127 | 56 | 1.79 | 0 |
| A2ML1 | 12 | 773 | 761 | 64.42 | 1 |
| A2M | 29385 | 11825 | 17560 | 0.4 | 0 |
| A4GALT | 891 | 871 | 20 | 0.98 | 0 |
| A4GNT | 5 | 1 | 4 | 0.2 | 1 |
| AAA1 | 0 | 0 | 0 | NaN | NA |
| AAAS | 460 | 414 | 46 | 0.9 | 0 |
Legend. NaN: not defined. NA: not applicable.
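
The three derived columns can be recomputed from the healthy and tumor counts. The following is a minimal Python sketch, not the authors' code: the helper names (`fold_change`, `fc_flag`, `summarize`) are hypothetical, FC is assumed to be the tumor count divided by the healthy count, and the NaN and NA rules are inferred from the table entries rather than stated in the text.

```python
import math

# (gene, healthy count, tumor count), copied from the table above
COUNTS = [
    ("A1BG", 72, 92),
    ("A1CF", 0, 1),
    ("A2BP1", 2, 0),
    ("A2LD1", 71, 127),
    ("A2ML1", 12, 773),
    ("A2M", 29385, 11825),
    ("A4GALT", 891, 871),
    ("A4GNT", 5, 1),
    ("AAA1", 0, 0),
    ("AAAS", 460, 414),
]


def fold_change(healthy, tumor):
    """Tumor/healthy ratio (assumed definition); NaN when healthy is zero."""
    return tumor / healthy if healthy > 0 else math.nan


def fc_flag(healthy, tumor, threshold=3.0):
    """Return 1 if FC >= threshold or 1/FC >= threshold, else 0.

    Returns None (NA) when either count is zero, a rule inferred from
    the A1CF, A2BP1, and AAA1 rows of the table.
    """
    if healthy == 0 or tumor == 0:
        return None
    fc = tumor / healthy
    return int(fc >= threshold or 1.0 / fc >= threshold)


def summarize(counts=COUNTS):
    """Map each gene to its absolute difference, FC, and heuristic flag."""
    return {
        gene: {
            "abs_diff": abs(tumor - healthy),
            "fc": fold_change(healthy, tumor),
            "flag": fc_flag(healthy, tumor),
        }
        for gene, healthy, tumor in counts
    }


if __name__ == "__main__":
    for gene, row in summarize().items():
        print(gene, row["abs_diff"], round(row["fc"], 2), row["flag"])
```

Under these assumptions, A2M is left unflagged despite its absolute difference of 17,560, while A4GNT is flagged on counts of only 5 and 1, which are the two extreme cases the caption describes.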