Comparison of different methods for the two numerical studies.
Note: the numbers in parentheses are standard deviations.
\begin{tabular}{llccccc}
\hline
\multicolumn{7}{c}{Study 2} \\
\hline
Proportion & Method & Precision & Recall/TPR & FPR & \(F_{1}\) & Predicted DEG \\
\hline
5\% & iDEG & 0.957 (1.0\(\times 10^{-2}\)) & 0.733 (1.9\(\times 10^{-2}\)) & 0.002 (4.7\(\times 10^{-4}\)) & 0.83 (1.1\(\times 10^{-2}\)) & 766 (26) \\
 & edgeR & 0.532 (1.1\(\times 10^{-2}\)) & 0.935 (7.7\(\times 10^{-3}\)) & 0.043 (1.9\(\times 10^{-3}\)) & 0.678 (9.0\(\times 10^{-3}\)) & 1760 (39) \\
 & DESeq & 1 (0) & 0.07 (3.6\(\times 10^{-2}\)) & 0 (0) & 0.131 (6.1\(\times 10^{-2}\)) & 70.35 (36) \\
 & DEGseq & 0.102 (9.0\(\times 10^{-4}\)) & 0.985 (3.9\(\times 10^{-3}\)) & 0.459 (4.4\(\times 10^{-3}\)) & 0.184 (1.5\(\times 10^{-3}\)) & 9699 (85) \\
10\% & iDEG & 0.966 (8.2\(\times 10^{-3}\)) & 0.78 (1.9\(\times 10^{-2}\)) & 0.003 (8.2\(\times 10^{-4}\)) & 0.863 (9.7\(\times 10^{-3}\)) & 1616 (50) \\
 & edgeR & 0.639 (8.8\(\times 10^{-3}\)) & 0.947 (5.2\(\times 10^{-3}\)) & 0.06 (2.3\(\times 10^{-3}\)) & 0.763 (6.8\(\times 10^{-3}\)) & 2966 (42) \\
 & DESeq & NA (NA) & 0 (0) & 0 (0) & NA (NA) & 0 (0) \\
 & DEGseq & 0.19 (1.6\(\times 10^{-3}\)) & 0.986 (2.8\(\times 10^{-3}\)) & 0.468 (4.5\(\times 10^{-3}\)) & 0.318 (2.3\(\times 10^{-3}\)) & 10394 (80) \\
15\% & iDEG & 0.969 (5.1\(\times 10^{-3}\)) & 0.814 (1.5\(\times 10^{-2}\)) & 0.005 (8.3\(\times 10^{-4}\)) & 0.884 (7.7\(\times 10^{-3}\)) & 2519 (54) \\
 & edgeR & 0.699 (7.2\(\times 10^{-3}\)) & 0.954 (4.1\(\times 10^{-3}\)) & 0.073 (2.5\(\times 10^{-3}\)) & 0.807 (5.2\(\times 10^{-3}\)) & 4098 (44) \\
 & DESeq & NA (NA) & 0 (0) & 0 (0) & NA (NA) & 0 (0) \\
 & DEGseq & 0.266 (2.1\(\times 10^{-3}\)) & 0.987 (2.1\(\times 10^{-3}\)) & 0.48 (5.0\(\times 10^{-3}\)) & 0.419 (2.6\(\times 10^{-3}\)) & 11128 (86) \\
20\% & iDEG & 0.974 (4.2\(\times 10^{-3}\)) & 0.828 (1.5\(\times 10^{-2}\)) & 0.006 (1.0\(\times 10^{-3}\)) & 0.895 (7.8\(\times 10^{-3}\)) & 3402 (74) \\
 & edgeR & 0.741 (6.0\(\times 10^{-3}\)) & 0.96 (3.2\(\times 10^{-3}\)) & 0.084 (2.6\(\times 10^{-3}\)) & 0.836 (4.1\(\times 10^{-3}\)) & 5182 (45) \\
 & DESeq & NA (NA) & 0 (0) & 0 (0) & NA (NA) & 0 (0) \\
 & DEGseq & 0.333 (2.3\(\times 10^{-3}\)) & 0.987 (1.9\(\times 10^{-3}\)) & 0.494 (5.0\(\times 10^{-3}\)) & 0.498 (2.6\(\times 10^{-3}\)) & 11858 (80) \\
\hline
\multicolumn{7}{c}{Study 3} \\
\hline
Proportion & Method & Precision & Recall/TPR & FPR & \(F_{1}\) & Predicted DEG \\
\hline
5\% & iDEG & 0.926 (1.5\(\times 10^{-2}\)) & 0.652 (2.2\(\times 10^{-2}\)) & 0.003 (6.3\(\times 10^{-4}\)) & 0.765 (1.4\(\times 10^{-2}\)) & 704 (29) \\
 & edgeR & 0.305 (6.3\(\times 10^{-3}\)) & 0.956 (6.0\(\times 10^{-3}\)) & 0.115 (3.4\(\times 10^{-3}\)) & 0.463 (7.3\(\times 10^{-3}\)) & 3133 (65) \\
 & DESeq & 0.999 (2.1\(\times 10^{-3}\)) & 0.152 (3.8\(\times 10^{-2}\)) & 0 (1.8\(\times 10^{-5}\)) & 0.262 (5.8\(\times 10^{-2}\)) & 152 (38) \\
 & DEGseq & 0.086 (6.7\(\times 10^{-4}\)) & 0.985 (3.9\(\times 10^{-3}\)) & 0.549 (3.9\(\times 10^{-3}\)) & 0.159 (1.2\(\times 10^{-3}\)) & 11409 (74) \\
10\% & iDEG & 0.945 (1.1\(\times 10^{-2}\)) & 0.708 (2.2\(\times 10^{-2}\)) & 0.005 (1.1\(\times 10^{-3}\)) & 0.809 (1.2\(\times 10^{-2}\)) & 1500 (59) \\
 & edgeR & 0.447 (6.2\(\times 10^{-3}\)) & 0.96 (4.3\(\times 10^{-3}\)) & 0.132 (3.3\(\times 10^{-3}\)) & 0.61 (6.0\(\times 10^{-3}\)) & 4296 (60) \\
 & DESeq & 1 (0) & 0 (5.2\(\times 10^{-4}\)) & 0 (0) & 0.002 (1.4\(\times 10^{-3}\)) & 1 (1) \\
 & DEGseq & 0.165 (1.1\(\times 10^{-3}\)) & 0.986 (2.5\(\times 10^{-3}\)) & 0.556 (4.2\(\times 10^{-3}\)) & 0.282 (1.6\(\times 10^{-3}\)) & 11975 (76) \\
15\% & iDEG & 0.953 (7.0\(\times 10^{-3}\)) & 0.746 (1.6\(\times 10^{-2}\)) & 0.006 (1.1\(\times 10^{-3}\)) & 0.837 (9.1\(\times 10^{-3}\)) & 2349 (58) \\
 & edgeR & 0.537 (5.7\(\times 10^{-3}\)) & 0.964 (3.7\(\times 10^{-3}\)) & 0.147 (3.4\(\times 10^{-3}\)) & 0.69 (4.8\(\times 10^{-3}\)) & 5384 (59) \\
 & DESeq & 1 (NA) & 0 (3.3\(\times 10^{-5}\)) & 0 (0) & 0.001 (NA) & 0 (0) \\
 & DEGseq & 0.235 (1.4\(\times 10^{-3}\)) & 0.986 (2.1\(\times 10^{-3}\)) & 0.565 (4.2\(\times 10^{-3}\)) & 0.38 (1.9\(\times 10^{-3}\)) & 12562 (73) \\
20\% & iDEG & 0.962 (4.6\(\times 10^{-3}\)) & 0.763 (1.3\(\times 10^{-2}\)) & 0.008 (1.0\(\times 10^{-3}\)) & 0.851 (7.8\(\times 10^{-3}\)) & 3175 (64) \\
 & edgeR & 0.602 (5.7\(\times 10^{-3}\)) & 0.966 (2.8\(\times 10^{-3}\)) & 0.16 (3.9\(\times 10^{-3}\)) & 0.742 (4.4\(\times 10^{-3}\)) & 6419 (64) \\
 & DESeq & NA (NA) & 0 (0) & 0 (0) & NA (NA) & 0 (0) \\
 & DEGseq & 0.299 (1.6\(\times 10^{-3}\)) & 0.986 (2.0\(\times 10^{-3}\)) & 0.577 (4.2\(\times 10^{-3}\)) & 0.459 (1.9\(\times 10^{-3}\)) & 13180 (68) \\
\hline
\end{tabular}

Negative Binomial Distribution (NB) with a Varying Dispersion Parameter

This study assumes that \(Y_{g1}\sim NB(\mu_{g1},\delta_{g})\) and \(Y_{g2}\sim NB(\mu_{g2},\delta_{g})\), where the dispersion parameter \(\delta_{g}\) is constant across all genes. Apart from these two assumptions, the data is generated following the same procedure as in Section \ref{sec:num-study-1}. We set \(\delta_{g}=0.02\) for all \(g=1,\ldots,20000\).
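As a rough illustration of this setup, the sketch below draws NB counts with a constant dispersion using NumPy. It assumes the common mean-dispersion parameterization in which \(\mathrm{Var}(Y)=\mu+\delta\mu^{2}\); the text does not state the parameterization explicitly, so this is an assumption. The helper name `sample_nb` and the uniform baseline means are hypothetical placeholders, not the generating procedure of Section \ref{sec:num-study-1}.

```python
import numpy as np

def sample_nb(mu, delta, rng):
    """Draw NB counts with mean mu and dispersion delta.

    Assumed parameterization: Var(Y) = mu + delta * mu**2.
    NumPy uses (n, p), so we convert: n = 1/delta, p = n / (n + mu).
    """
    mu = np.asarray(mu, dtype=float)
    n = 1.0 / np.asarray(delta, dtype=float)
    p = n / (n + mu)
    return rng.negative_binomial(n, p)

rng = np.random.default_rng(0)
G = 20000      # number of genes, as in the text
delta = 0.02   # constant dispersion, as in the text

# Hypothetical baseline means; the study's actual means come from
# the procedure of the first numerical study, which is not shown here.
mu1 = rng.uniform(50.0, 500.0, size=G)

# Two samples with identical means, i.e. no differential expression.
y1 = sample_nb(mu1, delta, rng)
y2 = sample_nb(mu1, delta, rng)
```

Under this parameterization, a gene with mean 100 has variance \(100+0.02\times 100^{2}=300\), three times the Poisson variance at the same mean.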
The middle panel of Figure \ref{fig:num-study} compares the \(F_{1}\) scores for all methods. iDEG attains the highest \(F_{1}\) scores across the entire range of \(p\), followed by edgeR and DEGseq. Since one main assumption in edgeR is constant dispersion, this setting actually favors edgeR. Nonetheless, iDEG still produces higher \(F_{1}\) scores than edgeR across \(p\). When implementing edgeR, different values for the parameter BCV were tried and 0.1 was found to work best; 0.1 is therefore used as the default parameter value for the rest of this study. Although DESeq is able to identify some DEGs when \(p\) is small, its performance degrades quickly as \(p\) increases. This is partially due to DESeq treating the two samples as replicates, which is inappropriate when a larger proportion of DEGs is present in the transcriptome. The average \(F_{1}\) scores of all the methods are lower than those from Study 1, which may be due to the higher variation associated with the NB distribution.
Study 2 of Table \ref{table:study123} suggests that iDEG is competitive among all methods in combining high precision with low FPR. Take \(p=5\%\) as an example. Despite its lower recall, iDEG has a substantially higher precision (0.957) and lower FPR (0.002) than edgeR (precision = 0.532; FPR = 0.043) and DEGseq (precision = 0.102; FPR = 0.459). DESeq occasionally yields high precision; however, its low recall leads to consistently poor overall performance.
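To make the reported metrics concrete, the following sketch computes precision, recall (TPR), FPR and \(F_{1}\) from confusion counts. The counts are hypothetical: they assume 20000 genes of which 1000 (\(5\%\)) are true DEGs, and they are chosen to be roughly consistent with the iDEG averages at \(p=5\%\), not taken from any single simulation run. The function name `classification_metrics` is likewise illustrative.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, fpr, f1) from confusion counts.

    Undefined ratios (e.g. precision when no gene is called) are
    returned as NaN.
    """
    nan = float("nan")
    precision = tp / (tp + fp) if tp + fp > 0 else nan
    recall = tp / (tp + fn) if tp + fn > 0 else nan
    fpr = fp / (fp + tn) if fp + tn > 0 else nan
    if math.isnan(precision) or math.isnan(recall):
        f1 = nan
    elif precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, fpr, f1

# Hypothetical counts: 1000 true DEGs among 20000 genes,
# 766 genes called, 733 of them correctly.
precision, recall, fpr, f1 = classification_metrics(
    tp=733, fp=33, fn=267, tn=18967
)

# A method that calls no gene at all has undefined precision and F1.
empty = classification_metrics(tp=0, fp=0, fn=1000, tn=19000)
```

With these counts the precision rounds to 0.957, the recall is 0.733, the FPR rounds to 0.002 and \(F_{1}\) rounds to 0.83. The second call suggests one plausible reason for the NA entries in the DESeq rows: when no gene is predicted as a DEG, precision and \(F_{1}\) are undefined.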
In this simulation, the RNA-Seq data is assumed to follow the NB distribution, where the dispersion parameter \(\delta_{g}\) is a function of \(\mu_{g1}\). The simulation procedure is the same as the one described in Section 4.2, except that the dispersion parameter is adapted from the one used by \citeauthor{anders-2010-differ-expres} (\citeyear{anders-2010-differ-expres}) and set to \(\delta_{g}=0.005+9/(\mu_{g1}+100)\). The bottom panel in Figure \ref{fig:num-study} suggests that iDEG produces the highest \(F_{1}\) scores across \(p\). Study 3 in Table \ref{table:study123} shows a pattern similar to that of Study 2, suggesting that iDEG has the best overall performance in terms of high precision and low FPR, regardless of whether \(\delta_{g}\) is a constant or a function of the expression mean \(\mu_{g}\).
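The mean-dependent dispersion can be sketched as follows. The formula \(\delta_{g}=0.005+9/(\mu_{g1}+100)\) is taken from the text; the conversion to NumPy's \((n,p)\) parameters assumes \(\mathrm{Var}(Y)=\mu+\delta\mu^{2}\), and the four mean values are hypothetical, chosen only to show how the dispersion shrinks as expression grows.

```python
import numpy as np

def dispersion(mu):
    """Mean-dependent dispersion: delta = 0.005 + 9 / (mu + 100)."""
    return 0.005 + 9.0 / (np.asarray(mu, dtype=float) + 100.0)

rng = np.random.default_rng(1)

# Hypothetical expression means spanning several orders of magnitude.
mu1 = np.array([10.0, 100.0, 1000.0, 10000.0])
delta = dispersion(mu1)

# Convert (mu, delta) to NumPy's (n, p), assuming Var = mu + delta * mu**2.
n = 1.0 / delta
p = n / (n + mu1)
y1 = rng.negative_binomial(n, p)
```

For example, a gene with mean 100 gets \(\delta_{g}=0.005+9/200=0.05\), whereas \(\delta_{g}\) approaches 0.005 for very highly expressed genes, so lowly expressed genes are relatively noisier.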

Unequal Library Sizes

The three numerical studies (Sections \ref{sec:num-study-1}-\ref{sec:num-study-3}) were also repeated on single-subject, single-sample RNA-Seq data with unequal library sizes, where the library size of one transcriptome is 1.5 times that of the other. The results are shown in Web Figure 2. They demonstrate that iDEG adjusts well for unequal library sizes and that its performance remains superior to that of existing methods.
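The effect of unequal library sizes can be illustrated with a simple total-count rescaling. This is a generic stand-in and not iDEG's own adjustment, which is not described in this section. The gamma-distributed baseline means, the constant dispersion of 0.02 and the assumption that the second sample's means are exactly 1.5 times the first's are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
G = 20000
delta = 0.02          # constant dispersion, assumed for illustration
size_factor = 1.5     # library-size ratio stated in the text

# Hypothetical baseline means; sample 2 is sequenced 1.5 times deeper.
mu1 = rng.gamma(shape=2.0, scale=100.0, size=G)
mu2 = size_factor * mu1

# NB draws, assuming Var = mu + delta * mu**2.
n = 1.0 / delta
y1 = rng.negative_binomial(n, n / (n + mu1))
y2 = rng.negative_binomial(n, n / (n + mu2))

# Estimate the library-size ratio from total counts and rescale sample 2.
ratio = y2.sum() / y1.sum()
y2_adj = y2 / ratio
```

With no true differential expression, `ratio` should land close to 1.5, and `y2_adj` is then on the same scale as `y1`. Total-count scaling can be distorted when many genes are differentially expressed in one direction, which is one reason more robust normalizations are often preferred in practice.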