Figure 4. Performance of rwTTD prediction across heterogeneous
populations. a. Performance at different test set termination rates
when the training set termination rate is 0.0008. b. Performance with
different numbers of training set examples when the number of test set
examples is fixed at 5000. c. Performance at different test set noise
levels when the training set noise level is 0.1. d. Performance at
different test set feature scales when the training set feature scale
is 1.
The other factors had little effect on performance. When the training
set and test set were drawn from the same population, performance
improved steadily as the number of training examples increased, while
the number of test examples mainly affected the spread of the
performance (Fig. 4b, Fig. S8-9). The noise level on individual
features did not affect overall performance on population-wise rwTTD
(Fig. 4c, Fig. S10-11). We then altered the scaling factor of
the features. This alteration results in feature values distributed
at different scales, and thus simulates record disparities across
cohorts. As expected, when the training and testing feature scales were
similar, the model showed relatively low errors. As the two
distributions diverged, the percentage error increased. However, even
when the training set feature scale was 1 and the test set feature scale
was 1000, the overall population error was moderate (0.13481 for both
metrics) (Fig. 4d, Fig. S12-13). These results indicate that the model
performs stably across two distinct populations under variation in a
range of factors.
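
The feature-scale and noise perturbations described above can be illustrated with a minimal sketch. It is not the implementation used in this study. It assumes that scaling is a multiplicative factor applied to every feature and that noise is additive Gaussian noise whose standard deviation equals the noise level. The names perturb_features and population_relative_error are hypothetical, and the error function is only one plausible population-level metric, not necessarily either of the two metrics reported here.

```python
import random


def perturb_features(rows, scale=1.0, noise_sd=0.0, seed=0):
    """Return a perturbed copy of a feature table (list of rows).

    Assumed form (not confirmed by the text): each value is multiplied
    by `scale`, then Gaussian noise with standard deviation `noise_sd`
    is added.
    """
    rng = random.Random(seed)
    perturbed = []
    for row in rows:
        new_row = []
        for value in row:
            noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
            new_row.append(value * scale + noise)
        perturbed.append(new_row)
    return perturbed


def population_relative_error(predicted, observed):
    """Hypothetical population-level error.

    Absolute difference between the mean predicted and mean observed
    values, divided by the mean observed value.
    """
    mean_predicted = sum(predicted) / len(predicted)
    mean_observed = sum(observed) / len(observed)
    return abs(mean_predicted - mean_observed) / mean_observed


# Example: a test set whose features are recorded on a 1000x larger
# scale than the training set, and one with noise level 0.1.
base_rows = [[0.2, 0.5], [0.9, 0.1]]
scaled_rows = perturb_features(base_rows, scale=1000.0)
noisy_rows = perturb_features(base_rows, noise_sd=0.1, seed=1)
```

In such a setup, a model trained on the unperturbed features would be evaluated on scaled_rows or noisy_rows, and its predicted durations compared with the observed ones at the population level.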