Introduction
Real world Time on Treatment (rwToT), also known as real world time to
treatment discontinuation (rwTTD), is defined as the length of time
observed in real world data (as distinct from controlled clinical
trials) from initiation of a medication to discontinuation of that
medication1,2.
The ending of the treatment can be caused by adverse events, deaths,
switches of treatment and loss of follow up. Because time to treatment
discontinuation can be readily obtained from electronic medical records,
this effectiveness endpoint is convenient to evaluate the efficacy of a
drug that is already approved for public use3. It is
often used as a surrogate effectiveness endpoint, showing high
correlation to progression-free survival and moderate-to-high
correlation to overall survival4,5.
As rwTTD is an important metric for drug effectiveness, it is routinely
reported during the post-clinical trial phase2,4,6–9.
Calculation of rwTTD in patient population is often equivalent to
constructing a (Kaplan-Meier) KM curve, with each point representing the
proportion of patients that are still on treatment at a specific time
point 1.
Either the entire curve, or mean rwTTD, restricted mean10, or the
time point at which a specific portion of the patients (e.g. ,
50%) dropping treatment is of interest. Currently, there is no existing
machine learning scheme established to predict such a curve, or the
midpoint, as the vast majority of the machine learning models have been
focused on predicting individuals’ behavior rather than population-level
behavior. Such a machine learning scheme, if established, has many
meaningful clinical applications. For instance, given observed clinical
parameters and outcomes in clinical trials, how do we derive expected
time-to-treatment in the real-world? Given the rwTTD for a drug on one
patient population, how can we predict the rwTTD when applying this drug
to another population (e.g. , for a different disease)?
This study establishes a machine learning framework to infer
population-wise rwTTD. We showed that population-wise curve prediction
differs substantially from aggregating all individuals’ results. Our
framework models the population-wise curve and is generic to diverse
base-learners for predicting rwTTD. We demonstrated the effectiveness of
this framework based on both simulated data and real world Electronic
medical records (EMR) data for pembrolizumab-treated cancer populations7,11,12.
The study opens a new direction of modeling population-level rwTTD,
which has great values for directing post-clinical stage drug
administrations. This machine learning scheme will also have meaningful
implications to population-based predictions for other problems, as
machine learning algorithms have so far been focused on predictions for
individual samples.