Results
OPLS models were built to predict cell growth or mAb expression as a
function of amino acid stoichiometric balances. Since cell growth was
measured as viable cell density (VCD) throughout the culture, CVC was
calculated to quantify the total cells that were present within a given
time interval. VCD profiles of CHO cell culture typically follow four
growth phases, namely, lag phase, log phase, stationary phase, and death
phase (Dutton, Scharer, & Moo-Young, 2006). Similarly, in the training
dataset, VCD profiles of the 25 batches also reflected the same four
phases but varied drastically in peak cell densities (black lines in
Fig. 1a). Peak VCD was observed at day 9 for the majority of the training
batches, with the exception of a few that peaked near day 7; these were
the batches that received a lower percentage of the high-nutrient
feed. In contrast, CVC profiles increased over time because they
represented a growing sum of the total cells in the culture (green lines
in Fig. 1a).
Intuitively, modeling the time at which peak VCD was reached, or the end
of the log phase, would highlight the time-dependent contributions of
amino acids from the start of the culture, which would produce a model
with narrower utility. Since >75% of the training runs showed peak VCD
on day 9 and the remaining runs also achieved their highest total cells
on day 9, the CVC at day 9 was chosen as the response variable for the
growth model. A similar OPLS model was created to predict titer. In
contrast to VCD, mAb titer was measured as the cumulative total
concentration of mAb within the culture at any given time. Since peak
mAb titer was observed at the end of all cultures, day 14 titer was
chosen as the response variable for the production model (Fig. 1b).
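The relationship between VCD and CVC can be illustrated with a short
sketch. It assumes that CVC is obtained by trapezoidal integration of VCD
over culture time, which is consistent with the cell-days per mL unit but
is not stated explicitly here. The function name and the VCD values are
hypothetical and serve only as an illustration.

```python
def cumulative_viable_cells(days, vcd):
    """Running trapezoidal integral of VCD over time (cell-days per mL).

    Assumption: trapezoidal integration; the integration rule actually
    used for the study data is not specified in the text.
    """
    if len(days) != len(vcd):
        raise ValueError("days and vcd must have the same length")
    cvc = [0.0]
    for i in range(1, len(days)):
        dt = days[i] - days[i - 1]
        cvc.append(cvc[-1] + 0.5 * (vcd[i] + vcd[i - 1]) * dt)
    return cvc


# Hypothetical VCD profile in 1E6 cells/mL (illustrative values only).
days = [0, 3, 5, 7, 9, 11, 14]
vcd = [0.5, 2.0, 6.0, 9.0, 10.0, 8.0, 5.0]

cvc = cumulative_viable_cells(days, vcd)
peak_day = days[vcd.index(max(vcd))]   # day of peak VCD
cvc_at_day_9 = cvc[days.index(9)]      # candidate response for a growth model
```

In this toy profile VCD declines after its peak, whereas CVC keeps
increasing, which is why a single CVC value at a fixed day can serve as a
scalar response.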
To measure the variability and generalization of a model prediction, a
relatively large distribution space was required in the training
dataset. Accordingly, the amino acid SBs from the 25 training batches in
a BLM format were analyzed by PCA. Each observation or score of the PCA
model in BLM format, which represented a single batch, was graphically
analyzed in a score plot and the 375 amino acid SBs were analyzed in a
loadings plot to identify any collinearity or dependencies of the
variables within the datasets that could possibly bias the prediction
(Fig. 1c and Fig. 1d). The PCA model explained 38.8% of the variance in
the first component and 14.7% in the second component (Fig. 1c).
Although only two components are graphed, a 5-component model was built
to ensure that greater than 70% of the variance was captured to
represent the majority of the dataset. However, based on the
first two components alone, the distribution of the batches showed a
random dispersion and lack of any specific clustering. In addition, the
variable loadings plot did not highlight any time-dependent grouping,
suggesting minimal collinearity or internal bias within the training
dataset in terms of amino acid SBs (Fig. 1d). The generalized
distribution of the variables provided a strong potential for the OPLS
model to learn and predict across a varying space for future batches.
The reliability of the PCA model was further supported by applying a
criterion to remove any outliers that could cause internal bias.
Accordingly, a 95% Hotelling’s T² ellipse was drawn as a confidence
region around the dataset to identify any batches that deviated from the
majority (Rencher, 1993).
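As a rough illustration of the outlier criterion, the sketch below
computes a 95% Hotelling’s T² limit for a two-component score plot with
25 observations. It assumes the limit takes the form
A(N² - 1) / (N(N - A)) × F(A, N - A), which is one commonly used
convention and may differ from the software used for this analysis. The
closed-form F quantile only holds for two numerator degrees of freedom,
and the score variances and example points are hypothetical.

```python
import math


def f_quantile_df1_2(p, df2):
    """Quantile of the F distribution with 2 numerator degrees of freedom.

    For df1 = 2 the CDF is 1 - (1 + 2x/df2) ** (-df2/2), which can be
    inverted in closed form.
    """
    return (df2 / 2.0) * ((1.0 - p) ** (-2.0 / df2) - 1.0)


def hotelling_t2_limit(n_obs, p=0.95):
    """Critical T2 for a 2-component model (assumed form, see lead-in)."""
    n_comp = 2
    f_crit = f_quantile_df1_2(p, n_obs - n_comp)
    return n_comp * (n_obs ** 2 - 1) / (n_obs * (n_obs - n_comp)) * f_crit


def t2_statistic(score_1, score_2, var_1, var_2):
    """T2 of one observation from its two scores and the score variances."""
    return score_1 ** 2 / var_1 + score_2 ** 2 / var_2


limit = hotelling_t2_limit(25)

# Hypothetical score variances for components 1 and 2.
var_1, var_2 = 4.0, 1.0
# Semi-axes of the ellipse drawn on the score plot.
semi_axes = (math.sqrt(var_1 * limit), math.sqrt(var_2 * limit))

# Two hypothetical batches: one near the center, one far along component 1.
points = [(2.0, 1.0), (6.0, 0.0)]
inside = [t2_statistic(a, b, var_1, var_2) <= limit for a, b in points]
```

A batch whose T² exceeds the limit falls outside the ellipse and would be
a candidate for closer inspection rather than automatic removal.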
To ensure a strong fit for both models without a significant loss of
predictive power, the OPLS model for day 9 CVC was built with 1
predictive component and 9 orthogonal components, resulting in an
R² of 0.912 and a Q² of 0.726
(Supplementary Fig. S1a). The requirement of additional orthogonal
components was further reflected by the diverse spread of CVC profiles
throughout the 25 training batches, ranging from 25E6 to 45E6
cell-days per mL at day 9. Similarly, there was a large distribution of
day 14 titer ranging from 0.2 to 1.05 relative titer values (Fig. 2b).
Accordingly, the OPLS model for day 14 titer had 1 predictive component
and 6 orthogonal components with an R² of 0.832 but a
Q² of 0.422. Although the predictive power of the
production model to generalize to diverse future batches was not as
strong as that of the growth model, it was able to highlight information
on key variables for media optimization (Supplementary Fig. S1b).
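The distinction between R² (goodness of fit) and Q² (cross-validated
predictive ability) can be sketched as follows. This is a minimal
illustration that uses a univariate least-squares line as a stand-in for
OPLS and leave-one-out cross-validation as the resampling scheme; both
are assumptions made for brevity, and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_and_q2(x, y):
    """R2 = 1 - RSS/SS_tot on all data; Q2 = 1 - PRESS/SS_tot from LOO CV."""
    n = len(y)
    mean_y = sum(y) / n
    ss_tot = sum((b - mean_y) ** 2 for b in y)

    slope, intercept = fit_line(x, y)
    rss = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))

    press = 0.0
    for i in range(n):
        # Refit without observation i, then predict the held-out value.
        s, c = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (s * x[i] + c)) ** 2

    return 1.0 - rss / ss_tot, 1.0 - press / ss_tot


# Hypothetical predictor and response values (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
r2, q2 = r2_and_q2(x, y)
```

Because each held-out prediction comes from a model that never saw that
observation, Q² cannot exceed R² in this setting, and a wide gap between
the two is a sign that the fit generalizes less well than R² alone
suggests.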