Factors explaining fitness: multiple regression analysis.

To identify factors affecting the bulk fitness of influenza A virus (IAV) mutants, we performed multiple linear regression analyses. Viral fitness determined in the EMPIRIC experiments correlated with in vitro protein abundance \((R^{2}=0.53, p<10^{-2})\). To test whether additional factors explain the observed variance in fitness, we fitted linear regression models using protein abundance, ELISA assay results, and mRNA folding energy dG as explanatory variables. Combining protein abundance with ELISA data improved the prediction of fitness (\(R^{2}=0.68\); ANOVA comparison of the models, \(p=0.014\)). The best prediction was achieved by a linear model combining all three factors (protein abundance, ELISA, and mRNA dG), which yielded \(R^{2}=0.752\); the improvement over the two-factor model was only marginally significant (\(p=0.053\), ANOVA comparison of the two- vs. three-factor models). The mRNA folding energy was defined as the free energy dG of the viral mRNA segment spanning nucleotides -4 to 37 relative to the HA start codon, as described in (Kudla et al. 2009). Adding a categorical variable identifying the DNA library that contained each mutant did not improve the fit, suggesting that the measurements were not affected by batch effects. Likewise, including additional metrics, such as HA vRNA abundance normalized to NA vRNA, vRNA dG, and the position of the mutation, did not improve the predictive power of the multiple linear regression model for fitness (data not shown). These results suggest that the fitness of synonymous mutants of IAV is affected by multiple factors, likely reflecting a complex relationship between RNA structure and functional interactions between viral and host components.
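The nested-model comparison described above can be illustrated with a minimal sketch. This is not the analysis code used for the reported results: the data are synthetic placeholders, and the variable names (`abundance`, `elisa`, `mrna_dg`, `fitness`) and helper functions (`fit_ols`, `nested_f_stat`) are hypothetical. It fits one-, two-, and three-factor ordinary least-squares models and computes the F statistic that an ANOVA comparison of two nested models is based on, \(F = \frac{(RSS_{0}-RSS_{1})/(p_{1}-p_{0})}{RSS_{1}/(n-p_{1})}\), where \(RSS\) is the residual sum of squares, \(p\) the number of fitted parameters, and \(n\) the number of mutants.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept.

    Returns (residual sum of squares, R^2, number of fitted parameters).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    return rss, 1.0 - rss / tss, design.shape[1]


def nested_f_stat(X_small, X_big, y):
    """F statistic for comparing a smaller model nested in a bigger one.

    Returns (F, numerator degrees of freedom, denominator degrees of freedom).
    """
    rss0, _, p0 = fit_ols(X_small, y)
    rss1, _, p1 = fit_ols(X_big, y)
    df_num = p1 - p0
    df_den = len(y) - p1
    f = ((rss0 - rss1) / df_num) / (rss1 / df_den)
    return f, df_num, df_den


# Synthetic placeholder data (assumption): 20 hypothetical mutants.
rng = np.random.default_rng(0)
n = 20
abundance = rng.normal(size=n)
elisa = rng.normal(size=n)
mrna_dg = rng.normal(size=n)
fitness = (0.8 * abundance + 0.5 * elisa + 0.3 * mrna_dg
           + rng.normal(scale=0.5, size=n))

X1 = abundance
X2 = np.column_stack([abundance, elisa])
X3 = np.column_stack([abundance, elisa, mrna_dg])

r2 = [fit_ols(X, fitness)[1] for X in (X1, X2, X3)]
f12 = nested_f_stat(X1, X2, fitness)  # one- vs. two-factor model
f23 = nested_f_stat(X2, X3, fitness)  # two- vs. three-factor model

print("R^2 for 1, 2, 3 factors:", [round(v, 3) for v in r2])
print("F (1 vs. 2 factors), df:", f12)
print("F (2 vs. 3 factors), df:", f23)
```

The p-value for each comparison is the upper tail of the F distribution with the returned degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f, df_num, df_den)` gives it. Because \(R^{2}\) never decreases when a regressor is added to a nested least-squares model, the F test is what indicates whether an increase such as 0.68 to 0.752 exceeds what chance alone would produce.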


  1. G. Kudla, A. W. Murray, D. Tollervey, J. B. Plotkin. Coding-Sequence Determinants of Gene Expression in Escherichia coli. Science 324, 255–258. American Association for the Advancement of Science (AAAS), 2009.