Generation of Rosetta structural features from molecular models and machine learning

The Rosetta Molecular Modeling suite was used to generate 100 molecular models of each mutant protein (code in Supplemental). For each mutant, the 10 models with the lowest total system energy score were used to calculate a set of 50 structural features, and each feature was averaged over those 10 models to give 50 features per mutant protein. Each feature was then normalized by subtracting its mean and dividing by its variance.
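The selection, averaging, and normalization steps described above can be sketched as follows. This is a minimal illustration, not the code from the Supplemental: the function names, the (score, feature vector) input layout, and the use of population variance are assumptions, and the toy numbers are invented purely to show the arithmetic.

```python
from statistics import fmean, pvariance


def average_lowest_energy(models, n_keep=10):
    """Average the feature vectors of the n_keep lowest-scoring models.

    `models` is a list of (total_score, feature_vector) pairs for one mutant
    (a hypothetical layout chosen for this sketch).
    """
    best = sorted(models, key=lambda m: m[0])[:n_keep]
    feature_columns = zip(*(features for _, features in best))
    return [fmean(column) for column in feature_columns]


def normalize_columns(rows):
    """Subtract each column's mean and divide by its variance, as stated in the text.

    Population variance is an assumption; the text does not say which estimator was used.
    """
    columns = list(zip(*rows))
    means = [fmean(c) for c in columns]
    variances = [pvariance(c) for c in columns]
    return [
        [(x - m) / v for x, m, v in zip(row, means, variances)]
        for row in rows
    ]


# Toy data: three models with two features each, keeping the two lowest scores.
toy_models = [(3.0, [0.0, 1.0]), (1.0, [10.0, 2.0]), (2.0, [20.0, 4.0])]
toy_average = average_lowest_energy(toy_models, n_keep=2)

# Toy feature table: two mutants, two features.
toy_normalized = normalize_columns([[1.0, 2.0], [3.0, 6.0]])
```

In the setting described above, `models` would hold 100 entries per mutant and each feature vector would have 50 entries, with `n_keep` left at 10.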

Protein expression was assessed by SDS-PAGE (2 biological replicates per protein sample). Proteins that did not express solubly were assigned a 0, and proteins that expressed were assigned a 1. A support vector machine (SVM) classifier was trained with scikit-learn on the features of the mutants for which experimental data were available. We used 10-fold cross-validation repeated 1000 times, with the results averaged over the 1000 trials. Figure 4 shows the receiver operating characteristic (ROC) curve for the cross-validated predictions.
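A repeated cross-validation procedure of this kind could be set up in scikit-learn roughly as below. This is a sketch under several assumptions that the text does not specify: the default RBF kernel of `SVC`, stratified and shuffled folds, and averaging of the out-of-fold decision values across repeats. The function name and the synthetic data are hypothetical and stand in for the real feature matrix and expression labels.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC


def repeated_cv_scores(X, y, n_repeats=1000, n_splits=10, seed=0):
    """Average out-of-fold SVM decision values over repeated k-fold CV."""
    totals = np.zeros(len(y))
    for repeat in range(n_repeats):
        # A new shuffle per repeat; stratification is an assumption of this sketch.
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed + repeat)
        totals += cross_val_predict(
            SVC(kernel="rbf"), X, y, cv=cv, method="decision_function"
        )
    return totals / n_repeats


# Synthetic stand-in data: 60 "mutants", 50 features, binary expression labels.
rng = np.random.default_rng(0)
y = np.array([0] * 30 + [1] * 30)
X = rng.normal(size=(60, 50)) + y[:, None]  # shift class 1 so the classes differ

# Few repeats keep the example fast; the text describes 1000.
scores = repeated_cv_scores(X, y, n_repeats=5)
fpr, tpr, _ = roc_curve(y, scores)
auc = roc_auc_score(y, scores)
```

Every prediction used for the ROC curve comes from a model that did not see that mutant during training, which is what makes the curve a cross-validated estimate rather than a fit to the training data.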