Measurements of protein expression and thermal stability for 114 single mutants of a glycoside hydrolase allow evaluation of stability predictions

Author contributions (alphabetical by last name)

  • Dylan Alexander Carlin [2]: molecular cloning, designed experiments, wrote software used in analysis, analyzed data, Rosetta modeling, FoldX modeling, machine learning, wrote paper
  • Ryan Caster [1]: characterized expression for mutants
  • Bill Chan [1]: characterized Tm for mutants, analyzed data, contributed to paper
  • Natalie Damrau [1]: characterized mutants
  • Siena Hapig-Ward [1]: characterized Tm and kinetic constants, analyzed data, drew figures, contributed to paper
  • Mary Riley [1]: characterized mutants
  • Justin B. Siegel [1,3,4]: PI

Author affiliations:

  1. Genome Center, University of California, Davis CA, USA
  2. Biophysics Graduate Group, University of California, Davis CA, USA
  3. Department of Chemistry, University of California, Davis CA, USA
  4. Department of Biochemistry & Molecular Medicine, University of California, Davis CA, USA

Subject areas: biochemistry, computational biology, machine learning

Keywords: enzyme, Rosetta, thermal stability

Introduction

  • background
  • current data sets and problems gathering them
  • current computational approaches
  • summary of what we did

  • Figure 1: all positions selected in study

Methods
  • cloning and mutagenesis
  • production and purification
  • assay and data analysis
  • computational modeling and machine learning

Results
  • summary of all mutants, wild type values, limits of detection
  • mutants less thermostable, mutants more thermostable

  • Figure 2: heatmap with expression, Tm, kcat, km, kcat/km for each mutant

  • Figure 3: drawings of discussed mutants W120F, N404C, H178A, E222H

  • Figure 4: crystal structure with residues colored per change in Tm compared to wild type

  • Rosetta structural features predict expression, but not Tm

  • Figure 5: machine learning model evaluation

Discussion
  • summary of what we showed and connections to background
  • implications for biotechnology
  • implications for human health
  • conclusion

Supplemental information

  • data table with columns name, expression, tm, kcat, km, kcat/km, plus errors for the values where an error estimate applies
  • gel image of each protein used in study

Abstract
The importance of enzymes in biotechnology and human disease makes accurate modeling of enzyme stability an important goal in biochemistry. Previous efforts to build predictive models of enzyme mutants have relied on data sets that are either inconsistent or too small to allow evaluation of current predictive methods. Here, we present a self-consistent data set of soluble protein expression and melting temperature for a library of 114 single mutants of a family 1 glycoside hydrolase. We then use these experimental data to build a predictive model of soluble enzyme expression and melting temperature. We show that, while models for predicting melting temperature require larger data sets to become feasible, we are able to apply our model to a blind test set of 10 mutants, on which we achieved a predictive accuracy of 0. percent. We show that automated molecular biology, combined with high-throughput screening of glycoside hydrolase mutants using open-source protocols, provides a path forward in predicting the functional effects of single point mutations in enzymes. We show how this methodology could be used to design novel thermostable mutants and to detect disease-causing missense mutations from gene sequence.

Introduction

Enzymes' key roles in biotechnology and human disease make accurate modeling of enzyme stability an important goal of the protein modeling community. Accurate prediction of a point mutation's effect on enzyme stability would enable rational protein engineering, in which such predictions could be used directly to tailor an enzyme's functional envelope to a desired application, as has been previously explored (Bornscheuer 2012). Furthermore, understanding the changes in enzyme stability caused by point mutations would provide substantial insight into inherited diseases of metabolism [cite], cancer [cite], a