Abstract

The importance of enzymes in biotechnology and human disease makes the accurate modeling of enzyme stability an important goal in biochemistry. Previous efforts to build predictive models of enzyme mutants have suffered either from inconsistent data sets or from data sets too small to enable evaluation of current predictive methods. Here, we present a large, self-consistent data set of soluble protein expression and melting temperature for a library of 114 mutants of a family 1 glycoside hydrolase. We then use the experimental data to build a predictive model of soluble enzyme expression and melting temperature. We show that, while models for predicting melting temperature require larger data sets to become feasible, we are able to apply our model to a blind test set of 10 mutants, on which we achieved a predictive accuracy of 0. percent. We show that automated molecular biology, combined with high-throughput, open-source protocols for screening glycoside hydrolase mutants, provides a path forward in predicting the functional effects of single point mutations in enzymes. Finally, we show how this methodology could be used to design novel thermostable mutants and to detect disease-causing missense mutations from gene sequence.