Methods

Protein production

The BglB gene was codon-optimized for E. coli and cloned into a pET29b+ vector using Gibson assembly. Kunkel mutagenesis was used to generate mutations to BglB via the Transcriptic cloud laboratory platform. Sequence-verified plasmids were transformed into chemically competent E. coli BLR cells. Single colonies from selection plates were inoculated into 5 mL of Terrific Broth (Fisher) in 50 mL Falcon tubes and incubated at 37 °C with shaking at 300 RPM for 24 hours. Cells were pelleted by centrifugation and the medium was exchanged for Terrific Broth containing 1 mM IPTG. Following incubation at 18 °C for 24 hours with shaking at 300 RPM, cells were lysed using BugBuster protein extraction reagent (EMD Millipore). Cell lysate was clarified by centrifugation at 14,700 RPM for 30 minutes.

Protein purification

Supernatant containing soluble proteins was transferred to protein microcolumns containing 100 µL Ni-NTA resin (HisPur, Thermo) that had been previously washed with wash buffer (150 mM HEPES, 150 mM NaCl, 10 mM imidazole, pH 7.50). After washing with 6 column volumes of wash buffer, proteins were eluted in 300 µL of wash buffer containing 25 mM EDTA. Protein yield was assessed via A280 and confirmed using 4-12% gradient polyacrylamide gels (Life Technologies).

Activity assay using hydrolysis of pNPG

The activity of the proteins was analyzed using a UV/vis spectrophotometric assay of p-nitrophenyl-β-D-glucoside (pNPG) hydrolysis. Each mutant was assayed in technical triplicate. To determine the thermal stability of each protein, 50 µL aliquots of the technical replicates (approximate concentration 0.1 mg/mL, diluted 10-fold from eluate) were incubated in a BioRad Thermal Cycler in 96-well PCR plates using a gradient of 30 to 50 °C. Next, 25 µL of the thermal-cycled protein was transferred onto an assay plate containing 75 µL of 100 mM pNPG. Absorbance was monitored at 420 nm every minute for 60 minutes, and the rate of 4-nitrophenol production in M/min was calculated using a standard curve.
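As an illustration of the rate calculation, the following minimal sketch converts one A420 time course into a rate in M/min. It assumes the absorbance increase is linear over the window used and that the standard curve is a straight line through the origin. The function names, the synthetic trace, and the standard-curve slope are hypothetical and are not taken from the assay data.

```python
def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def rate_molar_per_min(times_min, absorbances, std_curve_slope):
    """Convert an A420 time course into a 4-nitrophenol formation rate.

    std_curve_slope is the slope of the standard curve in absorbance
    units per molar 4-nitrophenol (assumed linear through the origin).
    """
    return linear_slope(times_min, absorbances) / std_curve_slope


# Synthetic trace: one read per minute for 60 minutes (hypothetical values).
times = list(range(61))
trace = [0.05 + 0.002 * t for t in times]

# Hypothetical standard-curve slope, in absorbance units per molar.
rate = rate_molar_per_min(times, trace, std_curve_slope=10000.0)
print(rate)
```

In practice the linear window would need to be chosen per well, since traces from highly active variants may plateau before 60 minutes.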

[Include notebook, master sheet, and assay data as CSV]

Data analysis

Rates of 4NP formation at each of 8 temperatures from 30 °C to 50 °C were fit to the logistic equation $$v = \frac{1}{1+e^{-k(x-x_0)}}$$ where $x$ is the incubation temperature, to determine the Tm of each protein.
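One simple way to estimate the logistic parameters is sketched below. It is not necessarily the fitting procedure used here: it swaps a nonlinear least-squares fit for a log-linearization, using the fact that $\ln(v/(1-v)) = k(x-x_0)$ is a straight line in $x$. The sketch assumes the rates have already been normalized to a fraction of maximal activity between 0 and 1, and it reads the midpoint $x_0$ as the Tm. The function name and the synthetic data are hypothetical.

```python
import math


def fit_logistic_midpoint(temps, fractions, eps=1e-6):
    """Estimate k and x0 in v = 1 / (1 + exp(-k * (x - x0))).

    fractions are rates normalized to lie strictly between 0 and 1.
    Points outside (eps, 1 - eps) are skipped because their logit
    is undefined or numerically unstable.
    """
    xs, logits = [], []
    for temp, frac in zip(temps, fractions):
        if eps < frac < 1 - eps:
            xs.append(temp)
            logits.append(math.log(frac / (1 - frac)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, logits))
    k = sxy / sxx               # slope of the logit line
    x0 = mean_x - mean_y / k    # temperature at which the logit is zero
    return k, x0


# Synthetic example: 8 temperatures from 30 to 50, true k = -0.8, x0 = 40.
temps = [30 + 20 * i / 7 for i in range(8)]
fractions = [1 / (1 + math.exp(0.8 * (t - 40))) for t in temps]

k_fit, tm_fit = fit_logistic_midpoint(temps, fractions)
print(k_fit, tm_fit)
```

Because activity falls as temperature rises, $k$ comes out negative under this sign convention. With noisy data a nonlinear least-squares fit on the untransformed rates is usually the more robust choice, since the logit transform inflates errors near 0 and 1.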

Rosetta modeling of $\Delta G$

The Rosetta application ddg_monomer was used to calculate the predicted thermal stability of each point mutation (see attached Code S1 for the Rosetta methodology used).

Machine learning

Rosetta protocols that recapitulate previously successful enzyme design efforts, in which atomic-level detail is of utmost importance, were used to calculate 50 structural features describing the effect of each mutation on the protein structure. These metrics include physical energy terms for the ligand and for the protein side chains that act as catalytic residues.

These metrics were keyed to the experimental effect of the mutation in a table, and the data was used to train a machine learning algorithm. Specifically, we used a technique known as elastic net regularization, a constrained regression approach that penalizes model complexity.
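To make the penalty concrete, the sketch below implements elastic net regression by coordinate descent on a tiny synthetic table. It is an illustration under stated assumptions, not the training code used here: the parameter names `alpha` and `l1_ratio`, the three toy features, and the response values are all hypothetical, and the features and response are assumed to be centered so that no intercept is fit.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent minimizing
    (1 / 2n) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Covariance of feature j with the residual, adding back w[j].
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            denom = sum(c * c for c in col) / n + alpha * (1 - l1_ratio)
            if denom == 0:
                continue  # constant-zero feature and no ridge term
            new_wj = soft_threshold(rho, alpha * l1_ratio) / denom
            delta = new_wj - w[j]
            residual = [r - c * delta for r, c in zip(residual, col)]
            w[j] = new_wj
    return w


# Toy data: 4 mutants, 3 centered features (hypothetical values).
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.05, 1.95, -2.05, -1.95]  # 2 * feature 1 + 0.05 * feature 3

w_en = elastic_net(X, y, alpha=0.2, l1_ratio=0.5)
w_ols = elastic_net(X, y, alpha=0.0)  # no penalty: plain least squares
print(w_en, w_ols)
```

With the penalty switched on, the weak third feature is driven exactly to zero and the strong first feature is shrunk, which is the sense in which the method penalizes model complexity. With 50 features the strength of the penalty would typically be chosen by cross-validation.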