
Machine Learning-based Prediction of Enzyme Substrate Scope: Application to Bacterial Nitrilases
  • Zhongyu Mou, Oak Ridge National Laboratory (corresponding author: zhongyu.mou@gmail.com)
  • Jason Eakes, Oak Ridge National Laboratory
  • Connor Cooper, The University of Tennessee Knoxville-Oak Ridge National Laboratory Graduate School of Genome Science and Technology
  • Carmen Foster, Oak Ridge National Laboratory
  • Robert Standaert, Oak Ridge National Laboratory
  • Mircea Podar, Oak Ridge National Laboratory
  • Mitchel Doktycz, Oak Ridge National Laboratory
  • Jerry Parks, Oak Ridge National Laboratory


Predicting the range of substrates accepted by an enzyme from its amino acid sequence is challenging. Although sequence- and structure-based annotation approaches are often accurate for predicting broad categories of substrate specificity, they generally cannot predict which specific molecules a given enzyme will accept as substrates, particularly within a class of closely related molecules. Combining targeted experimental activity data with structural modeling, ligand docking, and physicochemical properties of proteins and ligands, and using these as inputs to machine learning models, provides complementary information that can lead to accurate predictions of substrate scope for related enzymes. Here we describe such an approach and apply it to predict the substrate scope of bacterial nitrilases, which catalyze the hydrolysis of nitrile compounds to the corresponding carboxylic acids and ammonia. All four machine learning models tested (linear regression, random forest, gradient-boosted decision trees, and support vector machines) performed similarly on this dataset, with an average area under the ROC curve (AUC) of 0.9 and an average accuracy of ~82%. The approach is intended to be highly modular with respect to the physicochemical property calculations and the software used for docking and modeling.
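The abstract summarizes model performance as an average AUC and an average accuracy across four classifiers. As a minimal, hedged sketch of what those two numbers measure, the code below scores enzyme–substrate pairs with binary labels (assumed: 1 = accepted substrate, 0 = not accepted), computes the ROC AUC through its rank-based (Mann-Whitney) form, computes accuracy at an assumed decision threshold of 0.5, and averages both over models. The model names and scores are hypothetical toy values, not data from the study, and the paper's actual evaluation protocol (cross-validation scheme, thresholds, library used) is not described in this abstract.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive pair
    (accepted substrate) receives a higher score than a randomly
    chosen negative pair; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of pairs whose thresholded score matches the label.

    The 0.5 threshold is an assumption for illustration.
    """
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def summarize(labels: Sequence[int],
              scores_by_model: Dict[str, Sequence[float]]
              ) -> Tuple[Dict[str, Tuple[float, float]], float, float]:
    """Per-model (AUC, accuracy) plus the averages across models."""
    per_model = {name: (roc_auc(labels, s), accuracy(labels, s))
                 for name, s in scores_by_model.items()}
    n = len(per_model)
    mean_auc = sum(a for a, _ in per_model.values()) / n
    mean_acc = sum(c for _, c in per_model.values()) / n
    return per_model, mean_auc, mean_acc


if __name__ == "__main__":
    # Hypothetical toy data: four enzyme-substrate pairs, two models.
    labels = [1, 1, 0, 0]
    scores = {
        "model_a": [0.9, 0.4, 0.6, 0.1],
        "model_b": [0.8, 0.7, 0.3, 0.2],
    }
    per_model, mean_auc, mean_acc = summarize(labels, scores)
    print(per_model)   # {'model_a': (0.75, 0.5), 'model_b': (1.0, 1.0)}
    print(mean_auc)    # 0.875
    print(mean_acc)    # 0.75
```

The rank-based AUC is threshold-free, which is why it can stay high (0.75 for `model_a`) even when accuracy at a fixed threshold is mediocre (0.5); reporting both, as the abstract does, captures ranking quality and decision quality separately.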

06 May 2020: Submitted to Proteins: Structure, Function, and Bioinformatics
07 May 2020: Submission Checks Completed
07 May 2020: Assigned to Editor
08 Jul 2020: Reviewer(s) Assigned
25 Aug 2020: Review(s) Completed, Editorial Evaluation Pending
28 Aug 2020: Editorial Decision: Revise Minor
02 Sep 2020: 1st Revision Received
03 Sep 2020: Submission Checks Completed
03 Sep 2020: Assigned to Editor
13 Sep 2020: Reviewer(s) Assigned
17 Oct 2020: Review(s) Completed, Editorial Evaluation Pending
17 Oct 2020: Editorial Decision: Accept
10 Nov 2020: Published in Proteins: Structure, Function, and Bioinformatics (DOI: 10.1002/prot.26019)