Protein Structure Prediction with Expectation Reflection
  • Evan Cresswell-Clay,
  • Danh-Tai Hoang,
  • Joe McKenna,
  • Eric Zhang,
  • Vipul Periwal,
  • Chris Yang
Evan Cresswell-Clay
National Institute of Diabetes and Digestive and Kidney Diseases Laboratory of Biological Modeling

Corresponding Author: [email protected]

Danh-Tai Hoang
National Institute of Diabetes and Digestive and Kidney Diseases Laboratory of Biological Modeling
Joe McKenna
National Institute of Diabetes and Digestive and Kidney Diseases Laboratory of Biological Modeling
Eric Zhang
National Institute of Diabetes and Digestive and Kidney Diseases Laboratory of Biological Modeling
Vipul Periwal
National Institute of Diabetes and Digestive and Kidney Diseases Laboratory of Biological Modeling
Chris Yang
National Institute of Diabetes and Digestive and Kidney Diseases Laboratory of Biological Modeling

Abstract

Sequence covariation in multiple sequence alignments of homologous proteins has been used extensively to obtain insights into protein structure. However, global statistical inference is required in order to ascertain direct relationships between amino acid positions in these sequences that are not simply secondary correlations induced by interactions with a third residue. Methods for statistical inference of such covariation have been developed to exploit the growing availability of sequence data. These hints about the folded protein structure provide critical a priori information for more detailed 3D predictions by neural networks. We present a novel method for protein structure inference using an iterative parameter-free model estimator which uses the formalism of statistical physics. With no tunable learning rate, our method scales to large system sizes while providing improved performance in the regime of small sample sizes. We apply this method to 40974 PDB structures and compare its performance to that of other methods. Our method outperforms existing methods for 76% of analysed proteins.
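The distinction the abstract draws between direct relationships and secondary correlations induced by a third residue can be illustrated with a small toy model. The sketch below is not Expectation Reflection and is not the authors' method: it assumes three continuous Gaussian variables standing in for alignment positions, coupled in a chain (0 with 1, 1 with 2, no direct 0 to 2 coupling). The function name `correlation_and_partial_correlation`, the coupling values, and the sample size are illustrative assumptions. It shows that plain correlation reports a link between positions 0 and 2, while a global inference step (here, inverting the full covariance matrix) recovers that the link is only indirect.

```python
import numpy as np


def correlation_and_partial_correlation(samples):
    """Return (correlation, partial correlation) matrices for the samples.

    samples: array of shape (n_samples, n_positions).
    Partial correlations are read off the inverse covariance (precision)
    matrix, which accounts for all other positions at once.
    """
    cov = np.cov(samples, rowvar=False)
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)

    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return corr, partial


# Hypothetical chain of couplings: 0 -- 1 -- 2, with no direct 0 -- 2 term.
true_precision = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(
    mean=np.zeros(3),
    cov=np.linalg.inv(true_precision),
    size=20000,
)

corr, partial = correlation_and_partial_correlation(samples)
print("correlation(0, 2):        ", round(float(corr[0, 2]), 3))
print("partial correlation(0, 2):", round(float(partial[0, 2]), 3))
```

In this toy setting the correlation between positions 0 and 2 is clearly nonzero even though they share no direct coupling, whereas the partial correlation is close to zero. Real alignments involve categorical amino acid states and far fewer samples per parameter, which is the regime the abstract says the proposed estimator targets.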