WORKING DRAFT authorea.com/37198

Abstract

# Abstract

We present ProCS15: A program that computes the isotropic chemical shielding values of backbone and C$$\beta$$ atoms given a protein structure in less than a second. ProCS15 is based on around 2.35 million OPBE/6-31G(d,p)//PM6 calculations on tripeptides and small structural models of hydrogen-bonding. The ProCS15-predicted chemical shielding values are compared to experimentally measured chemical shifts for Ubiquitin and the third IgG-binding domain of Protein G through linear regression and yield RMSD values below 2.2, 0.7, and 4.8 ppm for carbon, hydrogen, and nitrogen atoms respectively. These RMSD values are very similar to corresponding RMSD values computed using OPBE/6-31G(d,p) for the entire structure for each protein. The maximum RMSD values can be reduced by using NMR-derived structural ensembles of Ubiquitin. For example, for the largest ensemble the largest RMSD values are 1.7, 0.5, and 3.5 ppm for carbon, hydrogen, and nitrogen. The corresponding RMSD values predicted by several empirical chemical shift predictors range between 0.7 - 1.1, 0.2 - 0.4, and 1.8 - 2.8 ppm for carbon, hydrogen, and nitrogen atoms, respectively.

# Introduction

Chemical shifts hold valuable structural information that is being used more and more in the determination and refinement of protein structures and dynamics (Mulder 2010, Raman 2010, Lange 2012, Bratholm 2015, Robustelli 2010) with the aid of empirical shift predictors such as CamShift (Kohlhoff 2009), Sparta+ (Shen 2010), ShiftX2 (Han 2011), PPM_One (Li 2015) and shAIC (Nielsen 2012). These methods are typically based on approximate physical models with adjustable parameters that are optimized by minimizing the discrepancy between experimental and predicted chemical shifts computed using protein structures derived from x-ray crystallography. The agreement with experiment is quite remarkable with RMSD values around 1, 0.3, and 2 ppm for carbon, hydrogen, and nitrogen atoms. Chemical shift predictions based on quantum mechanical (QM) calculations (mostly density functional theory, DFT) are becoming increasingly feasible for small proteins (Zhu 2012, Zhu 2013, Exner 2012, Sumowski 2014, Swails 2015) and Vila, Scheraga and co-workers have gone on to develop a DFT-based chemical shift predictor for C$$\alpha$$ and C$$\beta$$ atoms called CheShift-2 (Martin 2013). Generally, these QM-based methods yield chemical shifts that deviate significantly more from experiment than the empirical methods, with RMSD values that generally are at least twice as large. However, many of these studies have also shown that the empirical methods are less sensitive to the details of the protein geometry and that QM-based chemical shift predictors may be more suitable for protein refinement (Parker 2006, Sumowski 2014, Vila 2009, Christensen 2013).

Some of us recently showed (Christensen 2013) that protein refinement using a DFT-based backbone amide proton chemical shift predictor (ProCS) yielded more accurate hydrogen-bond geometries and $$^\text{3h}$$J$$_\text{NC'}$$ coupling constants involving backbone amide groups than corresponding refinement with CamShift. Furthermore, the ProCS predictions based on the structurally refined ensemble yielded amide proton chemical shift predictions that were at least as accurate as CamShift. This suggests that the larger RMSD observed for QM-based chemical shift predictions may, at least in part, be due to relatively small errors in the protein structures used for the predictions, and not a deficiency in the underlying method. However, in order to test whether this is true in general we need to include the effect of more than one type of chemical shift in the structural refinement. In this study we extend ProCS to the prediction of chemical shifts of backbone and C$$\beta$$ atoms in a new method we call ProCS15. We describe the underlying theory, which is significantly different from the previous, amide proton-only, version of ProCS (hence the new name) and test the accuracy relative to full DFT calculations as well as experiment for Ubiquitin and the third IgG-binding domain of Protein G (GB3). We also compare the accuracy to CheShift-2 and other commonly used empirical chemical shift predictors using both single structures and NMR-derived ensembles for Ubiquitin.