ProCS15: A DFT-based chemical shift predictor for backbone and C\(\beta\) atoms in proteins

Anders S. Larsen,\(^{a,c}\) Lars A. Bratholm,\(^{c}\) Anders S. Christensen,\(^b\) Maher Channir, and Jan H. Jensen*

Department of Chemistry, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen, Denmark

\(^a\)Current address: Pharmaceutical Technology and Engineering, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark

\(^b\)Current address: Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, WI 53706, USA

\(^c\)The authors contributed equally to this work




We present ProCS15: a program that computes the isotropic chemical shielding values of backbone and C\(\beta\) atoms from a protein structure in less than a second. ProCS15 is based on around 2.35 million OPBE/6-31G(d,p)//PM6 calculations on tripeptides and small structural models of hydrogen bonding. The ProCS15-predicted chemical shielding values are compared to experimentally measured chemical shifts for Ubiquitin and the third IgG-binding domain of Protein G through linear regression and yield RMSD values below 2.2, 0.7, and 4.8 ppm for carbon, hydrogen, and nitrogen atoms, respectively. These RMSD values are very similar to the corresponding RMSD values computed using OPBE/6-31G(d,p) for the entire structure of each protein. The maximum RMSD values can be reduced by using NMR-derived structural ensembles of Ubiquitin. For example, for the largest ensemble the largest RMSD values are 1.7, 0.5, and 3.5 ppm for carbon, hydrogen, and nitrogen atoms, respectively. The corresponding RMSD values predicted by several empirical chemical shift predictors are in the ranges 0.7-1.1, 0.2-0.4, and 1.8-2.8 ppm for carbon, hydrogen, and nitrogen atoms, respectively.


Chemical shifts hold valuable structural information that is increasingly used in the determination and refinement of protein structures and dynamics (Mulder 2010, Raman 2010, Lange 2012, Bratholm 2015, Robustelli 2010) with the aid of empirical shift predictors such as CamShift (Kohlhoff 2009), Sparta+ (Shen 2010), ShiftX2 (Han 2011), PPM_One (Li 2015) and shAIC (Nielsen 2012). These methods are typically based on approximate physical models with adjustable parameters that are optimized by minimizing the discrepancy between experimental chemical shifts and chemical shifts predicted from protein structures derived from X-ray crystallography. The agreement with experiment is quite remarkable, with RMSD values around 1, 0.3, and 2 ppm for carbon, hydrogen, and nitrogen atoms, respectively. Chemical shift predictions based on quantum mechanical (QM) calculations (mostly density functional theory, DFT) are becoming increasingly feasible for small proteins (Zhu 2012, Zhu 2013, Exner 2012, Sumowski 2014, Swails 2015), and Vila, Scheraga and co-workers have gone on to develop a DFT-based chemical shift predictor for C\(\alpha\) and C\(\beta\) atoms called CheShift-2 (Martin 2013). These QM-based methods yield chemical shifts that deviate significantly more from experiment than the empirical methods, with RMSD values that are generally at least twice as large. However, many of these studies have also shown that the empirical methods are less sensitive to the details of the protein geometry and that QM-based chemical shift predictors may be more suitable for protein refinement (Parker 2006, Sumowski 2014, Vila 2009, Christensen 2013).

Some of us recently showed (Christensen 2013) that protein refinement using a DFT-based backbone amide proton chemical shift predictor (ProCS) yielded more accurate hydrogen-bond geometries and \(^\text{3h}\)J\(_\text{NC'}\) coupling constants involving backbone amide groups than corresponding refinement with CamShift. Furthermore, the ProCS predictions based on the structurally refined ensemble yielded amide proton chemical shift predictions that were at least as accurate as CamShift. This suggests that the larger RMSD observed for QM-based chemical shift predictions may, at least in part, be due to relatively small errors in the protein structures used for the predictions, and not a deficiency in the underlying method. However, in order to test whether this is true in general we need to include the effect of more than one type of chemical shift in the structural refinement. In this study we extend ProCS to the prediction of chemical shifts of backbone and C\(\beta\) atoms in a new method we call ProCS15. We describe the underlying theory, which is significantly different from the previous, amide proton-only, version of ProCS (hence the new name) and test the accuracy relative to full DFT calculations as well as experiment for Ubiquitin and the third IgG-binding domain of Protein G (GB3). We also compare the accuracy to CheShift-2 and other commonly used empirical chemical shift predictors using both single structures and NMR-derived ensembles for Ubiquitin.


ProCS15 computes the chemical shift of an atom in residue \(i\) by \[\label{eqn:scaling} \delta^i = b-a\sigma^i\] where \(a\) and \(b\) are empirically determined parameters as discussed further below and \(\sigma^i\) is the isotropic chemical shielding of an atom in residue \(i\). \(\sigma^i\) is computed from the protein structure using the following equation (some of these terms only contribute for certain atom types as described below) \[\label{eqn:procs} \sigma^i=\sigma^i_{BB}+\Delta\sigma^{i-1}_{BB}+\Delta\sigma^{i+1}_{BB}+\Delta\sigma^{i}_{HB}+\Delta\sigma^{i}_{H\alpha B}+\Delta\sigma^{i}_{RC}+\Delta\sigma^{i}_{w}\] Here \(\sigma^i_{BB}=\sigma^i_{BB}(\phi^i,\psi^i,\chi^i_1,\chi^i_2,...)\) is the chemical shielding computed for an Ac-AXA-NMe tripeptide (AXA for short, Figure \ref{fig:bb}), where X is residue \(i\), for a given combination of \(\phi\), \(\psi\), and \(\chi_1\), \(\chi_2\), ... values as described further in Section \ref{subsec:bbscan}. \(\Delta\sigma^{i-1}_{BB}\) is the change in chemical shielding of an atom in residue \(i\) due to the presence of the side-chain of residue \(i-1\). It is computed as \[\label{eqn:sigmabb} \Delta\sigma^{i-1}_{BB}=\sigma^{i-1}_{BB}(\phi^{i-1},\psi^{i-1},\chi^{i-1}_1,\chi^{i-1}_2,...)-\sigma^A(\phi_{\mathrm{std}},\psi_{\mathrm{std}})\] Here \(\sigma^{i-1}_{BB}\) is the chemical shielding computed for an AXA tripeptide where X is residue \(i-1\), and \(\sigma^A\) is from the corresponding calculation on the AAA tripeptide but using \(\phi_{\mathrm{std}}\) = -120° and \(\psi_{\mathrm{std}}\) = 140° for all \(\phi\) and \(\psi\) angles. For example, if residue \(i\) is a Ser and residue \(i-1\) is a Val then the effect of the Val side-chain on the C\(\beta\) chemical shielding of the Ser residue is computed as the difference in the chemical shielding of the C\(\beta\) atom in the C-terminal Ala residue computed for an AVA and AAA tripeptide. 
This approach assumes that the effect of the \(i-1\) side chain on the chemical shielding values of the atoms in residue \(i\) is independent of the \(\phi^i\) and \(\psi^i\) angles and of the nature of residue \(i\). \(\Delta\sigma^{i+1}_{BB}\) is the corresponding change in chemical shielding of an atom in residue \(i\) due to the presence of the side-chain of residue \(i+1\).
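
The bookkeeping in Equations \ref{eqn:scaling}, \ref{eqn:procs} and \ref{eqn:sigmabb} can be illustrated with a minimal Python sketch. This is not the ProCS15 implementation: the names `ShieldingTerms`, `neighbor_correction` and `chemical_shift` are hypothetical, and all numerical values in the usage example are made-up placeholders rather than computed shieldings or fitted parameters. The sketch assumes that the individual terms have already been obtained (for example from precomputed tripeptide scans) and only shows how they are combined.

```python
from dataclasses import dataclass


@dataclass
class ShieldingTerms:
    """Terms of the shielding sum for one atom in residue i (all in ppm).

    Field names are illustrative only; terms that do not apply to a
    given atom type are simply left at zero.
    """
    bb: float              # sigma^i_BB, from the AXA tripeptide
    prev_bb: float = 0.0   # Delta sigma^{i-1}_BB, side chain of residue i-1
    next_bb: float = 0.0   # Delta sigma^{i+1}_BB, side chain of residue i+1
    hb: float = 0.0        # Delta sigma^i_HB
    hab: float = 0.0       # Delta sigma^i_{H alpha B}
    rc: float = 0.0        # Delta sigma^i_RC
    w: float = 0.0         # Delta sigma^i_w

    def total(self) -> float:
        """Isotropic shielding sigma^i as the plain sum of all terms."""
        return (self.bb + self.prev_bb + self.next_bb
                + self.hb + self.hab + self.rc + self.w)


def neighbor_correction(sigma_axa: float, sigma_aaa_std: float) -> float:
    """Neighbor side-chain correction: shielding from the AXA tripeptide
    (X = neighboring residue) minus the AAA reference value computed at
    the standard angles phi = -120 and psi = 140 degrees."""
    return sigma_axa - sigma_aaa_std


def chemical_shift(sigma: float, a: float, b: float) -> float:
    """Convert a shielding to a chemical shift, delta = b - a * sigma."""
    return b - a * sigma


if __name__ == "__main__":
    # Placeholder numbers, chosen only to make the arithmetic easy to follow.
    terms = ShieldingTerms(
        bb=150.0,
        prev_bb=neighbor_correction(sigma_axa=151.5, sigma_aaa_std=150.0),
        next_bb=-0.5,
        hb=2.0,
    )
    sigma = terms.total()
    print(sigma, chemical_shift(sigma, a=1.0, b=180.0))
```

Keeping each correction as a separate field mirrors the additive form of Equation \ref{eqn:procs} and makes it straightforward to switch off terms that do not contribute for a given atom type.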