Abstract

# Abstract

We present ProCS15: A program that computes the isotropic chemical shielding values of backbone and C$$\beta$$ atoms given a protein structure in less than a second. ProCS15 is based on around 2.35 million OPBE/6-31G(d,p)//PM6 calculations on tripeptides and small structural models of hydrogen-bonding. The ProCS15-predicted chemical shielding values are compared to experimentally measured chemical shifts for Ubiquitin and the third IgG-binding domain of Protein G through linear regression and yield RMSD values below 2.2, 0.7, and 4.8 ppm for carbon, hydrogen, and nitrogen atoms respectively. These RMSD values are very similar to corresponding RMSD values computed using OPBE/6-31G(d,p) for the entire structure for each protein. The maximum RMSD values can be reduced by using NMR-derived structural ensembles of Ubiquitin. For example, for the largest ensemble the largest RMSD values are 1.7, 0.5, and 3.5 ppm for carbon, hydrogen, and nitrogen. The corresponding RMSD values predicted by several empirical chemical shift predictors range between 0.7 - 1.1, 0.2 - 0.4, and 1.8 - 2.8 ppm for carbon, hydrogen, and nitrogen atoms, respectively.

# Introduction

Chemical shifts hold valuable structural information that is being used more and more in the determination and refinement of protein structures and dynamics (Mulder 2010, Raman 2010, Lange 2012, Bratholm 2015, Robustelli 2010) with the aid of empirical shift predictors such as CamShift (Kohlhoff 2009), Sparta+ (Shen 2010), ShiftX2 (Han 2011), PPM_One (Li 2015) and shAIC (Nielsen 2012). These methods are typically based on approximate physical models with adjustable parameters that are optimized by minimizing the discrepancy between experimental and predicted chemical shifts computed using protein structures derived from x-ray crystallography. The agreement with experiment is quite remarkable with RMSD values around 1, 0.3, and 2 ppm for carbon, hydrogen, and nitrogen atoms. Chemical shift predictions based on quantum mechanical (QM) calculations (mostly density functional theory, DFT) are becoming increasingly feasible for small proteins (Zhu 2012, Zhu 2013, Exner 2012, Sumowski 2014, Swails 2015) and Vila, Scheraga and co-workers have gone on to develop a DFT-based chemical shift predictor for C$$\alpha$$ and C$$\beta$$ atoms called CheShift-2 (Martin 2013). Generally, these QM-based methods yield chemical shifts that deviate significantly more from experiment than the empirical methods, with RMSD values that generally are at least twice as large. However, many of these studies have also shown that the empirical methods are less sensitive to the details of the protein geometry and that QM-based chemical shift predictors may be more suitable for protein refinement (Parker 2006, Sumowski 2014, Vila 2009, Christensen 2013).

Some of us recently showed (Christensen 2013) that protein refinement using a DFT-based backbone amide proton chemical shift predictor (ProCS) yielded more accurate hydrogen-bond geometries and $$^\text{3h}$$J$$_\text{NC'}$$ coupling constants involving backbone amide groups than corresponding refinement with CamShift. Furthermore, the ProCS predictions based on the structurally refined ensemble yielded amide proton chemical shift predictions that were at least as accurate as CamShift. This suggests that the larger RMSD observed for QM-based chemical shift predictions may, at least in part, be due to relatively small errors in the protein structures used for the predictions, and not a deficiency in the underlying method. However, in order to test whether this is true in general we need to include the effect of more than one type of chemical shift in the structural refinement. In this study we extend ProCS to the prediction of chemical shifts of backbone and C$$\beta$$ atoms in a new method we call ProCS15. We describe the underlying theory, which is significantly different from the previous, amide proton-only, version of ProCS (hence the new name) and test the accuracy relative to full DFT calculations as well as experiment for Ubiquitin and the third IgG-binding domain of Protein G (GB3). We also compare the accuracy to CheShift-2 and other commonly used empirical chemical shift predictors using both single structures and NMR-derived ensembles for Ubiquitin.

# Theory

ProCS15 computes the chemical shift of an atom in residue $$i$$ by $\label{eqn:scaling} \delta^i = b-a\sigma^i$ where $$a$$ and $$b$$ are empirically determined parameters as discussed further below and $$\sigma^i$$ is the isotropic chemical shielding of an atom in residue $$i$$. $$\sigma^i$$ is computed from the protein structure using the following equation (some of these terms only contribute for certain atom types as described below) $\label{eqn:procs} \sigma^i=\sigma^i_{BB}+\Delta\sigma^{i-1}_{BB}+\Delta\sigma^{i+1}_{BB}+\Delta\sigma^{i}_{HB}+\Delta\sigma^{i}_{H\alpha B}+\Delta\sigma^{i}_{RC}+\Delta\sigma^{i}_{w}$ Here $$\sigma^i_{BB}=\sigma^i_{BB}(\phi^i,\psi^i,\chi^i_1,\chi^i_2,...)$$ is the chemical shielding computed for an Ac-AXA-NMe tripeptide (AXA for short, Figure \ref{fig:bb}), where X is residue $$i$$, for a given combination of $$\phi$$, $$\psi$$, and $$\chi_1$$, $$\chi_2$$, ... values as described further in Section \ref{subsec:bbscan}. $$\Delta\sigma^{i-1}_{BB}$$ is the change in chemical shielding of an atom in residue $$i$$ due to the presence of the side-chain of residue $$i-1$$. It is computed as $\label{eqn:sigmabb} \Delta\sigma^{i-1}_{BB}=\sigma^{i-1}_{BB}(\phi^{i-1},\psi^{i-1},\chi^{i-1}_1,\chi^{i-1}_2,...)-\sigma^A(\phi_{\mathrm{std}},\psi_{\mathrm{std}})$ Here $$\sigma^{i-1}_{BB}$$ is the chemical shielding computed for an AXA tripeptide where X is residue $$i-1$$, and $$\sigma^A$$ is from the corresponding calculation on the AAA tripeptide but using $$\phi_{\mathrm{std}}$$ = -120° and $$\psi_{\mathrm{std}}$$ = 140° for all $$\phi$$ and $$\psi$$ angles. For example, if residue $$i$$ is a Ser and residue $$i-1$$ is a Val then the effect of the Val side-chain on the C$$\beta$$ chemical shielding of the Ser residue is computed as the difference in the chemical shielding of the C$$\beta$$ atom in the C-terminal Ala residue computed for an AVA and AAA tripeptide. This approach assumes that the effect of the $$i-1$$ side chain on the chemical shielding values of the atoms in residue $$i$$ are independent of the conformations $$\phi_i$$ and $$\psi_i$$ angles and the nature of residue $$i$$. $$\sigma^{i+1}_{BB}$$ is the corresponding change in chemical shielding of an atom in residue $$i$$ due to the presence of the side-chain of residue $$i+1$$.

\label{fig:bb} Example of the Ac-AXA-NMe tripeptides (for the case where X = Ser) used to compute the backbone contributions to the chemical shielding values.

$$\Delta\sigma^i_{HB}$$ in Eq \ref{eqn:procs} is the effect of hydrogen bonding to the amide H ($$\Delta\sigma^i_{1^\circ HB}$$) and O ($$\Delta\sigma^i_{2^\circ HB}$$) atoms of residue $$i$$ on the chemical shielding of the backbone atoms (this term is zero for C$$\beta$$) $\label{eqn:sigmahb} \Delta \sigma^i_{HB}=\Delta\sigma^i_{1^\circ HB}(r_{\mathrm{HO}},\theta,\rho)+\Delta\sigma^i_{2^\circ HB}(r_{\mathrm{OH}},\theta_{\mathrm{O}},\rho_{\mathrm{O}})$ $$\Delta\sigma^i_{1^\circ HB}$$ is computed using the structural models shown in Figure \ref{fig:HB} as the change in chemical shielding of the backbone atoms in N-methyl acetamide relative to that of the free monomer computed at the OPBE/6-31G(d,p)//PM6 level of theory for a variety of orientations (see subsection \ref{subsec:HBscan} for more information) while the internal monomer geometries are kept fixed. For H$$\alpha$$ the chemical shielding is taken as the average of the three hydrogen atom on the N-methyl group. Note that the carbonyl carbon formally belongs to residue $$i-1$$. $$\Delta\sigma^i_{2^\circ HB}$$ is included only when another amide or amine group is hydrogen bonded to the amide oxygen and is computed as the change in the chemical shielding of the top amide group in Figure \ref{fig:HB}a. For H$$\alpha$$ the chemical shielding is taken as the average of the three hydrogen atoms on the methyl group of the acetamide. Note that in this case the amide nitrogen and hydrogen formally belong to residue $$i+1$$ and that $$r_\mathrm{HO}$$, $$\theta$$, and $$\rho$$ are defined relative to the carbonyl oxygen of residue $$i$$ rather than the amide proton as for $$\Delta\sigma^i_{1^\circ HB}$$. $$r_\mathrm{HO}$$, $$\theta$$, and $$\rho$$ are therefore labeled $$r_\mathrm{OH}$$, $$\theta_{\mathrm{O}}$$, and $$\rho_{\mathrm{O}}$$ in Eq \ref{eqn:sigmahb}.

\label{fig:HB} Schematic representation of the model systems used to compute $$\Delta \sigma^i_{HB}$$.

$$\Delta \sigma^i_{H\alpha B}$$ is the effect of hydrogen bonding to the H$$\alpha$$ and amide O atoms of residue $$i$$ on the chemical shielding of the backbone atoms and C$$\beta$$ and has two contributions $\label{eqn:sigmahab} \Delta \sigma^i_{H\alpha B}=\Delta\sigma^i_{1^\circ H\alpha B}(r_{\mathrm{H\alpha O}},\theta,\rho)+\Delta\sigma^i_{2^\circ H\alpha B}(r_{\mathrm{OH\alpha}},\theta_{\mathrm{O}},\rho_{\mathrm{O}})$ $$\Delta\sigma^i_{1^\circ H \alpha B}$$ is computed using the structural models shown in Figure \ref{fig:HAB} as the change in chemical shielding of the backbone and C$$\beta$$ atoms in Ac-A-NMe relative to that of the free monomer computed at the OPBE/6-31G(d,p)//PM6 level of theory for a variety of orientations (see subsection \ref{subsec:HBscan} for more information) while the internal monomer geometries are kept fixed. $$\Delta\sigma^i_{2^\circ HB}$$ is computed as the change in the chemical shielding of the top amide group in Figure \ref{fig:HAB}a. For H$$\alpha$$ the chemical shielding is taken as the average of the three hydrogen atom on the methyl group of the acetamide. Note in this case that the amide nitrogen and hydrogen formally belong to residue $$i+1$$ and that $$r_\mathrm{HO}$$, $$\theta$$, and $$\rho$$ are defined relative to the carbonyl oxygen of residue $$i$$ rather than the amide proton as for $$\Delta\sigma^i_{1^\circ HB}$$. $$r_\mathrm{H\alpha O}$$, $$\theta$$, and $$\rho$$ are therefore labeled $$r_\mathrm{OH\alpha}$$, $$\theta_{\mathrm{O}}$$, and $$\rho_{\mathrm{O}}$$ in Eq \ref{eqn:sigmahab}.

\label{fig:HAB} Schematic representation of the model systems used to compute $$\Delta \sigma^i_{H\alpha B}$$.

$$\Delta \sigma^i_{RC}$$ is the effect of ring current on the chemical shielding. Usually this is only significant for proton shift and is thus only calculated for the H$$\alpha$$ and amide protons. The ring current is calculated by a simple point-dipole model equation $\Delta \sigma^i_{RC}=iB\frac{1-3 \cos^2(\theta)}{r^3}$ The model depends on the parameters $$i$$, which is the side-chain-specific ring-current intensity relative to benzene, $$B$$, which is a constant in the model, and the vector $$\mathbf{r}$$, which is the vector from the proton to the center of the aromatic ring. $$\theta$$ is the angle between $$\mathbf{r}$$ and the vector normal to the aromatic ring system. The cut-off for calculating ring current is 8 Å in Procs15 and the value for $$i$$ and $$B$$ are taken from (Christensen 2011).

$$\Delta\sigma^{i}_{w}$$ is the change in chemical shielding of an amide proton due to a hydrogen bond to a water molecule. While the backbone terms of ProCS15 is parameterized based on DFT calculations with the polarizable continuum model of solvation, this model does not account for explicit solvent effects and this term is included for amide protons that do not form hydrogen bonds to other atoms in the protein structure. $$\Delta\sigma^{i}_{w}$$ is 2.07 ppm based on DFT calculations on an N-methylacetamide-water complex (Christensen 2013).

# Methodology

## Backbone scans

\label{subsec:bbscan} The capped AXA tripeptides used to compute the first three terms of Eq \ref{eqn:procs} were constructed using the FragBuilder Python module (Christensen 2014), which was also used to make different conformations. The acidic and basic amino acids are all modeled in their charged state, including Histidine. This will be the correct charged state for most ionizable residues in most proteins. However, for any ionizable residues that are in their neutral state this approximation can introduce large errors. For example, the C$$\beta$$ chemical shifts of Asp and His change by 3.0 and 2.4 ppm due to protonation state changes in small peptides, while the N-chemical shifts change by 1.5 and 1.8 ppm (Platzer 2014). This issue will be addressed in future studies. Only Cysteine is modeled and not the disulfide bonded Cysteine. For each tripeptide a scan on the central residue’s backbone and side chain dihedral angles $$\phi$$, $$\psi$$, $$\chi_{1}$$, $$\chi_{2}$$, $$\chi_{3}$$, $$\chi_{4}$$ was carried out. The $$\omega$$ dihedral angle was fixed at 180$$^\circ$$. The $$\phi$$/$$\psi$$ backbone angles on the N and C-termini alanine residues were fixed at -140$$^\circ$$ and 120$$^\circ$$ corresponding to typical $$\beta$$-sheet residue backbone angles. The scans were done with a 20$$^\circ$$ grid spacing. For the alanine AAA tripeptide this resulted in 361 conformations from a $$\phi$$/$$\psi$$ scan. For amino acid types with more than two side chain angles this approach would result in far to many samples. Instead we used BASILISK (Harder 2010) that allows us to sample from the continuous space of the side chain torsion degrees of freedom. 1000 conformations were generated for each $$\phi$$/$$\psi$$ backbone pair spaced by 20$$^\circ$$. See Table S1 in Supplementary Materials for an overview of the number of conformations sampled for each residue. The geometry of each conformation were optimized with PM6 (Stewart 2007) with the backbone and side chain torsion angles frozen. The GIAO NMR calculations were done at the OPBE/6-31G(d,p) level of theory (Zhang 2006) using the CPCM continuum solvation model (Barone 1998) with a dielectric constant of 78. The rationale for using 78 is that the bulk solvent effects will have the largest effect for charged side-chains, which are usually located on the surface of the protein. Both the optimization and NMR calculation were done with Gaussian 09 program(Frisch 2014). In total the ProCS15 backbone terms are based on $$\sim$$2.35 million DFT calculations.

Several structures failed in the optimization stage or had to be discarded due to steric clashes in the NMR calculation and the missing chemical shielding values were found by interpolation. For amino acids with no and one side chain angles cubic interpolation was used and for 2-4 side chain angles nearest neighbor interpolation. For amino acids with 0 side chain angles, the data is interpolated to a grid with 1$$^\circ$$ grid spacing, 1 side chain angles to a grid of 5$$^\circ$$ and the rest of the amino acids 20$$^\circ$$. The interpolation is done with the Python package SciPy (Jones 2001--). The grids are saved in the .npy compressed file format from the Numpy Python package. In the compressed state on the hard disk the data size is $$\sim$$17 GB and when loaded in to random access memory (RAM) $$\sim$$25 GB.

## Hydrogen bond scans

\label{subsec:HBscan} $$\Delta \sigma_{HB}$$ and $$\Delta \sigma_{H\alpha B}$$ (cf. Eq \ref{eqn:procs}) are parameterized using the model systems shown in Figures \ref{fig:HB} and \ref{fig:HAB}. For $$\Delta \sigma_{HB}$$ the scans were done by scanning over the hydrogen bond length $$r_{OH}$$, the bond angle $$\theta_{\mathrm{H}}$$ defined by H..O=C or H..O-C and the dihedral angle $$\rho_{\mathrm{H}}$$ defined by H..O=C-N, H..O=C-C or H..O-C(..)H$$^\text{O}$$. The bond length was scanned from 1.5 to 3.0 Å in 0.125 Å steps. $$\theta_{\mathrm{H}}$$ was scanned from 180.0 to 90.0$$^\circ$$ in 10.0$$^\circ$$ steps and $$\rho_{\mathrm{H}}$$ was done in the entire range -180$$^\circ$$ to 180$$^\circ$$. Similarly, for $$\Delta \sigma_{H\alpha B}$$ the $$r_{OH\alpha}$$ bond length was scanned from 1.8 to 4.0 Å in steps of 0.2 Å. The bond angle $$\theta_{\mathrm{H}\alpha}$$ defined by H$$\alpha$$..O=C or H$$\alpha$$..O-C was scanned from 180$$^\circ$$ to 90$$^\circ$$ with a 10$$^\circ$$ step size. The $$\rho_{\mathrm{H}\alpha}$$ dihedral H$$\alpha$$..O=C-N, H$$\alpha$$..O=C-C and H$$\alpha$$..O-C(..)H$$^\text{O}$$ was scanned in steps of 15$$^\circ$$ over the entire range. To get the change in chemical shift caused by the hydrogen bonding the OPBE/6-31G(d,p)//PM6 chemical shielding of systems without hydrogen bonding are subtracted from the scans. The result of the scan is interpolated and saved in another set of .npy files. The monomer geometries are optimized at the PM6 level of theory and kept fixed during the scan.

## NMR calculations and protein structures used

In this paper we benchmark the NMR chemical shift predictions on Ubiquitin and GB3. The structures are geometry optimized using PM6-D3H+ (Kromann 2014) using the PCM solvation model (Tomasi 2005, Steinmann 2013) and the CHARMM22/CMAP force field (Mackerell 2004) using the GB/SA solvation model (Qiu 1997) with the 1UBQ (Vijay-Kumar 1987) and 2OED (Ulmer 2003) structures as starting points. The PM6-D3H+ optimizations are done using the GAMESS program (Schmidt 1993) with a convergence criterion of 5 $$\times$$ 10$$^{-4}$$ atomic units, while the CHARMM22/CMAP optimizations are done using TINKER (Ponder 1987) with the default convergence criterion of 0.01 kcal/mole/Å. In addition the following NMR-derived structural ensembles are used without further refinement: 1D3Z (Cornilescu 1998), 2K39 (Lange 2008), 1XQQ (Lindorff-Larsen 2005), 2LJ5 (Montalvao 2012), 2KOX (Fenwick 2011). In all calculation we used charged protonation states for the acidic and basic side-chains, but in the NMR ensembles Histidine were left neutral (with either ND1 or NE2 protonated) as published. The charges are consistent with the published pK$$_a$$ values of Ubiquitin (Sundd 2002, Lenkinski 1977) and GB3 (Khare 1997).

OBPE/6-31G(d,p)//PM6-D3H+ GIAO NMR shielding calculations were performed with Gaussian09 using the CPCM solvation model. ProCS15 calculations were done using a module written for the protein simulation framework PHAISTOS (Boomsma 2013). The module was specifically written for this paper, and can be downloaded at github.com/jensengroup/procs15. CheShift-2 calculations were performed using either the web interface at cheshift.com or the CheShift-2 PyMOL-plugin (Schrödinger, LLC 2010) found at github.com/aloctavodia/cheshift. CamShift, PPM_One, Sparta+, shAIC, and ShiftX2 calculations are performed using the stand-alone predictors. The NMR chemical shielding and shifts are compared to shifts measured for Ubiquitin (Cornilescu 1998) (BMRB ID 17769)(Ulrich 2007) and GB3 (Vögeli 2012) (BMRB ID 18531), respectively, both at pH 6.5.

Much of the variation in some of the chemical shifts comes from the nature of the side-chain itself and the side chains before and after in the sequence, which can lead to inflated $$r$$-values. To separate the contributions of the sequence and the structure we subtract the measured sequence corrected random coil values (Tamiola 2010) from all predicted and experimental values. Note that this does not affect the computed RMSD values.

# Results and Discussion

## Choice of functional and basis set

When it comes to prediction of chemical shifts in proteins the most widely used functional appears to be B3LYP (Becke 1993). For example, Zhu, He and Zhang (Zhu 2012) used B3LYP/6-31G(d,p) to compute hydrogen and carbon chemical shifts for small proteins that correlate well with experimental measurements with $$r$$ values typically ≥ 0.98 when solvent effects are taken into account. Exner, Möller, and co-workers (Exner 2012) have obtained similar results using B3LYP/6-31G(d) and even observed a correlation of 0.81 for the notoriously difficult amide N by averaging over several snapshots. Finally, Vila, Baldoni and Scheraga (Vila 2009) did a systematic study of the effect of 10 functionals on C$$\alpha$$ chemical shifts in Ubiquitin and found very little difference in performance with all $$r$$ and RMSD values in the range 0.902 – 0.908 and 2.12 – 2.30 ppm. Interestingly, this study included functionals such as OPBE that are computationally less demanding than B3LYP. Vila, Scheraga and co-workers (Vila 2009a) subsequently observed that C$$\alpha$$ chemical shifts computed using smaller basis sets such as 6-31G correlate extremely well the chemical shifts computed using lager basis set such as 6-311+G(2d,p). We therefore decided to use the 6-31G(d,p) basis for our calculations and use the computationally efficient OPBE functional.

## Benchmarking ProCS15 against full QM calculations

Eq \ref{eqn:procs} is parameterized using OPBE/6-31G(d,p)//PM6 calculations so we compare ProCS15 against full OPBE/6-31G(d,p)//PM6-D3H+ calculations on Ubiquitin (1UBQ) and GB3 (2OED) to test for errors introduced by the inherent additivity assumptions and the structural simplifications in the model systems used for the DFT calculations. We use PM6-D3H+ for the geometry optimization, rather than PM6, to get a better description of hydrogen-bonding and other intermolecular interactions. However, bond lengths and angles, and their effect on chemical shifts, will be very virtually identical to PM6. The results are summarized in Table \ref{table:vsqm}. The first row, marked “all”, summarizes results for ProCS15 if all but the last term of Eq \ref{eqn:procs} are included. The last term corrects for the explicit solvent effects and thus not relevant when comparing to DFT calculations.

In the case of C$$\alpha$$ none of the terms have a large effect on the chemical shielding. In the case of GB3 the results improve slightly if $$\Delta \sigma_{1^\circ HB}^i$$ is removed and removing $$\Delta \sigma_{1^\circ H\alpha B}^i$$ improves the results slightly for both proteins. Accordingly these two terms are removed from ProCS15, while all other terms are kept (note the ring current is only included for hydrogen atoms). For C$$\beta$$ removing $$\Delta \sigma_{1^\circ H\alpha B}^i$$ decreases the RMSD by 0.2 - 0.5 ppm, while $$\Delta \sigma_{BB}^{i-1}$$ and $$\Delta \sigma_{BB}^{i+1}$$ increases and decreases the RMSD value depending on the protein. Accordingly only $$\Delta \sigma_{1^\circ H\alpha B}^i$$ is removed. Note that the structural models used for $$\Delta \sigma_{1^\circ HB}^i$$, $$\Delta \sigma_{2^\circ HB}^i$$ and $$\Delta \sigma_{2^\circ H\alpha B}^i$$ do not contain a C$$\beta$$ atom so there is no such contribution for this nucleus. For C’ removing $$\Delta \sigma_{1^\circ H B}^i$$ decreases the RMSD for GB3 by 0.1 ppm so we choose to remove this term for this atom type. Note that removing $$\Delta \sigma_{2^\circ HB}^i$$ increases the RMSD by 0.4 - 0.6 ppm so this term is important for accurate predictions of C’ chemical shifts. For H$$^\text{N}$$ and H$$\alpha$$ we choose to retain all the terms. Not surprisingly, the respective primary hydrogen bonding terms lower the RMSD by 0.4 - 0.6 ppm and are crucial for accurate predictions. Finally, for N removing $$\Delta \sigma_{1^\circ H\alpha B}^i$$ lowers the RMSD by 0.2-0.5 ppm, so this term is removed. Note that $$\Delta \sigma_{BB}^{i-1}$$ and the two hydrogen bonding terms involving H lower the RMSD by as much as 1.6 ppm ($$\Delta \sigma_{BB}^{i-1}$$ for Ubiquitin) and is crucial for accurate predictions.

\label{table:vsqm}

Comparison of ProCS15 to OPBE/6-31G(d,p)//PM6-D3H+ values computed for the entire protein. All chemical shielding values are corrected for random coil effects. The RMSD values are computed after linear regression. “All” means that all terms in Eq \ref{eqn:procs} are included, with the exception of $$\Delta \sigma_w$$. “$$\Delta \sigma_{BB}^{i-1}$$” means that the $$\Delta \sigma_{BB}^{i-1}$$ term has been removed in the chemical shift prediction, while all other terms are included. The row marked “ProCS15” corresponds to the combination of terms outlined in Table \ref{table:terms}.
C$$\alpha$$ C$$\beta$$ C’ H$$\alpha$$ H$$^\text{N}$$ N
RMSD (r) RMSD (r) RMSD (r) RMSD (r) RMSD (r) RMSD (r)
ubiquitin
all 1.9 (0.70) 3.0 (0.50) 2.1 (0.72) 0.6 (0.82) 0.7 (0.85) 4.9 (0.67)
$$\Delta \sigma_{BB}^{i-1}$$ 1.9 (0.69) 3.1 (0.48) 2.1 (0.72) 0.6 (0.81) 0.6 (0.88) 6.5 (0.50)
$$\Delta \sigma_{BB}^{i+1}$$ 1.9 (0.71) 3.1 (0.48) 2.1 (0.73) 0.6 (0.82) 0.7 (0.85) 5.0 (0.66)
$$\Delta \sigma_{1^\circ HB}^i$$ 1.9 (0.72) - 2.1 (0.72) 0.6 (0.82) 1.3 (0.20) 4.7 (0.70)
$$\Delta \sigma_{2^\circ HB}^i$$ 1.9 (0.69) - 2.7 (0.53) 0.6 (0.80) 0.8 (0.83) 5.9 (0.50)
$$\Delta \sigma_{1^\circ H\alpha B}^i$$ 1.7 (0.75) 2.5 (0.69) 2.1 (0.72) 1.0 (0.42) 0.7 (0.86) 4.4 (0.74)
$$\Delta \sigma_{2^\circ H\alpha B}^i$$ 1.9 (0.69) - 2.2 (0.71) 0.6 (0.82) 0.7 (0.85) 5.0 (0.66)
$$\Delta \sigma_{RC}^i$$ - - - 0.6 (0.81) 0.7 (0.85) -
ProCS15 1.7 (0.77) 2.5 (0.69) 2.1 (0.72) 0.6 (0.82) 0.7 (0.85) 4.4 (0.74)
GB3
all 1.8 (0.81) 2.5 (0.58) 2.4 (0.60) 0.7 (0.82) 0.8 (0.82) 4.7 (0.77)
$$\Delta \sigma_{BB}^{i-1}$$ 1.7 (0.82) 2.4 (0.59) 2.5 (0.52) 0.7 (0.83) 0.9 (0.79) 5.9 (0.61)
$$\Delta \sigma_{BB}^{i+1}$$ 1.8 (0.81) 2.4 (0.59) 2.5 (0.55) 0.6 (0.84) 0.8 (0.82) 4.7 (0.77)
$$\Delta \sigma_{1^\circ HB}^i$$ 1.7 (0.84) - 2.3 (0.63) 0.7 (0.83) 1.4 (0.82) 5.6 (0.69)
$$\Delta \sigma_{2^\circ HB}^i$$ 1.8 (0.80) - 2.8 (0.49) 0.7 (0.81) 0.8 (0.82) 5.6 (0.67)
$$\Delta \sigma_{1^\circ H\alpha B}^i$$ 1.7 (0.82) 2.3 (0.60) 2.4 (0.62) 1.1 (0.36) 0.8 (0.82) 4.5 (0.78)
$$\Delta \sigma_{2^\circ H\alpha B}^i$$ 1.8 (0.81) - 2.4 (0.60) 0.7 (0.83) 0.8 (0.82) 4.6 (0.77)
$$\Delta \sigma_{RC}^i$$ - - - 0.7 (0.79) 0.8 (0.80) -
ProCS15 1.6 (0.84) 2.3 (0.60) 2.3 (0.65) 0.7 (0.82) 0.8 (0.82) 4.5 (0.78)

An overview of the terms of Eq \ref{eqn:procs} used in ProCS15 for each atom type can be found Table \ref{table:terms} and the RMSD and $$r$$ values obtained using this combination of terms are given in the row labeled “ProCS15” in Table \ref{table:vsqm}. The RMSD value for the carbon atoms range from 1.6 to 2.5 ppm and a very similar for both proteins. The $$r$$ values range between 0.60 and 0.84 with the $$r$$ value being consistently highest for C$$\alpha$$. For the hydrogen atoms the RMSD and $$r$$ values range from 0.6 to 0.8 ppm and 0.82 to 0.85, respectively. Finally, for N the RMSD values are 4.3 - 4.5 ppm, while the $$r$$ values are in the range 0.74 - 0.78.

In the case of GB3 the RMSD ($$r$$) value for C$$\beta$$ can be reduced (increased) to 1.8 ppm (0.71) by removing a single outlier identified by the Generalized Extreme Studentized Deviate Test (Rosner 1983). The outlier is Ala20 for which ProCS15 and DFT predict a C$$\beta$$ chemical shielding value of 176.8 and 167.4 ppm, respectively. Inspection of the structure shows that the C$$\beta$$ atom is only 3.1 Å from the N atom of Ala26 - an interaction not taken into account in the parameterization of ProCS15.

Similarly (also for GB3), the RMSD ($$r$$) value for H$$^\text{N}$$ can be reduced (increased) to 0.6 ppm (0.91) by removing a single outlier identified by the Generalized Extreme Studentized Deviate Test. The outlier is Gln2 for which ProCS15 and DFT predict a H$$^\text{N}$$ chemical shielding value of 24.2 and 20.1 ppm, respectively. Inspection of the structure shows that the H$$^\text{N}$$ atom is within 1.77 Å of the OE1 atom of the Gln2 side chain and within 2.54 Å of an H$$\varepsilon$$ atom of the Met1 side chain. While these interactions should be included in the $$\sigma_{BB}^{i}$$ and $$\Delta \sigma_{BB}^{i-1}$$ term, respectively, it i possible that the latter interaction is not found in the scan due to the choice of $$\phi_{\text{std}}$$ and $$\psi_{\text{std}}$$ described above. This residue is also identified as an outlier for N and removing it reduces (increases) the RMSD ($$r$$) value to 4.1 ppm (0.81).

\label{table:terms}

Terms in Eq \ref{eqn:procs} that are included in ProCS15 for a given atom type are marked with an “x”
C$$\alpha$$ C$$\beta$$ C’ H$$\alpha$$ H$$^\text{N}$$ N
$$\Delta \sigma_{BB}^{i-1}$$ x x x x x x
$$\Delta \sigma_{BB}^{i+1}$$ x x x x x x
$$\Delta \sigma_{1^\circ HB}^i$$ x x x
$$\Delta \sigma_{2^\circ HB}^i$$ x x x x x
$$\Delta \sigma_{1^\circ H\alpha B}^i$$ x x
$$\Delta \sigma_{2^\circ H\alpha B}^i$$ x x x x x
$$\Delta \sigma_{RC}$$ x x
$$\Delta \sigma_{w}$$ x

## Comparison to experimental chemical shifts using single structures

Table \ref{table:vsexp} shows the comparison of QM, ProCS15 and several common chemical shift predictors to experimental values. The first two rows use the OPBE/6-31G(d,p) and ProCS15 chemical shielding predictions used to construct Table \ref{table:vsqm} and therefore use the PM6-D3H+ optimized structures of Ubiquitin and GB3. However, most future use of ProCS15 will be based on structures optimized with force fields so prediction of the remaining rows is done using structures optimized with the CHARMM22/CMAP force field. The ProCS15 predictions based on the CHARMM22/CMAP-optimized structures include the $$\Delta \sigma_{w}$$ term (cf. Eq \ref{eqn:procs}). The $$a$$ and $$b$$ factors in Eq \ref{eqn:scaling} are found by linear regression to the experimental values for each atom type. In order to offer a fair comparison RMSD values are computed after a linear fit to the experiment for all methods.

The OPBE/6-31G(d,p)//PM6-D3H+ calculations reproduce the experimental chemical shifts to within 2.8 ppm for carbon atoms, 0.6 ppm for hydrogen atoms and 4.6 ppm for nitrogen. The results are similar to those observed by other researchers using other functionals. For example, Zhu and co-workers (Zhu 2012) used B3LYP3/6-31G(d,p)//AMBER (and a locally dense 6-31++G(d,p)/4-31G(d) basis set for C’) and an implicit solvent model to reproduce chemical shift values to within 3.3 ppm for carbon atoms, 0.4 for hydrogen atoms and 8.4 ppm for nitrogen. In this study the RMSD for hydrogen atoms was computed for H$$\alpha$$ and H$$^\text{N}$$ combined. In a later study (Zhu 2013), the same researchers reproduced the chemical shifts of amide protons in GB3 to within 0.5 ppm using a locally dense 6-31++G(d,p)/4-31G(d) basis set and a variety of functionals including OPBE. Similarly, Exner and co-workers (Exner 2012) used B3LYP/6-31G(d)//AMBER and an implicit solvent model to reproduce the H$$^\text{N}$$ chemical shifts of the HA2 Domain to within 0.5 ppm using a single structure and 0.3 pm using several MD snapshots.

While ProCS15 does not reproduce the DFT results perfectly as discussed above the first two rows of Table \ref{table:vsexp} show that ProCS15 can reproduce experimental chemical shifts with an overall accuracy that is similar to full DFT chemical shielding calculations for Ubiquitin and GB3. The RMSD values predicted with ProCS15 for carbon atoms are 0.1 - 0.6 ppm lower compared to the DFT results, while the RMSD values for hydrogen and nitrogen atoms are 0.0 - 0.1 ppm and 0.2 - 0.4 ppm higher. It is therefore not clear that much is necessarily gained by adding additional terms to ProCS15 without also increasing the underlying level of theory used to compute these terms. For example, it is known that using a larger basis set can significantly improve the prediction of C’ chemical shifts (Vila 2014, Zhu 2012).

Using structures optimized with CHARMM22/CMAP instead of PM6-D3H+ to predict chemical shifts with ProCS15 does also not seem to lead to overall worse agreement with experiment. In fact the results tend to improve slightly (up to 0.5 ppm) for heavy atoms as judged by the RMSD values. Comparison of ProCS15 to CheShift-2, which has also been parameterized against DFT calculations, show fairly similar accuracy for C$$\alpha$$ and slightly worse accuracy for C$$\beta$$. The latter observation is perhaps due to the fact that CheShift-2 uses a different (empirical-corrected) reference for each residue type. However, this is also the case for C$$\alpha$$ for which ProCS15 predictions give a lower RMSD value.

Comparison of ProCS15 to the empirical methods (CamShift through ShiftX2) generally show considerably lower RMSD of the empirical predictions for all atoms types, except H$$\alpha$$ for GB3 where the accuracy is mostly comparable. The $$r$$ values are also considerably higher for the empirical methods than for ProCS15 for C$$\alpha$$ and, especially, C$$\beta$$, while they are comparable for the remaining atom atoms.

As mentioned in the introduction the higher RMSD values generally observed for the DFT-based methods compared to the empirical methods is expected. The important issue in the context of structural refinement against measured chemical shifts is whether the DFT-based methods are more sensitive to relative small differences in structure. While a thorough investigation of this complex issue for ProCS15 will be the subject of future studies, we look at the effect of using different structural ensembles on the accuracy next.

\label{table:vsexp}

Comparison of chemical shifts predicted using various methods to experimental values measured for Ubiquitin and GB3 and corrected for random coil effects. The RMSD values are computed after linear regression. The predictions were done using CHARMM22/CMAP optimized structures using the GB/SA solvation model except for the first two rows (marked with $$^a$$) where PM6-D3H+ optimized structures using the CPCM solvation model were used.
C$$\alpha$$ C$$\beta$$ C’ H$$\alpha$$ H$$^\text{N}$$ N
RMSD (r) RMSD (r) RMSD (r) RMSD (r) RMSD (r) RMSD (r)
Ubiquitin
DFT$$^a$$ 2.1 (0.62) 2.8 (0.56) 1.8 (0.85) 0.4 (0.83) 0.6 (0.81) 4.0 (0.80)
ProCS15$$^a$$ 2.0 (0.61) 2.2 (0.52) 1.7 (0.88) 0.4 (0.86) 0.6 (0.73) 4.4 (0.85)
ProCS15 1.7 (0.70) 2.0 (0.50) 1.7 (0.81) 0.4 (0.77) 0.6 (0.72) 4.0 (0.79)
CheShift-2 1.7 (0.59) 1.6 (0.62)
CamShift 1.1 (0.85) 1.3 (0.71) 1.0 (0.81) 0.3 (0.73) 0.5 (0.69) 3.0 (0.63)
PPM_One 0.7 (0.93) 1.1 (0.80) 0.9 (0.87) 0.2 (0.88) 0.4 (0.73) 2.2 (0.81)
Sparta+ 0.7 (0.93) 1.1 (0.82) 0.8 (0.88) 0.2 (0.86) 0.4 (0.72) 2.2 (0.81)
shAIC 0.7 (0.94) 1.1 (0.82) 0.8 (0.89) 0.3 (0.83) 0.5 (0.71) 2.3 (0.79)
ShiftX2 0.5 (0.97) 0.7 (0.91) 0.5 (0.96) 0.1 (0.97) 0.3 (0.91) 1.8 (0.88)
GB3
DFT$$^a$$ 2.1 (0.71) 2.4 (0.53) 0.4 (0.76) 0.6 (0.86) 4.6 (0.78)
ProCS15$$^a$$ 1.8 (0.73) 2.1 (0.42) 0.4 (0.75) 0.7 (0.85) 4.8 (0.88)
ProCS15 1.6 (0.70) 2.0 (0.42) 0.3 (0.85) 0.6 (0.76) 4.3 (0.86)
CheShift-2 1.7 (0.68) 1.8 (0.53)
Camshift 1.2 (0.81) 1.0 (0.83) 0.3 (0.85) 0.4 (0.82) 3.3 (0.54)
PPM_One 1.0 (0.87) 0.9 (0.87) 0.3 (0.91) 0.4 (0.89) 2.3 (0.79)
Sparta+ 1.0 (0.87) 1.0 (0.86) 0.3 (0.89) 0.4 (0.88) 2.8 (0.70)
shAIC 1.0 (0.88) 1.0 (0.85) 0.3 (0.87) 0.4 (0.83) 2.3 (0.79)
ShiftX2 0.6 (0.96) 0.7 (0.93) 0.1 (0.97) 0.1 (0.98) 2.3 (0.79)

## Comparison to experimental chemical shifts using NMR-derived ensembles

Table \ref{table:ensembles} lists the RMSD and $$r$$ values computed for Ubiquitin using the x-ray structure 1UBQ and five NMR-derived structural ensembles with between 10 and 640 structures. For ProCS15 the average chemical shift is obtained by computing the average chemical shielding for each nucleus followed by the linear regression fit to experimental chemical shift values (cf. Eq \ref{eqn:scaling}) to obtain the predicted average chemical shifts. The procedure is the same for the remaining methods except that chemical shifts are used instead of chemical shieldings.

For ProCS15 use of ensemble structures lowers the RMSD values for all atom types, with decreases in the range 0.1 - 0.7 ppm for heavy atoms and 0.1 ppm hydrogen atoms. Similar improvements are observed for C$$\alpha$$ and C$$\beta$$ for CheShift-2, except that the improvement in RMSD for C$$\beta$$ (0.5 ppm) is larger compared to ProCS15 (0.3 ppm). These improvements are expected if the NMR-derived ensembles are a more accurate representation of the protein structure in solution than the single x-ray structure (Arnautova 2009, Vila 2010).

Improvements are also observed for CamShift, with RMSD-decreases of 0.3 - 1.7 and 0.2 ppm for heavy and hydrogen atoms, respectively. In the case of PPM_One, Sparta+, and shAIC modest (up to 0.3 ppm) RMSD-decreases are observed for some ensembles but not others and, on average, the RMSD is roughly equally likely to remain unchanged or increase slightly. Finally, for ShiftX2 the RMSD consistently increases (by up to 0.7 ppm) on going from the x-ray structure to the ensembles, with the exception of C$$\alpha$$ where the RMSD is lowered by 0.1 ppm. We note that the RMSD values predicted with CamShift using the crystal structure are significantly larger than when using the CHARMM/CMAP structure (presumably due to hydrogen being optimized placed in accordance to the CHARMM22 topology file in the CamShift training set) and that the reduction in RMSD on going to ensembles is at most 0.3 ppm relatively to these values. So, it appears that the use of ensemble structures does not lead to a significant increase in accuracy compared to using a single structure for any of the empirical methods, in contrast to ProCS15 and CheShift-2.

The observations are consistent with earlier observations (Parker 2006, Sumowski 2014, Vila 2009, Christensen 2013) that the empirical NMR prediction methods tend to be significantly less sensitive to changes in protein structure compared to DFT-based chemical shift predictors or chemical shifts computed using QM methods.

\label{table:ensembles}

Comparison of chemical shifts predicted using various methods to experimental values measured for ubiquitin corrected for random coil effects. The RMSD values are computed after linear regression. The predictions are done using a single x-ray structure (1UBQ) and five NMR-derived ensembles of varying size (indicated in parentheses for 1UBQ) without further refinement of the structure.
C$$\alpha$$ C$$\beta$$ C’ H$$\alpha$$ H$$^\text{N}$$ N
RMSD (r) RMSD (r) RMSD (r) RMSD (r) RMSD (r) RMSD (r)
ProCS15
1UBQ (1) 1.7 (0.74) 2.0 (0.50) 1.7 (0.85) 0.3 (0.80) 0.6 (0.94) 3.7 (0.80)
1D3Z (10) 1.3 (0.81) 1.7 (0.62) 1.7 (0.76) 0.3 (0.81) 0.5 (0.66) 3.2 (0.83)
2K39 (116) 1.1 (0.84) 1.7 (0.52) 1.7 (0.69) 0.3 (0.81) 0.5 (0.61) 3.6 (0.69)
1XQQ (128) 1.1 (0.84) 1.8 (0.49) 1.6 (0.74) 0.3 (0.82) 0.5 (0.61) 3.7 (0.73)
2LJ5 (301) 1.1 (0.86) 1.7 (0.55) 1.6 (0.69) 0.3 (0.82) 0.6 (0.58) 3.6 (0.74)
2KOX (640) 1.0 (0.89) 1.7 (0.56) 1.6 (0.71) 0.2 (0.86) 0.5 (0.65) 3.5 (0.78)
CheShift-2
1UBQ 1.9 (0.58) 1.9 (0.47)
1D3Z 1.3 (0.76) 1.3 (0.70)
2K39 1.3 (0.80) 1.5 (0.62)
1XQQ 1.3 (0.81) 1.6 (0.56)
2LJ5 1.2 (0.82) 1.4 (0.65)
2KOX 1.2 (0.83) 1.4 (0.66)
CamShift
1UBQ 1.7 (0.75) 1.9 (0.58) 1.2 (0.74) 0.3 (0.71) 0.6 (0.52) 4.5 (0.54)
1D3Z 1.0 (0.87) 1.2 (0.75) 0.9 (0.85) 0.3 (0.80) 0.5 (0.70) 2.7 (0.72)
2K39 1.1 (0.84) 1.2 (0.80) 1.0 (0.83) 0.2 (0.87) 0.4 (0.73) 2.9 (0.65)
1XQQ 1.1 (0.84) 1.2 (0.77) 0.9 (0.85) 0.2 (0.87) 0.5 (0.68) 2.9 (0.64)
2LJ5 1.0 (0.86) 1.4 (0.68) 0.9 (0.85) 0.2 (0.87) 0.5 (0.71) 3.1 (0.59)
2KOX 1.0 (0.88) 1.1 (0.78) 0.9 (0.85) 0.2 (0.85) 0.4 (0.73) 2.8 (0.67)
PPM_One
1UBQ 0.7 (0.94) 1.1 (0.84) 0.9 (0.85) 0.2 (0.87) 0.6 (0.49) 2.2 (0.81)
1D3Z 0.6 (0.96) 0.9 (0.88) 0.8 (0.89) 0.2 (0.89) 0.4 (0.78) 1.8 (0.89)
2K39 0.8 (0.95) 1.0 (0.88) 0.8 (0.89) 0.2 (0.92) 0.4 (0.78) 2.2 (0.81)
1XQQ 0.8 (0.91) 1.1 (0.84) 0.8 (0.88) 0.2 (0.92) 0.4 (0.73) 2.2 (0.82)
2LJ5 0.6 (0.95) 0.9 (0.88) 0.8 (0.89) 0.2 (0.93) 0.4 (0.74) 2.1 (0.84)
2KOX 0.6 (0.96) 0.9 (0.89) 0.8 (0.89) 0.2 (0.93) 0.4 (0.78) 2.0 (0.85)
Sparta+
1UBQ 0.7 (0.94) 1.0 (0.85) 0.9 (0.86) 0.2 (0.85) 0.6 (0.48) 2.0 (0.84)
1D3Z 0.6 (0.95) 0.9 (0.87) 1.0 (0.83) 0.2 (0.86) 0.4 (0.77) 1.8 (0.88)
2K39 0.7 (0.95) 0.9 (0.88) 1.0 (0.85) 0.2 (0.89) 0.4 (0.78) 2.2 (0.83)
1XQQ 0.7 (0.93) 1.0 (0.86) 1.0 (0.84) 0.2 (0.92) 0.4 (0.72) 2.2 (0.82)
2LJ5 0.6 (0.96) 0.9 (0.88) 1.0 (0.84) 0.2 (0.91) 0.4 (0.76) 2.1 (0.84)
2KOX 0.6 (0.96) 0.9 (0.89) 1.0 (0.84) 0.2 (0.91) 0.4 (0.77) 2.0 (0.86)
shAIC
1UBQ 0.7 (0.93) 1.1 (0.83) 0.8 (0.89) 0.3 (0.82) 0.5 (0.69) 2.0 (0.84)
1D3Z 0.6 (0.95) 1.0 (0.85) 0.7 (0.91) 0.2 (0.85) 0.4 (0.77) 1.8 (0.87)
2K39 0.7 (0.94) 1.0 (0.84) 0.7 (0.92) 0.2 (0.85) 0.4 (0.78) 2.1 (0.83)
1XQQ 0.7 (0.94) 1.1 (0.80) 0.7 (0.91) 0.2 (0.85) 0.4 (0.72) 2.2 (0.82)
2LJ5 0.6 (0.95) 1.0 (0.86) 0.7 (0.91) 0.2 (0.86) 0.4 (0.75) 2.1 (0.84)
2KOX 0.7 (0.94) 1.0 (0.85) 0.7 (0.91) 0.2 (0.85) 0.4 (0.74) 2.0 (0.86)
ShiftX2
1UBQ 0.5 (0.97) 0.4 (0.97) 0.4 (0.97) 0.1 (0.99) 0.1 (0.98) 1.3 (0.94)
1D3Z 0.4 (0.98) 0.7 (0.94) 0.6 (0.95) 0.1 (0.96) 0.2 (0.93) 1.6 (0.91)
2K39 0.4 (0.98) 0.7 (0.93) 0.7 (0.93) 0.1 (0.98) 0.2 (0.92) 2.1 (0.85)
1XQQ 0.5 (0.97) 0.8 (0.91) 0.7 (0.93) 0.1 (0.99) 0.3 (0.90) 2.0 (0.86)
2LJ5 0.4 (0.98) 0.6 (0.95) 0.7 (0.94) 0.1 (0.98) 0.3 (0.92) 1.9 (0.87)
2KOX 0.4 (0.98) 0.6 (0.95) 0.7 (0.93) 0.1 (0.98) 0.2 (0.92) 1.8 (0.88)

# Summary and Outlook

In this paper we present ProCS15: a program that computes the isotropic chemical shielding values of backbone atoms and C$$\beta$$ given a protein structure in less than a second. ProCS accounts for the effect of backbone and side-chain dihedral angles of a residue and the two neighboring residues, hydrogen bonding to the backbone amide group and H$$\alpha$$ as well as ring-current effects (Christensen 2011) on the hydrogen atoms and assumes that these effects are additive. The backbone, side-chain and hydrogen bonding terms are based on ~2.35 million OPBE/6-31G(d,p)//PM6 calculations on tripeptides and small structural models of hydrogen-bonding.

ProCS15 reproduces the chemical shielding values computed using PCM/OPBE/6-31G(d,p)//PM6-D3H+ for Ubiquitin and GB3 with RMSD values (after linear regression) of up to 2.5 ppm for carbon atoms, 0.8 ppm for hydrogen atoms, and 4.5 ppm for nitrogen. These deviations, which presumably result from the assumption of additivity and the simplified model systems, does not appear to preclude equal or better accuracy in comparison to experiment because the accuracies of the chemical shifts computed using ProCS15 (based on linear regression of the chemical shifts, cf. Eq \ref{eqn:scaling}) are very similar to the corresponding DFT calculations using single Ubiquitin and GB3 structures. The largest RMSD values observed for carbon, hydrogen, and nitrogen are, respectively, 2.2 (2.8) ppm, 0.7 (0.6) ppm, and 4.7 (4.6) ppm for ProCS15 (PCM/OPBE/6-31G(d,p)). These accuracies are very similar to DFT-based predictions made by other researchers (e.g. (Zhu 2012),(Zhu 2013), (Exner 2012)) as well as CheShift-2 (Martin 2013), which is another DFT-based chemical shift predictor for C$$\alpha$$ and C$$\beta$$ atoms. The RMSD values computed using ProCS15 for Ubiquitin can be reduced by as much as 0.7, 0.1, and 0.5 ppm for carbon, hydrogen, and nitrogen by using NMR-derived structural ensembles. Similar increase in accuracy is also observed for CheShift-2 (for C$$\alpha$$ and C$$\beta$$) while for empirical chemical shift predictors the increase in accuracy is at most 0.3 ppm.

The latter observation is another indication that empirical chemical shift predictors are less sensitive to small structural changes, which may make them less suitable for chemical shift-guided refinement of protein structure compared to DFT-based predictors. Christensen and co-workers (Christensen 2013) have already demonstrated that this is the case for amide hydrogen bonding geometries using a previous incarnation of ProCS limited to amide proton chemical shift predictions and we are now planning similar refinement studies using all backbone atoms and C$$\beta$$ chemical shifts.

ProCS15 is freely available at github.com/jensengroup/procs15 and all structures and DFT calculations, including the full NMR shielding tensors, are available at erda.dk/public/archives/YXJjaGl2ZS1TYk40VXo=/published-archive.html.

# Acknowledgements

This work was funded by the Lundbeck Foundation and the Danish e-Infrastructure Coorporation. We thank Osvaldo Martin, Jorge Vila, and Xiao He for helpful comments.

### References

1. FA Mulder, M Filatov. NMR chemical shift data and ab initio shielding calculations: emerging tools for protein structure determination.. Chem Soc Rev 39, 578-90 (2010).

2. S. Raman, O. F. Lange, P. Rossi, M. Tyka, X. Wang, J. Aramini, G. Liu, T. A. Ramelot, A. Eletsky, T. Szyperski, M. A. Kennedy, J. Prestegard, G. T. Montelione, D. Baker. NMR Structure Determination for Larger Proteins Using Backbone-Only Data. Science 327, 1014–1018 American Association for the Advancement of Science (AAAS), 2010. Link

3. O. F. Lange, P. Rossi, N. G. Sgourakis, Y. Song, H.-W. Lee, J. M. Aramini, A. Ertekin, R. Xiao, T. B. Acton, G. T. Montelione, D. Baker. Determination of solution structures of proteins up to 40 kDa using CS-Rosetta with sparse NMR data from deuterated samples. Proceedings of the National Academy of Sciences 109, 10873–10878 Proceedings of the National Academy of Sciences, 2012. Link

4. Lars A. Bratholm, Anders S. Christensen, Thomas Hamelryck, Jan H. Jensen. Bayesian inference of protein structure from chemical shift data. PeerJ 3, e861 PeerJ, 2015. Link

5. Paul Robustelli, Kai Kohlhoff, Andrea Cavalli, Michele Vendruscolo. Using NMR Chemical Shifts as Structural Restraints in Molecular Dynamics Simulations of Proteins. Structure 18, 923–933 Elsevier BV, 2010. Link

6. KJ Kohlhoff, P Robustelli, A Cavalli, X Salvatella, M Vendruscolo. Fast and accurate predictions of protein NMR chemical shifts from interatomic distances.. J Am Chem Soc 131, 13894-5 (2009).

7. Y Shen, A Bax. SPARTA+: a modest improvement in empirical NMR chemical shift prediction by means of an artificial neural network.. J Biomol NMR 48, 13-22 (2010).

8. B Han, Y Liu, SW Ginzinger, DS Wishart. SHIFTX2: significantly improved protein chemical shift prediction.. J Biomol NMR 50, 43-57 (2011).

9. D Li, R Brüschweiler. PPM_One: a static protein structure based chemical shift predictor.. J Biomol NMR 62, 403-9 (2015).

10. JT Nielsen, HR Eghbalnia, NC Nielsen. Chemical shift prediction for protein structure calculation and quality assessment using an optimally parameterized force field.. Prog Nucl Magn Reson Spectrosc 60, 1-28 (2012).

11. Tong Zhu, Xiao He, John Z. H. Zhang. Fragment density functional theory calculation of NMR chemical shifts for proteins with implicit solvation. Phys. Chem. Chem. Phys. 14, 7837–7845 Royal Society of Chemistry (RSC), 2012. Link

12. Tong Zhu, John Z. H. Zhang, Xiao He. Automated Fragmentation QM/MM Calculation of Amide Proton Chemical Shifts in Proteins with Explicit Solvent Model. J. Chem. Theory Comput. 9, 2104–2114 American Chemical Society (ACS), 2013. Link

13. Thomas E. Exner, Andrea Frank, Ionut Onila, Heiko M. Möller. Toward the Quantum Chemical Calculation of NMR Chemical Shifts of Proteins. 3. Conformational Sampling and Explicit Solvents Model. J. Chem. Theory Comput. 8, 4818–4827 American Chemical Society (ACS), 2012. Link

14. Chris Vanessa Sumowski, Matti Hanni, Sabine Schweizer, Christian Ochsenfeld. Sensitivity of ab Initio vs Empirical Methods in Computing Structural Effects on NMR Chemical Shifts for the Example of Peptides. J. Chem. Theory Comput. 10, 122–133 American Chemical Society (ACS), 2014. Link

15. J Swails, T Zhu, X He, DA Case. AFNMR: automated fragmentation quantum mechanical calculation of NMR chemical shifts for biomolecules.. J Biomol NMR (2015).

16. OA Martin, YA Arnautova, AA Icazatti, HA Scheraga, JA Vila. Physics-based method to validate and repair flaws in protein structures.. Proc Natl Acad Sci U S A 110, 16826-31 (2013).

17. LL Parker, AR Houk, JH Jensen. Cooperative hydrogen bonding effects are key determinants of backbone amide proton chemical shifts in proteins.. J Am Chem Soc 128, 9863-72 (2006).

18. JA Vila, YA Arnautova, OA Martin, HA Scheraga. Quantum-mechanics-derived 13Calpha chemical shift server (CheShift) for protein structure validation.. Proc Natl Acad Sci U S A 106, 16972-7 (2009).

19. AS Christensen, TE Linnet, M Borg, W Boomsma, K Lindorff-Larsen, T Hamelryck, JH Jensen. Protein structure validation and refinement using amide proton chemical shifts derived from quantum mechanics.. PLoS One 8, e84123 (2013).

20. Anders S. Christensen, Stephan P. A. Sauer, Jan H. Jensen. Definitive Benchmark Study of Ring Current Effects on Amide Proton Chemical Shifts. J. Chem. Theory Comput. 7, 2078–2084 American Chemical Society (ACS), 2011. Link

21. AS Christensen, T Hamelryck, JH Jensen. FragBuilder: an efficient Python library to setup quantum chemistry calculations on peptides models.. PeerJ 2, e277 (2014).

22. G Platzer, M Okon, LP McIntosh. pH-dependent random coil (1)H, (13)C, and (15)N chemical shifts of the ionizable amino acids: a guide for protein pK a measurements.. J Biomol NMR 60, 109-29 (2014).

23. T Harder, W Boomsma, M Paluszewski, J Frellsen, KE Johansson, T Hamelryck. Beyond rotamers: a generative, probabilistic model of side chains in proteins.. BMC Bioinformatics 11, 306 (2010).

24. JJ Stewart. Optimization of parameters for semiempirical methods V: modification of NDDO approximations and application to 70 elements.. J Mol Model 13, 1173-213 (2007).

25. Ying Zhang, Anan Wu, Xin Xu, Yijing Yan. OPBE: A promising density functional for the calculation of nuclear shielding constants. Chemical Physics Letters 421, 383–388 Elsevier BV, 2006. Link

26. Vincenzo Barone, Maurizio Cossi. Quantum Calculation of Molecular Energies and Energy Gradients in Solution by a Conductor Solvent Model. J. Phys. Chem. A 102, 1995–2001 American Chemical Society (ACS), 1998. Link

27. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, B. Mennucci, G. A. Petersson, H. Nakatsuji, M. Caricato, X. Li, H. P. Hratchian, A. F. Izmaylov, J. Bloino, G. Zheng, J. L. Sonnenberg, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, Jr. Montgomery, J. E. Peralta, F. Ogliaro, M. Bearpark, J. J. Heyd, E. Brothers, K. N. Kudin, V. N. Staroverov, R. Kobayashi, J. Normand, K. Raghavachari, A. Rendell, J. C. Burant, S. S. Iyengar, J. Tomasi, M. Cossi, N. Rega, J. M. Millam, M. Klene, J. E. Knox, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, R. L. Martin, K. Morokuma, V. G. Zakrzewski, G. A. Voth, P. Salvador, J. J. Dannenberg, S. Dapprich, A. D. Daniels, Ö. Farkas, J. B. Foresman, J. V. Ortiz, J. Cioslowski, D. J. Fox. Gaussian∼09 Revision D.01. (2014).

28. Eric Jones, Travis Oliphant, Pearu Peterson, others. SciPy: Open source scientific tools for Python. (2001–). Link

29. JC Kromann, AS Christensen, C Steinmann, M Korth, JH Jensen. A third-generation dispersion and third-generation hydrogen bonding corrected PM6 method: PM6-D3H+.. PeerJ 2, e449 (2014).

30. Jacopo Tomasi, Benedetta Mennucci, Roberto Cammi. Quantum Mechanical Continuum Solvation Models. Chemical Reviews 105, 2999–3094 American Chemical Society (ACS), 2005. Link

31. Casper Steinmann, Kristoffer L. Blædel, Anders S. Christensen, Jan H. Jensen. Interface of the Polarizable Continuum Model of Solvation with Semi-Empirical Methods in the GAMESS Program. PLoS ONE 8, e67725 Public Library of Science (PLoS), 2013. Link

32. Alexander D. Mackerell. Empirical force fields for biological macromolecules: Overview and issues. J. Comput. Chem. 25, 1584–1604 Wiley-Blackwell, 2004. Link

33. Di Qiu, Peter S. Shenkin, Frank P. Hollinger, W. Clark Still. The GB/SA Continuum Model for Solvation. A Fast Analytical Method for the Calculation of Approximate Born Radii. J. Phys. Chem. A 101, 3005–3014 American Chemical Society (ACS), 1997. Link

34. S Vijay-Kumar, CE Bugg, WJ Cook. Structure of ubiquitin refined at 1.8 A resolution.. J Mol Biol 194, 531-44 (1987).

35. TS Ulmer, BE Ramirez, F Delaglio, A Bax. Evaluation of backbone proton positions and dynamics in a small protein by liquid crystal NMR spectroscopy.. J Am Chem Soc 125, 9179-91 (2003).

36. Michael W. Schmidt, Kim K. Baldridge, Jerry A. Boatz, Steven T. Elbert, Mark S. Gordon, Jan H. Jensen, Shiro Koseki, Nikita Matsunaga, Kiet A. Nguyen, Shujun Su, Theresa L. Windus, Michel Dupuis, John A. Montgomery. General atomic and molecular electronic structure system. J. Comput. Chem. 14, 1347–1363 Wiley-Blackwell, 1993. Link

37. Jay W. Ponder, Frederic M. Richards. An efficient newton-like method for molecular mechanics energy minimization of large molecules. J. Comput. Chem. 8, 1016–1024 Wiley-Blackwell, 1987. Link

38. Gabriel Cornilescu, John L. Marquardt, Marcel Ottiger, Ad Bax. Validation of Protein Structure from Anisotropic Carbonyl Chemical Shifts in a Dilute Liquid Crystalline Phase. J. Am. Chem. Soc. 120, 6836–6837 American Chemical Society (ACS), 1998. Link

39. O. F. Lange, N.-A. Lakomek, C. Fares, G. F. Schroder, K. F. A. Walter, S. Becker, J. Meiler, H. Grubmuller, C. Griesinger, B. L. de Groot. Recognition Dynamics Up to Microseconds Revealed from an RDC-Derived Ubiquitin Ensemble in Solution. Science 320, 1471–1475 American Association for the Advancement of Science (AAAS), 2008. Link

40. Kresten Lindorff-Larsen, Robert B. Best, Mark A. DePristo, Christopher M. Dobson, Michele Vendruscolo. Simultaneous determination of protein structure and dynamics. Nature 433, 128–132 Nature Publishing Group, 2005. Link

41. Rinaldo W. Montalvao, Alfonso De Simone, Michele Vendruscolo. Determination of structural fluctuations of proteins from structure-based calculations of residual dipolar couplings. Journal of Biomolecular NMR 53, 281–292 Springer Science $$\mathplus$$ Business Media, 2012. Link

42. R. Bryn Fenwick, Santi Esteban-Marti'n, Barbara Richter, Donghan Lee, Korvin F. A. Walter, Dragomir Milovanovic, Stefan Becker, Nils A. Lakomek, Christian Griesinger, Xavier Salvatella. Weak Long-Range Correlated Motions in a Surface Patch of Ubiquitin Involved in Molecular Recognition. J. Am. Chem. Soc. 133, 10336–10339 American Chemical Society (ACS), 2011. Link

43. M Sundd, N Iverson, B Ibarra-Molero, JM Sanchez-Ruiz, AD Robertson. Electrostatic interactions in ubiquitin: stabilization of carboxylates by lysine amino groups.. Biochemistry 41, 7586-96 (2002).

44. Robert E. Lenkinski, Douglas M. Chen, Jerry D. Glickson, Gideon Goldstein. Nuclear magnetic resonance studies of the denaturation of ubiquitin. Biochimica et Biophysica Acta (BBA) - Protein Structure 494, 126–130 Elsevier BV, 1977. Link

45. Devesh Khare, Patrick Alexander, Jan Antosiewicz, Philip Bryan, Michael Gilson, John Orban. p K a Measurements from Nuclear Magnetic Resonance for the B1 and B2 Immunoglobulin G-Binding Domains of Protein G:  Comparison with Calculated Values for Nuclear Magnetic Resonance and X-ray Structures . Biochemistry 36, 3580–3589 American Chemical Society (ACS), 1997. Link

46. Wouter Boomsma, Jes Frellsen, Tim Harder, Sandro Bottaro, Kristoffer E. Johansson, Pengfei Tian, Kasper Stovgaard, Christian Andreetta, Simon Olsson, Jan B. Valentin, Lubomir D. Antonov, Anders S. Christensen, Mikael Borg, Jan H. Jensen, Kresten Lindorff-Larsen, Jesper Ferkinghoff-Borg, Thomas Hamelryck. PHAISTOS: A framework for Markov chain Monte Carlo simulation and inference of protein structure. J. Comput. Chem. 34, 1697–1705 Wiley-Blackwell, 2013. Link

47. Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 1.3r1. (2010).

48. E. L. Ulrich, H. Akutsu, J. F. Doreleijers, Y. Harano, Y. E. Ioannidis, J. Lin, M. Livny, S. Mading, D. Maziuk, Z. Miller, E. Nakatani, C. F. Schulte, D. E. Tolmie, R. Kent Wenger, H. Yao, J. L. Markley. BioMagResBank. Nucleic Acids Research 36, D402–D408 Oxford University Press (OUP), 2007. Link

49. Beat Vögeli, Sina Kazemi, Peter Güntert, Roland Riek. Spatial elucidation of motion in proteins by ensemble-based structure calculation using exact NOEs. Nat Struct Mol Biol 19, 1053–1057 Nature Publishing Group, 2012. Link

50. Kamil Tamiola, Burccin Acar, Frans A. A. Mulder. Sequence-Specific Random Coil Chemical Shifts of Intrinsically Disordered Proteins. J. Am. Chem. Soc. 132, 18000–18003 American Chemical Society (ACS), 2010. Link

51. Axel D. Becke. Density-functional thermochemistry. III. The role of exact exchange. The Journal of Chemical Physics 98, 5648 AIP Publishing, 1993. Link

52. Jorge A. Vila, Héctor A. Baldoni, Harold A. Scheraga. Performance of density functional models to reproduce observed 13 C $$\upalpha$$ chemical shifts of proteins in solution. J. Comput. Chem. 30, 884–892 Wiley-Blackwell, 2009. Link

53. Bernard Rosner. Percentage Points for a Generalized ESD Many-Outlier Procedure. Technometrics 25, 165–172 Informa UK Limited, 1983. Link

54. JA Vila, YA Arnautova, OA Martin, HA Scheraga. Are accurate computations of the 13C’ shielding feasible at the DFT level of theory?. J Comput Chem 35, 309-12 (2014).

55. Yelena A. Arnautova, Jorge A. Vila, Osvaldo A. Martin, Harold A. Scheraga. What can we learn by computing 13 C $$\upalpha$$ chemical shifts for X-ray protein models?. Acta Crystallogr D Biol Cryst 65, 697–703 International Union of Crystallography (IUCr), 2009. Link

56. Jorge A. Vila, Pedro Serrano, Kurt Wüthrich, Harold A. Scheraga. Sequential nearest-neighbor effects on computed 13C$$\upalpha$$ chemical shifts. Journal of Biomolecular NMR 48, 23–30 Springer Science $$\mathplus$$ Business Media, 2010. Link