Comparison to experimental chemical shifts using NMR-derived ensembles

Table \ref{table:ensembles} lists the RMSD and \(r\) values computed for ubiquitin using the x-ray structure 1UBQ and five NMR-derived structural ensembles containing between 10 and 640 structures. For ProCS15, the average chemical shift is obtained by first averaging the chemical shielding of each nucleus over the ensemble and then applying the linear regression fit to the experimental chemical shift values (cf. Eq.~\ref{eqn:scaling}). The procedure is the same for the remaining methods, except that chemical shifts are used instead of chemical shieldings.
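
As a minimal sketch of this procedure, assuming an unweighted arithmetic mean over the \(N\) ensemble members and a linear relation as implied by the regression fit (the symbols \(\sigma_i^{(k)}\), \(a\), \(b\), and \(M\) are introduced here for illustration only and may differ from the notation of Eq.~\ref{eqn:scaling}), the predicted shift of nucleus \(i\) and the resulting RMSD can be written as
\[
\bar{\sigma}_i = \frac{1}{N}\sum_{k=1}^{N}\sigma_i^{(k)}, \qquad
\delta_i^{\mathrm{pred}} = a\,\bar{\sigma}_i + b, \qquad
\mathrm{RMSD} = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left(\delta_i^{\mathrm{pred}} - \delta_i^{\mathrm{exp}}\right)^{2}},
\]
where \(\sigma_i^{(k)}\) is the chemical shielding of nucleus \(i\) computed for structure \(k\), \(a\) and \(b\) are the fitted slope and intercept, and \(M\) is the number of nuclei of the atom type considered. Under the same assumptions, the empirical methods would use the ensemble-averaged chemical shift in place of \(\bar{\sigma}_i\).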

For ProCS15, the use of ensemble structures lowers the RMSD values for all atom types, with decreases of 0.1--0.7 ppm for heavy atoms and 0.1 ppm for hydrogen atoms. Similar improvements are observed for C\(\alpha\) and C\(\beta\) with CheShift-2, except that the RMSD improvement for C\(\beta\) (0.5 ppm) is larger than for ProCS15 (0.3 ppm). These improvements are expected if the NMR-derived ensembles represent the protein structure in solution more accurately than the single x-ray structure does \cite{Arnautova_2009, Vila_2010}.

Improvements are also observed for CamShift, with RMSD decreases of 0.3--1.7 ppm and 0.2 ppm for heavy and hydrogen atoms, respectively. For PPM\_One, Sparta+, and shAIC, modest RMSD decreases (up to 0.3 ppm) are observed for some ensembles but not others and, on average, the RMSD is roughly equally likely to remain unchanged or to increase slightly. Finally, for ShiftX2 the RMSD consistently increases (by up to 0.7 ppm) on going from the x-ray structure to the ensembles, with the exception of C\(\alpha\), where the RMSD is lowered by 0.1 ppm. We note that the RMSD values predicted with CamShift using the crystal structure are significantly larger than those obtained using the CHARMM/CMAP structure (presumably because hydrogen positions are optimized and placed in accordance with the CHARMM22 topology file in the CamShift training set) and that the reduction in RMSD on going to ensembles is at most 0.3 ppm relative to these values. Thus, for the empirical methods, the use of ensemble structures does not appear to lead to a significant increase in accuracy compared to using a single structure, in contrast to ProCS15 and CheShift-2.

These observations are consistent with earlier findings \cite{16866544, Sumowski_2014,Vila_2009,24391900} that empirical NMR prediction methods tend to be significantly less sensitive to changes in protein structure than DFT-based chemical shift predictors or chemical shifts computed using QM methods.