NMR calculations and protein structures used

In this paper we benchmark NMR chemical shift predictions for Ubiquitin and GB3. Starting from the 1UBQ \cite{3041007} and 2OED \cite{15369375} structures, the proteins are geometry-optimized with two methods: PM6-D3H+ \cite{25024918} with the PCM solvation model \cite{Tomasi_2005,Steinmann_2013}, and the CHARMM22/CMAP force field \cite{Mackerell_2004} with the GB/SA solvation model \cite{Qiu_1997}. The PM6-D3H+ optimizations are done with the GAMESS program \cite{Schmidt_1993} using a convergence criterion of \(5 \times 10^{-4}\) atomic units, while the CHARMM22/CMAP optimizations are done with TINKER \cite{Ponder_1987} using the default convergence criterion of 0.01 kcal/mol/Å. In addition, the following NMR-derived structural ensembles are used without further refinement: 1D3Z \cite{Cornilescu_1998}, 2K39 \cite{Lange_2008}, 1XQQ \cite{Lindorff_Larsen_2005}, 2LJ5 \cite{Montalvao_2012}, and 2KOX \cite{Fenwick_2011}. In all calculations we used charged protonation states for the acidic and basic side chains, except that histidines in the NMR ensembles were left neutral (protonated at either ND1 or NE2) as published. These charge states are consistent with the published pK\(_a\) values of Ubiquitin \cite{12056889,Lenkinski_1977} and GB3 \cite{Khare_1997}.
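To put the two optimization thresholds on a common scale, the following Python sketch converts gradients between atomic units (hartree/bohr) and kcal/mol/Å. It assumes that the GAMESS criterion is a gradient threshold in hartree/bohr and uses standard conversion constants; because the two programs may apply their thresholds to different gradient norms (for example the largest component versus the RMS gradient), the comparison is only an order-of-magnitude guide.

```python
# Sketch: compare the two geometry-optimization thresholds in common units.
# Assumption: both thresholds are gradient thresholds, the GAMESS value in
# hartree/bohr (atomic units) and the TINKER value in kcal/mol/Angstrom.

HARTREE_IN_KCAL_PER_MOL = 627.5095  # 1 hartree expressed in kcal/mol
BOHR_IN_ANGSTROM = 0.52917721       # 1 bohr expressed in Angstrom

# 1 hartree/bohr expressed in kcal/mol/Angstrom (about 1185.8)
AU_GRADIENT_IN_KCAL_MOL_ANG = HARTREE_IN_KCAL_PER_MOL / BOHR_IN_ANGSTROM


def au_to_kcal_mol_ang(gradient_au: float) -> float:
    """Convert a gradient from hartree/bohr to kcal/mol/Angstrom."""
    return gradient_au * AU_GRADIENT_IN_KCAL_MOL_ANG


def kcal_mol_ang_to_au(gradient: float) -> float:
    """Convert a gradient from kcal/mol/Angstrom to hartree/bohr."""
    return gradient / AU_GRADIENT_IN_KCAL_MOL_ANG


PM6_THRESHOLD_AU = 5e-4   # GAMESS criterion quoted in the text
CHARMM_THRESHOLD = 0.01   # TINKER default quoted in the text

print(f"PM6-D3H+ threshold: {au_to_kcal_mol_ang(PM6_THRESHOLD_AU):.3f} kcal/mol/A")
print(f"CHARMM22/CMAP threshold: {kcal_mol_ang_to_au(CHARMM_THRESHOLD):.2e} hartree/bohr")
print(f"Ratio PM6/CHARMM: {au_to_kcal_mol_ang(PM6_THRESHOLD_AU) / CHARMM_THRESHOLD:.0f}")
```

Under these assumptions the PM6-D3H+ threshold corresponds to roughly 0.59 kcal/mol/Å, i.e. about sixty times looser than the CHARMM22/CMAP criterion.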

OPBE/6-31G(d,p)//PM6-D3H+ GIAO NMR shielding calculations were performed with Gaussian 09 using the CPCM solvation model. ProCS15 calculations were done with a module for the protein simulation framework PHAISTOS \cite{Boomsma_2013}; the module was written specifically for this paper and can be downloaded at github.com/jensengroup/procs15. CheShift-2 calculations were performed using either the web interface at cheshift.com or the CheShift-2 PyMOL plugin \cite{PyMOL} available at github.com/aloctavodia/cheshift. CamShift, PPM\_One, Sparta+, shAIC, and ShiftX2 calculations were performed using the stand-alone version of each predictor. The computed chemical shieldings and predicted shifts are compared with the shifts measured at pH 6.5 for Ubiquitin \cite{Cornilescu_1998} (BMRB ID 17769) \cite{Ulrich_2007} and for GB3 \cite{V_geli_2012} (BMRB ID 18531).

For some nuclei, much of the variation in chemical shift comes from the identity of the residue itself and of its neighbors in the sequence, which can inflate the correlation coefficient \(r\). To separate the contributions of sequence and structure, we subtract the measured sequence-corrected random coil values \cite{Tamiola_2010} from all predicted and experimental values. Because the same value is subtracted from both the predicted and the measured shift of each atom, their difference, and hence the computed RMSD, is unaffected.
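The different effect of the random coil subtraction on \(r\) and on the RMSD can be illustrated with a small Python sketch. The shift values below are hypothetical and chosen only for illustration, and the helper functions are our own; the sketch shows that \(r\) drops once the residue-type contribution is removed, while the RMSD is identical before and after the subtraction.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmsd(x, y):
    """Root-mean-square deviation between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def subtract_random_coil(shifts, random_coil):
    """Per-atom secondary shifts: each shift minus its random coil value."""
    return [s - rc for s, rc in zip(shifts, random_coil)]


# Hypothetical C-alpha shifts (ppm) for six residues, for illustration only.
random_coil = [52.5, 58.3, 45.1, 62.0, 56.4, 53.0]
experimental = [53.1, 60.0, 45.6, 61.2, 58.0, 52.4]
predicted = [53.6, 59.1, 46.3, 60.5, 57.1, 53.4]

exp_secondary = subtract_random_coil(experimental, random_coil)
pred_secondary = subtract_random_coil(predicted, random_coil)

print(f"raw shifts:       r = {pearson_r(predicted, experimental):.3f}, "
      f"RMSD = {rmsd(predicted, experimental):.3f} ppm")
print(f"secondary shifts: r = {pearson_r(pred_secondary, exp_secondary):.3f}, "
      f"RMSD = {rmsd(pred_secondary, exp_secondary):.3f} ppm")
```

Because \(r\) is computed from deviations about the mean of each data set, it rewards reproducing the large spread between residue types; the RMSD depends only on the per-atom differences, which the subtraction leaves unchanged.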