Methodology

Backbone scans

\label{subsec:bbscan} The capped AXA tripeptides used to compute the first three terms of Eq.~\ref{eqn:procs} were constructed with the FragBuilder Python module \cite{24688855}, which was also used to generate the different conformations. All acidic and basic amino acids, including histidine, are modeled in their charged state. This is the correct charge state for most ionizable residues in most proteins. However, for ionizable residues that are in their neutral state, this approximation can introduce large errors. For example, protonation-state changes in small peptides shift the C\(\beta\) chemical shifts of Asp and His by 3.0 and 2.4 ppm, respectively, and their N chemical shifts by 1.5 and 1.8 ppm \cite{25239571}. This issue will be addressed in future studies. Cysteine is modeled only in its free form; disulfide-bonded cysteine is not included.

For each tripeptide, a scan over the central residue's backbone and side-chain dihedral angles \(\phi\), \(\psi\), \(\chi_{1}\), \(\chi_{2}\), \(\chi_{3}\), and \(\chi_{4}\) was carried out. The \(\omega\) dihedral angle was fixed at 180\(^\circ\), and the backbone angles of the N- and C-terminal alanine residues were fixed at \(\phi = -140^\circ\) and \(\psi = 120^\circ\), corresponding to typical \(\beta\)-sheet backbone angles. The scans used a 20\(^\circ\) grid spacing; for the alanine (AAA) tripeptide, the \(\phi\)/\(\psi\) scan resulted in 361 conformations. For amino acid types with more than two side-chain dihedral angles, a full grid scan would result in far too many conformations. Instead, we used BASILISK \cite{20525384}, which allows us to sample from the continuous space of side-chain torsional degrees of freedom. For each \(\phi\)/\(\psi\) backbone pair on the 20\(^\circ\) grid, 1000 conformations were generated. See Table S1 in the Supplementary Materials for an overview of the number of conformations sampled for each residue.
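The sampling arithmetic behind the scan can be checked with a few lines of Python (chosen because FragBuilder is a Python module). This is a minimal sketch under one stated assumption: the 20\(^\circ\) grid runs from \(-180^\circ\) to \(+180^\circ\) with both endpoints included. With 360/20 = 18 distinct values one would get only 18\(^2\) = 324 points, whereas 19\(^2\) = 361 matches the AAA count. The helper names are hypothetical and are not part of FragBuilder or BASILISK.

```python
import itertools


def grid_points(step_deg: int = 20) -> list[int]:
    """Dihedral grid values in degrees.

    Assumption: both -180 and +180 are included, which reproduces
    the 361 phi/psi conformations reported for AAA (19 x 19).
    """
    return list(range(-180, 181, step_deg))


def full_grid_size(n_dihedrals: int, step_deg: int = 20) -> int:
    """Number of conformations in a full grid scan over n dihedral angles."""
    return len(grid_points(step_deg)) ** n_dihedrals


# All (phi, psi) backbone pairs on the 20-degree grid
phi_psi = list(itertools.product(grid_points(), repeat=2))
print(len(grid_points()))  # 19
print(len(phi_psi))        # 361

# A full grid over phi, psi and four chi angles (a residue with chi1-chi4)
print(full_grid_size(6))   # 47045881

# Drawing a fixed number of side-chain samples per phi/psi pair instead
SAMPLES_PER_PAIR = 1000
print(len(phi_psi) * SAMPLES_PER_PAIR)  # 361000
```

A full grid over six dihedrals would need about 47 million conformations per residue type, compared with 361,000 when 1000 side-chain conformations are drawn per \(\phi\)/\(\psi\) pair. Note that \(-180^\circ\) and \(+180^\circ\) describe the same geometry, so a grid that keeps both endpoints contains duplicated boundary points.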
The geometry of each conformation was optimized with PM6 \cite{17828561}, keeping the backbone and side-chain torsion angles frozen. The GIAO NMR calculations were carried out at the OPBE/6-31G(d,p) level of theory \cite{Zhang_2006} using the CPCM continuum solvation model \cite{Barone_1998} with a dielectric constant of 78, close to that of water. This value was chosen because bulk solvent effects are largest for charged side chains, which are usually located on the protein surface. Both the optimizations and the NMR calculations were performed with the Gaussian 09 program \cite{g09}. In total, the ProCS15 backbone terms are based on \(\sim\)2.35 million DFT calculations.
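Because the torsions were frozen during the PM6 optimizations, one way to confirm this after the fact is to recompute each dihedral from the optimized Cartesian coordinates and compare it with its target value, taking the 360\(^\circ\) periodicity into account. The NumPy sketch that follows is an illustration, not part of the ProCS15 workflow: the function names are hypothetical, coordinates are assumed to be available as 3-vectors, and the usual (IUPAC) sign convention is used. For reference, \(\phi\) is defined by the atoms C(i-1), N, C\(\alpha\), C and \(\psi\) by N, C\(\alpha\), C, N(i+1).

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3) -> float:
    """Signed dihedral angle in degrees, in [-180, 180], defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def torsion_deviation_deg(measured: float, target: float) -> float:
    """Smallest absolute angular difference, accounting for 360-degree periodicity."""
    d = (measured - target + 180.0) % 360.0 - 180.0
    return abs(d)


# Usage: four points whose dihedral is -140 degrees (the phi value used for
# the terminal alanines); the central bond lies along the z axis.
theta = np.radians(-140.0)
atoms = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (np.cos(theta), np.sin(theta), 1)]
phi = dihedral_deg(*atoms)
print(round(phi, 6))                              # -140.0
print(torsion_deviation_deg(phi, -140.0) < 1e-6)  # True
```

The periodic difference matters near the grid edges: a measured angle of 179\(^\circ\) and a target of \(-179^\circ\) differ by only 2\(^\circ\), not 358\(^\circ\).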