NMR studies can provide unique information about protein conformations in solution. In CASP14, three reference structures provided by solution NMR methods were available (T1027, T1029, and T1055), as well as a fourth data set of NMR-derived contacts for an integral membrane protein (T1088). For the three targets with NMR-based structures, the best prediction results ranged from very good (GDT_TS = 0.90, for T1055) to poor (GDT_TS = 0.47, for T1029). We explored the basis of these results by comparing all CASP14 prediction models against experimental NMR data. For T1027, the NMR data reveal extensive internal dynamics, presenting a unique challenge for protein structure prediction. The analysis of T1029 motivated exploration of a novel method of “inverse structure determination”, in which an AF2 model was used to guide NMR data analysis. NMR data provided to CASP predictor groups for target T1088, a 238-residue integral membrane porin, were also used to assess several NMR-assisted prediction methods. Most groups involved in this exercise generated similar beta-barrel models, with good agreement with the experimental data. However, as was also observed in CASP13, some pure prediction groups that did not use the NMR data generated structures for T1088 that better fit the NMR data than the models generated using these experimental data. These results demonstrate the remarkable power of modern methods to predict structures of proteins with accuracies rivaling solution NMR structures, and that it is now possible to reliably use prediction models to guide and complement experimental NMR data analysis.
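As a rough illustration of the GDT_TS values quoted above, the score can be sketched as the mean fraction of residues whose Cα atoms lie within 1, 2, 4 and 8 Å of the reference after superposition. The sketch below assumes a single, fixed superposition and toy per-residue deviations; the full GDT procedure searches many superpositions for each cutoff and keeps the best, so this is a simplification. The function name `gdt_ts` and the example values are hypothetical.

```python
from typing import Sequence


def gdt_ts(distances: Sequence[float],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Return a GDT_TS-like score on a 0-1 scale.

    `distances` holds per-residue CA deviations in angstroms for ONE fixed
    superposition (an assumption; the real GDT search optimizes the
    superposition separately for each cutoff).
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    # GDT_TS is the average of the four fractions.
    return sum(fractions) / len(fractions)


# Toy deviations (hypothetical values, in angstroms).
deviations = [0.5, 0.8, 1.5, 3.0, 9.0]
score = gdt_ts(deviations)
```

Because the superposition is fixed here, the result should be read as a lower bound on what a full GDT search would report for the same pair of structures.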
Protein allergens are a health risk in the consumption of soybeans. To understand the mechanism of allergenicity, T cell epitopes of 7 soybean allergens were predicted and screened by their ability to induce the cytokine interleukin 4. The relationships among amino acid composition, properties, allergenicity and pepsin hydrolysis sites were analyzed. Among the 138 T cell epitopes identified, YIKDVFRVIPSEVLS, KDVFRVIPSEVLSNS, DVFRVIPSEVLSNSY of Gly m 6.0501 (P04347), and AKADALFKAIEAYLL, ADALFKAIEAYLLAH of Gly m 4.0101 (P26987) were the most likely epitope candidates. In the T cell epitope pattern, the frequencies of amino acids Q, D, E, P and G decreased, while those of F, I, N, V, K and H increased. Hydrophobic residues at positions p1 and p2 and positively charged residues at position p13 might contribute to allergenicity. Most epitopes could be hydrolyzed by pepsin into small polypeptides within 12 residues in length, and the digestion-resistant epitope regions contained I, V, S, N, and Q residues. The T cell epitope EEQRQQEGVIVELSK from Gly m 5.03 (P25974) showed resistance to pepsin hydrolysis and would cause a higher Th2 cell response. This research provides a basis for the development of hypoallergenic soybean products in the soybean industry as well as for the design of immunotherapy for protein allergy.
CASP (Critical Assessment of Structure Prediction) conducts community experiments to determine the state of the art in computing protein structure from amino acid sequence. The process relies on the experimental community providing information about not yet public or about to be solved structures, for use as targets. For some targets, the experimental structure is not solved in time for use in CASP. Calculated structure accuracy improved dramatically in this round, implying that models should now be much more useful for resolving many sorts of experimental difficulty. To test this, selected models for seven unsolved targets were provided to the experimental groups. These models were from the AlphaFold2 group, who overall submitted the most accurate predictions in CASP14. Four targets were solved with the aid of the models, and, additionally, the structure of an already solved target was improved. An a posteriori analysis showed that in some cases models from other groups would also have been effective. This paper provides accounts of the successful application of models to structure determination, including molecular replacement for X-ray crystallography, backbone tracing and sequence positioning in a cryo-EM structure, and correction of local features. The results suggest that in future there will be greatly increased synergy between computational and experimental approaches to structure determination.
Experimenters face challenges and limitations while analyzing glycoproteins due to their high flexibility, stereochemistry, anisotropic effects, and hydration phenomena. Computational studies complement experiments and have been used in characterization of the structural properties of glycoproteins. However, recent investigations revealed that computational studies face significant challenges as well. Here, we introduce and discuss some of these challenges and weaknesses in the investigations of glycoproteins. We also present requirements of future developments in computational biochemistry and computational biology areas that could be necessary for providing more accurate structural property analyses of glycoproteins using computational tools. Further theoretical strategies that need to be and can be developed are discussed herein.
The assessment of CASP models for utility in molecular replacement is a measure of their use in a valuable real-world application. In CASP7, the metric for molecular replacement assessment involved full likelihood-based molecular replacement searches; however, this restricted the assessable targets to crystal structures with only one copy of the target in the asymmetric unit, and to those where the search found the correct pose. In CASP10, full molecular replacement searches were replaced by likelihood-based rigid-body refinement of models superimposed on the target using the LGA algorithm, with the metric being the refined likelihood (LLG) score. This enabled multi-copy targets and very poor models to be evaluated, but a significant further issue remained: the requirement of diffraction data for assessment. We introduce here the relative-expected-LLG (reLLG), which is independent of diffraction data. This reLLG is also independent of any crystal form, and can be calculated regardless of the source of the target, be it X-ray, NMR or cryo-EM. We calibrate the reLLG against the LLG for targets in CASP14, showing that it is a robust measure of both model and group ranking. Like the LLG, the reLLG shows that accurate coordinate error estimates add substantial value to predicted models. We find that refinement by CASP groups can often convert an inadequate initial model into a successful MR search model. Consistent with findings from others, we show that the AlphaFold2 models are sufficiently good, and reliably so, to surpass other current model generation strategies for attempting molecular replacement phasing.
The denatured state of several proteins has been shown to display transient structures that are relevant for folding, stability and aggregation. To detect them by nuclear magnetic resonance (NMR) spectroscopy, the denatured state must be stabilized by chemical agents or changes in temperature. This makes the environment different from that experienced in biologically relevant processes. Using high-resolution heteronuclear NMR spectroscopy, we have characterized several denatured states of a monomeric variant of HIV-1 protease induced by different concentrations of urea, guanidinium chloride and acetic acid. We have extrapolated the chemical shifts and the relaxation parameters to the denaturant-free denatured state at native conditions, showing that they converge to the same values. Subsequently, we characterized the conformational properties of this biologically relevant denatured state under native conditions by advanced molecular dynamics simulations and validated the results by comparison to experimental data. We show that the denatured state of HIV-1 protease under native conditions displays rich patterns of transient native and non-native structures, which could be of relevance to its guidance through a complex folding process.
Acetylcholinesterase (AChE) is a crucial enzyme in the central nervous system. It is the target of various organophosphorus nerve agents and pesticides, and the inhibition of AChE is a therapeutic strategy for the treatment of various neurological diseases. Glu202 is a key residue adjacent to the catalytic His447 and plays an important role in catalysis. Although Glu202 has long been considered negatively charged in many studies, increasing evidence supports a protonated Glu202. However, Glu202 is freely accessible to solvent, and thus it seems more reasonable for Glu202 to adopt predominantly the deprotonated state. In the present work, we carried out a series of molecular dynamics simulations with Glu202 in different protonation states. Our results show that the protonated Glu202 is important in maintaining the key hydrogen bond network that supports the catalytic triad, whereas the deprotonated Glu202 results in the collapse of this network, which consequently destabilizes the catalytic His447. We also notice that different protonation states of Glu202 merely alter the binding mode of ACh. However, since the catalytic His447 is disrupted if Glu202 is deprotonated, His447 cannot facilitate the nucleophilic attack performed by Ser203. Therefore, the catalytic efficiency of ACh hydrolysis should be remarkably decreased if Glu202 is deprotonated. Our findings suggest that, when designing and developing highly active AChE inhibitors or proposing mechanistic hypotheses for AChE-catalyzed reactions, the protonated state of Glu202 should be considered.
The Continuous Automated Model EvaluatiOn (CAMEO) platform complements the biennial CASP experiment by conducting fully automated blind evaluations of 3D protein prediction servers based on the weekly pre‐release of sequences of those structures, which are going to be published in the upcoming release of the Protein Data Bank (PDB). While in CASP14 significant success was observed in predicting the structures of individual protein chains with high accuracy, significant challenges remain in correctly predicting the structures of complexes. By implementing fully automated evaluation of predictions for protein-protein complexes, as well as for proteins in complex with ligands, peptides, nucleic acids, or proteins containing non-canonical amino acid residues, CAMEO will assist new developments in those challenging areas of active research.
Amyloid beta (Aβ of Alzheimer’s disease) and α-synuclein (α-Syn of Parkinson’s disease) form large fibrils. Evidence is increasing, however, that much smaller oligomers are more toxic and that these oligomers can form transmembrane ion channels. We have proposed previously that Aβ42 oligomers, annular protofibrils, and ion channels adopt concentric β-barrel molecular structures. Here we extend that hypothesis to the superfamily of α-, β-, and γ-synucleins. Our models of numerous synuclein oligomers, annular protofibrils, tubular protofibrils, lipoproteins, and ion channels were developed to be consistent with sizes, shapes, molecular weights, and secondary structures of assemblies as determined by EM and other studies. The models have the following features: 1) all subunits have identical structures and interactions; 2) they are consistent with conventional β-barrel theory; 3) the distance between walls of adjacent β-barrels is between 0.6 and 1.2 nm; 4) hydrogen bonds, salt bridges, interactions among aromatic side-chains, burial and tight packing of hydrophobic side-chains, and aqueous solvent exposure of hydrophilic side-chains are relatively optimal; and 5) residues that are identical among distantly related homologous proteins cluster in the interior of most oligomers whereas residues that are hypervariable are exposed on protein surfaces. Atomic scale models of some assemblies were developed.
We report here an assessment of the model refinement category of the 14th round of Critical Assessment of Structure Prediction (CASP14). As before, predictors submitted up to five ranked refinements, along with associated residue-level error estimates, for targets that had a wide range of starting quality. The ability of groups to accurately rank their submissions and to predict coordinate error varied widely. Overall only four groups out-performed a “naïve predictor” corresponding to resubmission of the starting model. Among the top groups there are interesting differences of approach and in the spread of improvements seen: some methods are more conservative, others more adventurous. Some targets were “double-barrelled” for which predictors were offered a high-quality AlphaFold 2 (AF2)-derived prediction alongside another of lower quality. The AF2-derived models were largely unimprovable, their apparent errors being found to reside very largely at domain and, especially, crystal lattice contacts. Refinement is shown to have a mixed impact overall on structure-based function annotation methods to predict nucleic acid binding, spot catalytic sites and dock protein structures.
Glycoside hydrolase family 57 glycogen branching enzymes (GH57GBE) catalyze the formation of an α-1,6 glycosidic bond between α-1,4 linked glucooligosaccharides. Because this is an atypical family, only a limited number of GH57GBEs have been biochemically characterized so far. This study aimed at acquiring a better understanding of the GH57GBE family by a systematic sequence-based bioinformatics analysis of almost 2,500 gene sequences and by determining the branching activity of several native and mutant GH57GBEs. Very low or absent branching activity was found to correlate with the absence of a flexible loop, a tyrosine at the loop tip, and two β-sheets.
The goal of CASP experiments is to monitor the progress in the protein structure prediction field. During the 14th CASP edition we aimed to test our capabilities of predicting structures of protein complexes. Our protocol for modeling protein assemblies included both template-based modeling and free docking. Structural templates were identified using sensitive sequence-based searches. If sequence-based searches failed, we performed structure-based template searches using selected CASP server models. In the absence of reliable templates we applied free docking starting from monomers generated by CASP servers. We evaluated and ranked models of protein complexes using an improved version of the protein structure quality assessment method VoroMQA, taking into account both interaction interface and global structure scores. If reliable templates could be identified, generally accurate models of protein assemblies were generated, with the exception of an antibody-antigen interaction. The success of free docking mainly depended on the accuracy of initial subunit models and on the scoring of docking solutions. To put our overall results in perspective, we analyzed our performance in the context of other CASP groups. Although the subunits in our assembly models often were not of the top quality, these models had, overall, the best predicted interfaces according to several protein-protein interface accuracy measures. Since we did not use co-evolution-based prediction of inter-chain contacts, we attribute our relative success in predicting interfaces primarily to the emphasis on the interaction interface when modeling and scoring.
Ryanodine receptor 1 (RyR1) is an intracellular calcium ion (Ca2+) release channel required for skeletal muscle contraction. Although cryo-electron microscopy identified the binding sites of three coactivators, Ca2+, ATP and caffeine (CFF), the mechanism of co-regulation and synergy of these activators is unknown. Here, we report allosteric connections among the three ligand binding sites and the pore region in (i) Ca2+ bound-closed, (ii) ATP/CFF bound-closed, (iii) Ca2+/ATP/CFF bound-closed, and (iv) Ca2+/ATP/CFF bound-open RyR1 states. We identified two dominant interactions that mediate communication between the Ca2+ binding site and the pore region in the Ca2+ bound-closed state, which partially overlapped with the pore communications in the ATP/CFF bound-closed RyR1 state. In the Ca2+/ATP/CFF bound-closed and bound-open RyR1 states, co-regulatory interactions were analogous to communications in the Ca2+ bound-closed and ATP/CFF bound-closed states. Both the ATP- and CFF-binding sites mediate communication between the Ca2+ binding site and the pore region in the Ca2+/ATP/CFF bound-open RyR1 structure. We conclude that Ca2+, ATP, and CFF propagate their effects to the pore region through a network of overlapping interactions that mediate allosteric control and molecular synergy in channel regulation.
The novel coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) still has serious negative effects on health, social life, and economics. Recently, vaccines from various companies have been urgently approved to control SARS-CoV-2 infections. However, no specific antiviral drug has been confirmed so far for regular treatment. An important target is the main protease (Mpro), which plays a major role in replication of the virus. In this study, Gaussian and residue network models are employed to reveal two distinct potential allosteric sites on Mpro that can be evaluated as drug targets besides the active site. Then, FDA-approved drugs are docked to the three distinct sites with flexible docking using AutoDock Vina to identify potential drug candidates. The 14 best molecule hits for the active site of Mpro are determined, 6 of which also exhibit high docking scores for the potential allosteric regions. Full-atom molecular dynamics simulations with the MM-GBSA method indicate that compounds docked to the active and potential allosteric sites form stable interactions with high binding free energy (∆Gbind) values. ∆Gbind values reach -52.06 kcal/mol for the active site, -51.08 kcal/mol for potential allosteric site 1, and -42.93 kcal/mol for potential allosteric site 2. Per-residue energy decomposition calculations elucidate key binding residues stabilizing the ligands, which can further serve to design pharmacophores. This systematic and efficient computational analysis successfully identifies ivermectin, diosmin and selinexor, which are currently in clinical trials, and further proposes bromocriptine and elbasvir as Mpro inhibitor candidates to be evaluated against SARS-CoV-2 infection.
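To make the Gaussian network model step more concrete, here is a minimal sketch of the standard construction: a Kirchhoff (connectivity) matrix is built from Cα coordinates with a distance cutoff, and its eigen-decomposition gives the collective modes. This is not the study's actual pipeline. The 7.0 Å cutoff, the function names and the toy coordinates are assumptions made for illustration only, and the sketch depends on NumPy.

```python
import numpy as np


def gnm_modes(coords, cutoff=7.0):
    """Build the GNM Kirchhoff matrix and return it with its eigenpairs.

    `coords` is an (N, 3) array of CA positions in angstroms. The cutoff
    value is an assumed, commonly used choice, not taken from the source.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Residue pairs within the cutoff are connected (self-pairs excluded).
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    kirchhoff = -contact.astype(float)
    # Diagonal holds the number of contacts of each residue.
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))
    eigvals, eigvecs = np.linalg.eigh(kirchhoff)  # ascending eigenvalues
    return kirchhoff, eigvals, eigvecs


def slow_mode_fluctuations(eigvals, eigvecs, n_modes=2, tol=1e-8):
    """Sum squared displacements over the slowest non-zero modes."""
    nonzero = np.where(eigvals > tol)[0][:n_modes]
    return sum(eigvecs[:, k] ** 2 / eigvals[k] for k in nonzero)


# Toy input: six pseudo-residues on a line, 3.8 angstroms apart (hypothetical).
coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(6)])
kirchhoff, eigvals, eigvecs = gnm_modes(coords)
fluct = slow_mode_fluctuations(eigvals, eigvecs)
```

In GNM analyses, residues with low mobility in the slowest modes are often treated as hinge-like candidates. How the study maps modes to allosteric sites is not described in the abstract, so that step is left out here.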
Substantial progress in protein structure prediction has been made by utilizing deep learning and residue-residue distance prediction since CASP13. Inspired by these advances, we improve our CASP14 MULTICOM protein structure prediction system in three main aspects: (1) a new deep learning based protein inter-residue distance predictor (DeepDist) to improve template-free (ab initio) tertiary structure prediction, (2) an enhanced template-based tertiary structure prediction method, and (3) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked 7th out of 146 predictors in protein tertiary structure prediction and 3rd out of 136 predictors in inter-domain structure prediction. The results of MULTICOM demonstrate that template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. The performance of template-free tertiary structure prediction largely depends on the accuracy of distance predictions, which is closely related to the quality of multiple sequence alignments. The structural model quality assessment works reasonably well on targets for which a sufficient number of good models can be predicted, but may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed.
Elucidation of signalling events in a pathogen is potentially important for tackling the infection it causes. Such events mediated by protein phosphorylation play important roles in infection. To predict the phosphosites and substrates of the serine/threonine protein kinases (STPKs) of Mycobacterium tuberculosis, we have developed a machine learning based approach that uses kinase-peptide structure-sequence data. This approach utilizes features derived from the kinase 3D-structure environment and known phosphosite sequences to generate Support Vector Machine based, kinase-specific predictions of phosphosites, making it suitable for predicting phosphosites of STPKs with no or scarce phosphosite data. Of the four machine learning algorithms we tried (random forest, logistic regression, support vector machine and k-nearest neighbours), the support vector machine performed best, with an aucROC value of 0.88 on the independent testing dataset and a ten-fold cross-validation accuracy of ~81.6% for the final model. Our predicted phosphosites of M. tuberculosis STPKs form a useful resource for experimental biologists, enabling elucidation of STPK-mediated post-translational regulation of important cellular processes. The training features file and model files, together with a usage instructions file, are available at: https://github.com/vipulbiocoder/Mtb-KSPP
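For readers unfamiliar with the aucROC metric reported above, the following sketch computes it directly from its probabilistic definition: the chance that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The function name `roc_auc`, the labels and the scores are hypothetical and unrelated to the published model or dataset.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    `labels` are 1 (phosphosite) or 0 (non-phosphosite); `scores` are
    classifier outputs where larger means more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy predictions (hypothetical values).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; rank-based formulations give the same value more efficiently on large test sets.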
Lignin is one of the world’s most abundant organic polymers, and 2-pyrone-4,6-dicarboxylate lactonase (LigI) catalyzes the hydrolysis of 2-pyrone-4,6-dicarboxylate (PDC) in the degradation of lignin. The pH has profound effects on enzyme catalysis, and we therefore studied these effects in the context of LigI. We found that changes in pH mostly affect surface residues, while the residues at the active site are more subject to changes in the surrounding microenvironment. In accordance with this, a high pH facilitates the deprotonation of the substrate. Detailed free energy calculations by the empirical valence bond (EVB) approach revealed that the overall hydrolysis reaction is more likely when the three active site histidines (His31, His33 and His180) are protonated at the ɛ site; however, protonation at the δ site may be favored during specific steps of the reaction. Our studies have uncovered the determinant role of the protonation state of the active site residues His31, His33 and His180 in the hydrolysis of PDC.
Histone is a scaffold protein that constitutes nucleosomes with DNA in the cell nucleus. The formation of the histone hetero-octamer is assisted by histone chaperone proteins. The crystal structure of one such chaperone, yeast nucleosome assembly protein (yNap1), has been determined. For yNap1, a nuclear export signal/sequence (NES) has been identified as part of the long α-helix. Experimental evidence from mutagenesis in budding yeast suggests that the NES is necessary for transport out of the cell nucleus. However, the NES is masked by a region defined as an accessory domain (AD), and the role of the AD in nuclear transport has not yet been elucidated. To address the role of the AD, we focused on phosphorylation in the AD because proteome experiments have identified multiple phosphorylation sites in yNap1. To treat phosphorylation computationally, we performed all-atom molecular dynamics (MD) simulations for non-phosphorylated and phosphorylated yNap1 (Nap1-nonP and Nap1-P). We assessed how exposed the NES is on the protein surface by measuring its solvent-accessible surface area (SASA). The SASA distributions differed between the two systems. Quantitatively, the median of the SASA distribution of Nap1-P was greater than that of Nap1-nonP, meaning that phosphorylation in the AD exposed the NES and increased its accessibility. In conclusion, yNap1 might modulate the accessibility of the NES by dislocating the AD through phosphorylation.