Experimenters face challenges and limitations when analyzing glycoproteins because of their high flexibility, stereochemistry, anisotropic effects, and hydration phenomena. Computational studies complement experiments and have been used to characterize the structural properties of glycoproteins. However, recent investigations have revealed that computational studies face significant challenges as well. Here, we introduce and discuss some of these challenges and weaknesses in the investigation of glycoproteins. We also present the developments in computational biochemistry and computational biology that may be needed to provide more accurate structural analyses of glycoproteins using computational tools. Further theoretical strategies that need to be, and can be, developed are also discussed.
The assessment of CASP models for utility in molecular replacement is a measure of their use in a valuable real-world application. In CASP7, the metric for molecular replacement assessment involved full likelihood-based molecular replacement searches; however, this restricted the assessable targets to crystal structures with only one copy of the target in the asymmetric unit, and to those where the search found the correct pose. In CASP10, full molecular replacement searches were replaced by likelihood-based rigid-body refinement of models superimposed on the target using the LGA algorithm, with the metric being the refined likelihood (LLG) score. This enabled multi-copy targets and very poor models to be evaluated, but a significant further issue remained: the requirement of diffraction data for assessment. We introduce here the relative-expected-LLG (reLLG), which is independent of diffraction data. This reLLG is also independent of any crystal form, and can be calculated regardless of the source of the target, be it X-ray, NMR or cryo-EM. We calibrate the reLLG against the LLG for targets in CASP14, showing that it is a robust measure of both model and group ranking. Like the LLG, the reLLG shows that accurate coordinate error estimates add substantial value to predicted models. We find that refinement by CASP groups can often convert an inadequate initial model into a successful MR search model. Consistent with findings from others, we show that the AlphaFold2 models are sufficiently good, and reliably so, to surpass other current model generation strategies for attempting molecular replacement phasing.
The denatured state of several proteins has been shown to display transient structures that are relevant for folding, stability and aggregation. To detect them by nuclear magnetic resonance (NMR) spectroscopy, the denatured state must be stabilized by chemical agents or changes in temperature. This makes the environment different from that experienced in biologically relevant processes. Using high-resolution heteronuclear NMR spectroscopy, we have characterized several denatured states of a monomeric variant of HIV-1 protease induced by different concentrations of urea, guanidinium chloride and acetic acid. We have extrapolated the chemical shifts and the relaxation parameters to the denaturant-free denatured state at native conditions, showing that they converge to the same values. Subsequently, we characterized the conformational properties of this biologically relevant denatured state under native conditions by advanced molecular dynamics simulations and validated the results by comparison to experimental data. We show that the denatured state of HIV-1 protease under native conditions displays rich patterns of transient native and non-native structures, which could be of relevance to its guidance through a complex folding process.
Acetylcholinesterase (AChE) is a crucial enzyme in the central nervous system. It is the target of various organophosphorus nerve agents and pesticides, and the inhibition of AChE is a therapeutic strategy for the treatment of various neurological diseases. Glu202 is a key residue adjacent to the catalytic His447 and plays an important role in catalysis. Although Glu202 has long been considered negatively charged in many studies, growing evidence supports a protonated Glu202. However, Glu202 is freely accessible to solvent, so it would seem more reasonable for Glu202 to be predominantly deprotonated. In the present work, we carried out a series of molecular dynamics simulations with Glu202 in different protonation states. Our results show that the protonated Glu202 is important in maintaining the key hydrogen bond network that supports the catalytic triad, whereas the deprotonated Glu202 causes this network to collapse, which in turn destabilizes the catalytic His447. We also observe that the protonation state of Glu202 only slightly alters the binding mode of ACh. However, because the catalytic His447 is disrupted when Glu202 is deprotonated, His447 cannot facilitate the nucleophilic attack performed by Ser203. The catalytic efficiency of ACh hydrolysis should therefore decrease markedly if Glu202 is deprotonated. Our findings suggest that the protonated state of Glu202 should be considered when designing and developing highly active AChE inhibitors or proposing mechanistic hypotheses for AChE-catalyzed reactions.
When two or more amino acid mutations occur in protein systems, they can interact in a non-additive fashion termed epistasis. One way to quantify epistasis between mutation pairs in protein systems is by using free energy differences: ϵ = ΔΔG_{1,2} - (ΔΔG_1 + ΔΔG_2), where ΔΔG refers to the change in the Gibbs free energy, subscripts 1 and 2 refer to single mutations in arbitrary order, and the subscript 1,2 refers to the double mutant. In this study, we explore possible biophysical mechanisms that drive pairwise epistasis in both protein-protein binding affinity and protein folding stability. Using the largest available datasets containing experimental protein structures and free energy data, we derived statistical models for both binding and folding epistasis (ϵ) with similar explanatory power (R²) of 0.299 and 0.258, respectively. These models contain terms and interactions that are consistent with intuition. For example, increasing the Cartesian separation between mutation sites leads to a decrease in observed epistasis for both folding and binding. Our results provide insight into factors that contribute to pairwise epistasis in protein systems and their importance in explaining epistasis. However, the low explanatory power indicates that more study is needed to fully understand this phenomenon.
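The epistasis definition given in this abstract can be written as a few lines of Python. This is only a minimal sketch of the arithmetic: the function name `pairwise_epistasis` and the example free energy values are hypothetical and are not taken from the study's datasets.

```python
def pairwise_epistasis(ddg_1: float, ddg_2: float, ddg_12: float) -> float:
    """Return epsilon = ddG_{1,2} - (ddG_1 + ddG_2).

    All three inputs must share the same units (for example kcal/mol).
    A value of zero means the two mutations combine additively.
    """
    return ddg_12 - (ddg_1 + ddg_2)


# Hypothetical values, chosen only to illustrate the calculation.
additive = pairwise_epistasis(ddg_1=1.0, ddg_2=0.5, ddg_12=1.5)
non_additive = pairwise_epistasis(ddg_1=1.0, ddg_2=0.5, ddg_12=2.5)
print(additive, non_additive)
```

Because the definition is symmetric in the two single mutations, swapping `ddg_1` and `ddg_2` leaves the result unchanged, which matches the statement that the mutations are taken in arbitrary order.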
The Continuous Automated Model EvaluatiOn (CAMEO) platform complements the biennial CASP experiment by conducting fully automated blind evaluations of 3D protein prediction servers based on the weekly pre‐release of sequences of those structures, which are going to be published in the upcoming release of the Protein Data Bank (PDB). While in CASP14 significant success was observed in predicting the structures of individual protein chains with high accuracy, significant challenges remain in correctly predicting the structures of complexes. By implementing fully automated evaluation of predictions for protein-protein complexes, as well as for proteins in complex with ligands, peptides, nucleic acids, or proteins containing non-canonical amino acid residues, CAMEO will assist new developments in those challenging areas of active research.
We explored the Protein Data Bank (PDB) to collect protein-ssDNA structures and create a multi-conformational docking benchmark including both bound and unbound protein structures. Because ssDNA is highly flexible when not bound, no unbound ssDNA structure is included. For the 143 groups identified as bound-unbound structures of the same protein, we studied the conformational changes in the protein induced by ssDNA binding. Moreover, for groups containing several bound or unbound protein structures, we also assessed the intrinsic conformational variability in either bound or unbound conditions, and compared it to the supposedly binding-induced modifications. This benchmark is, to our knowledge, the first attempt to survey the available structures of protein-ssDNA interactions to such an extent, with the aim of improving computational docking tools dedicated to this kind of molecular interaction.
Amyloid beta (Aβ of Alzheimer’s disease) and α-synuclein (α-Syn of Parkinson’s disease) form large fibrils. Evidence is increasing, however, that much smaller oligomers are more toxic and that these oligomers can form transmembrane ion channels. We have proposed previously that Aβ42 oligomers, annular protofibrils, and ion channels adopt concentric β-barrel molecular structures. Here we extend that hypothesis to the superfamily of α-, β-, and γ-synucleins. Our models of numerous synuclein oligomers, annular protofibrils, tubular protofibrils, lipoproteins, and ion channels were developed to be consistent with sizes, shapes, molecular weights, and secondary structures of assemblies as determined by EM and other studies. The models have the following features: 1) all subunits have identical structures and interactions; 2) they are consistent with conventional β-barrel theory; 3) the distance between walls of adjacent β-barrels is between 0.6 and 1.2 nm; 4) hydrogen bonds, salt bridges, interactions among aromatic side-chains, burial and tight packing of hydrophobic side-chains, and aqueous solvent exposure of hydrophilic side-chains are relatively optimal; and 5) residues that are identical among distantly related homologous proteins cluster in the interior of most oligomers whereas residues that are hypervariable are exposed on protein surfaces. Atomic scale models of some assemblies were developed.
We report here an assessment of the model refinement category of the 14th round of Critical Assessment of Structure Prediction (CASP14). As before, predictors submitted up to five ranked refinements, along with associated residue-level error estimates, for targets that had a wide range of starting quality. The ability of groups to accurately rank their submissions and to predict coordinate error varied widely. Overall only four groups out-performed a “naïve predictor” corresponding to resubmission of the starting model. Among the top groups there are interesting differences of approach and in the spread of improvements seen: some methods are more conservative, others more adventurous. Some targets were “double-barrelled” for which predictors were offered a high-quality AlphaFold 2 (AF2)-derived prediction alongside another of lower quality. The AF2-derived models were largely unimprovable, their apparent errors being found to reside very largely at domain and, especially, crystal lattice contacts. Refinement is shown to have a mixed impact overall on structure-based function annotation methods to predict nucleic acid binding, spot catalytic sites and dock protein structures.
Glycoside hydrolase family 57 glycogen branching enzymes (GH57GBE) catalyze the formation of an α-1,6 glycosidic bond between α-1,4 linked glucooligosaccharides. GH57GBEs form an atypical family, and only a limited number have been biochemically characterized so far. This study aimed at acquiring a better understanding of the GH57GBE family through a systematic sequence-based bioinformatics analysis of almost 2,500 gene sequences and by determining the branching activity of several native and mutant GH57GBEs. Very low or absent branching activity was found to correlate with the absence of a flexible loop, a tyrosine at the loop tip, and two β-sheets.
The goal of CASP experiments is to monitor progress in the protein structure prediction field. During the 14th CASP edition we aimed to test our capabilities of predicting structures of protein complexes. Our protocol for modeling protein assemblies included both template-based modeling and free docking. Structural templates were identified using sensitive sequence-based searches. If sequence-based searches failed, we performed structure-based template searches using selected CASP server models. In the absence of reliable templates we applied free docking starting from monomers generated by CASP servers. We evaluated and ranked models of protein complexes using an improved version of the protein structure quality assessment method VoroMQA, taking into account both interaction interface and global structure scores. When reliable templates could be identified, generally accurate models of protein assemblies were generated, with the exception of an antibody-antigen interaction. The success of free docking mainly depended on the accuracy of initial subunit models and on the scoring of docking solutions. To put our overall results in perspective, we analyzed our performance in the context of other CASP groups. Although the subunits in our assembly models often were not of top quality, these models had, overall, the best predicted interfaces according to several protein-protein interface accuracy measures. Since we did not use co-evolution-based prediction of inter-chain contacts, we attribute our relative success in predicting interfaces primarily to the emphasis on the interaction interface when modeling and scoring.
Ryanodine receptor 1 (RyR1) is an intracellular calcium ion (Ca2+) release channel required for skeletal muscle contraction. Although cryo-electron microscopy has identified the binding sites of three coactivators, Ca2+, ATP and caffeine (CFF), the mechanism of co-regulation and synergy of these activators is unknown. Here, we report allosteric connections among the three ligand binding sites and the pore region in (i) Ca2+ bound-closed, (ii) ATP/CFF bound-closed, (iii) Ca2+/ATP/CFF bound-closed, and (iv) Ca2+/ATP/CFF bound-open RyR1 states. We identified two dominant interactions that mediate communication between the Ca2+ binding site and the pore region in the Ca2+ bound-closed state, which partially overlapped with the pore communications in the ATP/CFF bound-closed RyR1 state. In the Ca2+/ATP/CFF bound-closed and bound-open RyR1 states, co-regulatory interactions were analogous to the communications in the Ca2+ bound-closed and ATP/CFF bound-closed states. Both the ATP- and CFF-binding sites mediate communication between the Ca2+ binding site and the pore region in the Ca2+/ATP/CFF bound-open RyR1 structure. We conclude that Ca2+, ATP, and CFF propagate their effects to the pore region through a network of overlapping interactions that mediate allosteric control and molecular synergy in channel regulation.
The novel coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) still has serious negative effects on health, social life, and economics. Recently, vaccines from various companies have been urgently approved to control SARS-CoV-2 infections. However, no specific antiviral drug has been confirmed so far for regular treatment. An important target is the main protease (Mpro), which plays a major role in replication of the virus. In this study, Gaussian and residue network models are employed to reveal two distinct potential allosteric sites on Mpro that can be evaluated as drug targets besides the active site. FDA-approved drugs are then docked to the three distinct sites with flexible docking using AutoDock Vina to identify potential drug candidates. The 14 best molecule hits for the active site of Mpro are determined, 6 of which also exhibit high docking scores for the potential allosteric regions. Full-atom molecular dynamics simulations with the MM-GBSA method indicate that compounds docked to the active and potential allosteric sites form stable interactions with favorable (strongly negative) binding free energy (∆Gbind) values. ∆Gbind values reach -52.06 kcal/mol for the active site, -51.08 kcal/mol for potential allosteric site 1, and -42.93 kcal/mol for potential allosteric site 2. Per-residue energy decomposition calculations elucidate key binding residues stabilizing the ligands, which can further serve to design pharmacophores. This systematic and efficient computational analysis identifies ivermectin, diosmin and selinexor, which are currently in clinical trials, and further proposes bromocriptine and elbasvir as Mpro inhibitor candidates to be evaluated against SARS-CoV-2 infection.
Substantial progress in protein structure prediction has been made since CASP13 by utilizing deep learning and residue-residue distance prediction. Inspired by these advances, we improve our CASP14 MULTICOM protein structure prediction system in three main aspects: (1) a new deep learning based protein inter-residue distance predictor (DeepDist) to improve template-free (ab initio) tertiary structure prediction, (2) an enhanced template-based tertiary structure prediction method, and (3) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked 7th out of 146 predictors in protein tertiary structure prediction and 3rd out of 136 predictors in inter-domain structure prediction. The results of MULTICOM demonstrate that template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. The performance of template-free tertiary structure prediction largely depends on the accuracy of distance predictions, which is closely related to the quality of multiple sequence alignments. The structural model quality assessment works reasonably well on targets for which a sufficient number of good models can be predicted, but may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed.
Elucidating signalling events in a pathogen is potentially important for tackling the infection it causes. Signalling events mediated by protein phosphorylation play important roles in infection. To predict the phosphosites and substrates of serine/threonine protein kinases (STPKs), we developed a machine learning based approach and predicted the phosphosites for Mycobacterium tuberculosis STPKs using kinase-peptide structure-sequence data. This approach uses features derived from the kinase 3D-structure environment and known phosphosite sequences to generate support vector machine based, kinase-specific predictions of phosphosites, making it suitable for STPKs with little or no phosphosite data. Of the four machine learning algorithms we tried (random forest, logistic regression, support vector machine and k-nearest neighbours), the support vector machine performed best, with an aucROC value of 0.88 on the independent testing dataset and a ten-fold cross-validation accuracy of ~81.6% for the final model. Our predicted phosphosites of M. tuberculosis STPKs form a useful resource for experimental biologists, enabling elucidation of STPK-mediated post-translational regulation of important cellular processes. The training features file and model files, together with a usage instructions file, are available at: https://github.com/vipulbiocoder/Mtb-KSPP
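For readers unfamiliar with the aucROC metric reported in this abstract, the sketch below shows one standard way to compute it: as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. This is a generic illustration in plain Python, not the authors' code; the function name `roc_auc` and the toy labels and scores are assumptions made for the example.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a true phosphosite, 0 for a non-phosphosite.
    scores: classifier scores, higher meaning "more likely a phosphosite".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of examples, which is acceptable for a small independent test set; a rank-based implementation would be preferable for large datasets.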
Lignin is one of the world’s most abundant organic polymers, and 2-pyrone-4,6-dicarboxylate lactonase (LigI) catalyzes the hydrolysis of 2-pyrone-4,6-dicarboxylate (PDC) in the degradation of lignin. Because pH has profound effects on enzyme catalysis, we studied its role in the context of LigI. We found that changes in pH mostly affect surface residues, while the residues at the active site are more subject to changes in the surrounding microenvironment. In accordance with this, a high pH facilitates the deprotonation of the substrate. Detailed free energy calculations by the empirical valence bond (EVB) approach revealed that the overall hydrolysis reaction is more likely when the three active site histidines (His31, His33 and His180) are protonated at the ɛ site; however, protonation at the δ site may be favored during specific steps of the reaction. Our studies have uncovered the determinant role of the protonation state of the active site residues His31, His33 and His180 in the hydrolysis of PDC.
Histone is a scaffold protein that constitutes nucleosomes with DNA in the cell nucleus. Formation of the histone hetero-octamer is assisted by histone chaperone proteins. The crystal structure of one such chaperone, yeast nucleosome assembly protein (yNap1), has been determined. For yNap1, a nuclear export signal/sequence (NES) has been identified as part of the long α-helix. Experimental evidence from mutagenesis in budding yeast suggests that the NES is necessary for transport out of the cell nucleus. However, the NES is masked by a region defined as an accessory domain (AD), and the role of the AD in nuclear transport has not yet been elucidated. To address the role of the AD, we focused on phosphorylation in the AD, because proteome experiments have identified multiple phosphorylation sites in yNap1. To treat phosphorylation computationally, we performed all-atom molecular dynamics (MD) simulations of non-phosphorylated and phosphorylated yNap1 (Nap1-nonP and Nap1-P). We assessed how exposed the NES is at the protein surface by measuring its solvent-accessible surface area (SASA). The SASA distributions of the two systems differed: the median of the SASA distribution of Nap1-P was greater than that of Nap1-nonP, meaning that phosphorylation in the AD exposed the NES and increased its accessibility. In conclusion, yNap1 might modulate the accessibility of the NES by dislocating the AD through phosphorylation.
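The comparison of SASA distributions described in this abstract comes down to comparing medians of per-frame values from two trajectories. The following is a minimal sketch of that comparison, assuming the per-frame SASA values of the NES have already been extracted by some MD analysis tool; the function name `median_sasa_shift` and the numbers are hypothetical and do not come from the study.

```python
from statistics import median
from typing import Sequence


def median_sasa_shift(sasa_nonp: Sequence[float], sasa_p: Sequence[float]) -> float:
    """Return median(SASA of Nap1-P) minus median(SASA of Nap1-nonP).

    A positive value indicates that the NES is, in the median frame,
    more solvent-exposed in the phosphorylated system.
    """
    return median(sasa_p) - median(sasa_nonp)


# Hypothetical per-frame NES SASA values (arbitrary units), for illustration only.
nap1_nonp = [1.0, 2.0, 3.0]
nap1_p = [2.0, 4.0, 6.0]
print(median_sasa_shift(nap1_nonp, nap1_p))
```

The median is used here, as in the abstract, because it is less sensitive than the mean to a small number of frames with unusually high or low exposure.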
Recently, a bacterial strain of Ideonella sakaiensis was identified with the uncommon ability to degrade poly(ethylene terephthalate) (PET). The PETase from I. sakaiensis strain 201-F6 catalyzes the hydrolysis of PET, converting it to mono(2-hydroxyethyl) terephthalic acid (MHET), bis(2-hydroxyethyl)-TPA (BHET), and terephthalic acid (TPA). Despite the potential of this enzyme for the mitigation or elimination of environmental contaminants, one limitation of using PETase for PET degradation is that it acts only at moderate temperatures because of its low thermal stability. In addition, the molecular details of the main interactions of PET in the active site of PETase remain unclear. Herein, molecular docking and molecular dynamics (MD) simulations were applied to analyze structural changes of PETase induced by PET binding. Results from the essential dynamics analysis revealed that the β1-β2 connecting loop is very flexible. This loop is located far from the active site of PETase, and we suggest that it can be considered for mutagenesis in order to increase the thermal stability of PETase. The free energy landscape (FEL) demonstrates that the main change in the transition from the unbound to the bound state is associated with the β7-α5 connecting loop, where the catalytic residue Asp206 is located. Overall, the present study provides insights into the molecular mechanism of PET binding to the PETase structure and a computational strategy for mapping flexible regions of this enzyme, which can be useful for engineering more efficient enzymes for recycling plastic polymers using biological systems.
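A free energy landscape such as the one mentioned in this abstract is commonly estimated from the sampled probability distribution of a collective coordinate through G(x) = -k_B T ln(P(x)/P_max). The sketch below illustrates that relation for a one-dimensional coordinate using a simple histogram. It is an assumption-laden illustration and not the authors' protocol: the abstract does not state which coordinates, bin counts, or temperature were used, and FELs are often built in two dimensions (for example over two principal components). The function name `free_energy_profile` and all default parameter values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def free_energy_profile(
    values: Sequence[float], n_bins: int = 10, temperature: float = 300.0
) -> List[Tuple[float, float]]:
    """Estimate G(x) = -kT * ln(P(x) / P_max) from sampled coordinate values.

    Returns (bin centre, free energy in kJ/mol) pairs. The most populated bin
    has a free energy of zero; empty bins are assigned +infinity.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1

    kt = K_B * temperature
    max_count = max(counts)
    profile = []
    for i, count in enumerate(counts):
        centre = lo + (i + 0.5) * width
        energy = -kt * math.log(count / max_count) if count else math.inf
        profile.append((centre, energy))
    return profile


# Hypothetical samples: the first state is visited twice as often as the second.
for centre, energy in free_energy_profile([0.0] * 4 + [1.0] * 2, n_bins=2):
    print(f"{centre:.2f} {energy:.3f}")
```

In this toy case the less populated state lies kT ln 2 above the more populated one, which is the expected result for a two-to-one population ratio.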