Probing the structures of amyloid-beta (Aβ) peptides in the early steps of aggregation is extremely difficult, both experimentally and computationally. Yet this knowledge is essential, because small oligomers are the most toxic species. Experiments and simulations on the Aβ42 monomer point to random-coil conformations with transient helical or β-strand content. Our current conformational description of small Aβ42 oligomers is funneled toward amorphous aggregates with some β-sheet content and rare excited states with well-ordered assemblies of β-sheets. In this study, we emphasize another view, based on metastable α-helix bundle oligomers spanning the C-terminal residues, which are predicted by the machine-learning method AlphaFold2 and supported indirectly by low-resolution experimental data on many amyloid polypeptides. This finding has consequences for designing drugs to reduce aggregation and toxicity.
The rapid adaptation of SARS-CoV-2 within the host species and increased viral transmission have driven the evolution of distinct SARS-CoV-2 variants. Although numerous monoclonal antibodies (mAbs) have been identified as prophylactic therapies for SARS-CoV-2, the ongoing surge in SARS-CoV-2 infections shows the importance of understanding spike mutations and developing novel vaccine strategies that target all variants. Here, we report a map of the binding epitopes of 74 experimentally validated SARS-CoV-2 neutralizing mAbs across all variants. The majority (87.84%) of the potent neutralizing epitopes are localized to the receptor-binding domain (RBD) and overlap with each other, whereas a minority (12.16%) are found in the N-terminal domain (NTD). Notably, 69 of the 74 mAb targets have at least one mutation at the epitope sites. The potent epitopes in the RBD carry more mutations (4-10 aa) than those of weakly or moderately neutralizing antibodies, suggesting that these epitopes may co-evolve under immune pressure. The current study shows the importance of identifying critical mutations at antibody recognition epitopes, which can guide the development of broadly reactive immunogens targeting multiple SARS-CoV-2 variants. Further, vaccines inducing both humoral and cell-mediated immune responses might prevent SARS-CoV-2 variants from escaping neutralizing antibodies.
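The epitope-mutation mapping can be illustrated with a minimal Python sketch. It assumes each epitope is available as a set of spike residue positions and each variant as a set of mutated positions; the mAb names, variant names, and positions below are hypothetical placeholders, not the study's data. The last line checks the reported percentages, which correspond to 65 RBD and 9 NTD epitopes out of 74 (and 69 of 74 mAbs with at least one epitope mutation).

```python
# Hypothetical inputs: epitope residue positions per mAb and mutated spike
# positions per variant. Names and positions are illustrative only.
epitopes = {
    "mAb_A": {417, 484, 486, 493},
    "mAb_B": {346, 440, 444, 446},
    "mAb_C": {144, 145, 146, 147},
}
variant_mutations = {
    "variant_1": {417, 484, 501},
    "variant_2": {452, 478},
}


def mutated_epitope_sites(epitopes, variant_mutations):
    """For each mAb, return the sorted epitope positions mutated in any variant."""
    all_mutated = set().union(*variant_mutations.values())
    return {mab: sorted(sites & all_mutated) for mab, sites in epitopes.items()}


def percentage(count, total):
    """Percentage rounded to two decimals, as reported in the abstract."""
    return round(100.0 * count / total, 2)


hits = mutated_epitope_sites(epitopes, variant_mutations)
n_with_mutation = sum(1 for sites in hits.values() if sites)

# 87.84% and 12.16% of 74 epitopes correspond to 65 (RBD) and 9 (NTD) epitopes.
print(percentage(65, 74), percentage(9, 74), percentage(69, 74))  # 87.84 12.16 93.24
```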
Protein secondary structure (SS) prediction plays an important role in characterizing protein structure and function. In recent years, a new generation of SS prediction algorithms based on embeddings from protein language models (pLMs) has emerged. These algorithms reach state-of-the-art accuracy without time-consuming multiple sequence alignment (MSA) calculations. The LSTM-based SPOT-1D-LM and NetSurfP-3.0 are the latest examples of such predictors. We present ProteinUnetLM, a model using a convolutional Attention U-Net architecture that provides prediction quality and inference times at least as good as the best LSTM-based models for 8-class SS prediction (SS8). Additionally, we address the heavily imbalanced nature of the SS8 problem by extending the loss function with the Matthews correlation coefficient (MCC) and by assessing performance with the previously introduced adjusted geometric mean (AGM) metric. ProteinUnetLM achieved better AGM and segment overlap score (SOV) than LSTM-based predictors, especially for the rare structures 3₁₀-helix (G), beta-bridge (B), and high-curvature loop (S). It is also competitive on challenging datasets without homologs, free-modeling targets, and chameleon sequences. Moreover, ProteinUnetLM outperformed its previous MSA-based version, ProteinUnet2, and provided better AGM than AlphaFold2 for one third of the proteins in the CASP14 dataset, showing its potential as a significant step forward in the field. To facilitate use by protein scientists, we provide an easy-to-use web interface at [https://biolib.com/SUT/ProteinUnetLM/](https://biolib.com/SUT/ProteinUnetLM/).
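As an illustration of the imbalance-aware training and evaluation described above, the following NumPy sketch computes a differentiable, one-vs-rest MCC from soft class probabilities, adds it to cross-entropy, and evaluates the per-class AGM. The soft-MCC formulation and the equal loss weighting are assumptions and are not necessarily ProteinUnetLM's implementation; the AGM follows the commonly used definition that adjusts the geometric mean of sensitivity and specificity by the proportion of negatives.

```python
import numpy as np


def soft_mcc_per_class(probs, onehot, eps=1e-7):
    """One-vs-rest MCC per class computed from soft (probabilistic) counts,
    so it stays differentiable. probs, onehot: (n_residues, n_classes)."""
    tp = (probs * onehot).sum(axis=0)
    tn = ((1 - probs) * (1 - onehot)).sum(axis=0)
    fp = (probs * (1 - onehot)).sum(axis=0)
    fn = ((1 - probs) * onehot).sum(axis=0)
    num = tp * tn - fp * fn
    den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / (den + eps)


def ce_plus_mcc_loss(probs, onehot, mcc_weight=1.0, eps=1e-7):
    """Cross-entropy plus (1 - mean soft MCC); the weighting is an assumption."""
    ce = -np.mean(np.sum(onehot * np.log(probs + eps), axis=1))
    mcc = soft_mcc_per_class(probs, onehot, eps).mean()
    return ce + mcc_weight * (1.0 - mcc)


def agm(y_true, y_pred, positive):
    """Adjusted geometric mean for one class (one-vs-rest):
    AGM = (GM + TNR * Nn) / (1 + Nn) if TPR > 0 else 0,
    with GM = sqrt(TPR * TNR) and Nn the proportion of negatives."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos = y_true == positive
    neg = ~pos
    tpr = float(np.mean(y_pred[pos] == positive)) if pos.any() else 0.0
    tnr = float(np.mean(y_pred[neg] != positive)) if neg.any() else 0.0
    if tpr == 0.0:
        return 0.0
    n_neg = float(neg.mean())
    gm = np.sqrt(tpr * tnr)
    return float((gm + tnr * n_neg) / (1.0 + n_neg))
```

In a real model the same expressions would be written in the training framework's tensor library so that gradients flow through `probs`; NumPy is used here only to keep the sketch self-contained.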
Many proteins must interact with molecular chaperones to reach their native folded state in the cell. Yet how chaperone binding and binding-site characteristics affect the folding process is poorly understood. The ubiquitous Hsp70 chaperone system prevents client-protein aggregation by holding unfolded conformations or by unfolding misfolded states. Hsp70 binding sites in client proteins comprise a nonpolar core surrounded by positively charged residues. However, a detailed proteome-wide analysis of Hsp70 binding sites is still lacking. Further, it is not known whether proteins undergo some degree of folding while chaperone-bound. Here, we begin to address these questions by identifying Hsp70 binding sites in 2,258 E. coli proteins. We find that most proteins bear at least one Hsp70 binding site and that the number of Hsp70 binding sites is directly proportional to protein size. Aggregation propensity upon release from the ribosome correlates with the number of Hsp70 binding sites only for large proteins. Interestingly, in protein native states, Hsp70 binding sites are more solvent-exposed than other nonpolar sites. Our findings show that the majority of E. coli proteins are systematically enabled to interact with Hsp70, even if this interaction takes place during only a fraction of the protein lifetime. In addition, the solvent exposure of some chaperone binding sites in native proteins suggests that some conformational sampling may take place within Hsp70-bound states. In all, we propose that Hsp70-chaperone-binding traits have evolved to favor Hsp70-assisted protein folding devoid of aggregation.
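The abstract does not describe the binding-site predictor itself, so the following is only a toy Python heuristic that encodes the stated motif: a nonpolar core flanked by positively charged residues. The residue classes, core length, and flank width are hypothetical parameters chosen for illustration, and this is not the method used in the study.

```python
# Assumed residue classes for this toy heuristic (not a published predictor).
NONPOLAR = set("AILMFVWY")
POSITIVE = set("KR")


def candidate_cores(seq, core_len=5, min_nonpolar=4, flank=4):
    """Return 0-based start indices of windows that have a nonpolar core
    (>= min_nonpolar nonpolar residues out of core_len) and at least one
    K/R within `flank` residues on either side."""
    starts = []
    for i in range(len(seq) - core_len + 1):
        core = seq[i:i + core_len]
        if sum(aa in NONPOLAR for aa in core) < min_nonpolar:
            continue
        flanks = seq[max(0, i - flank):i] + seq[i + core_len:i + core_len + flank]
        if any(aa in POSITIVE for aa in flanks):
            starts.append(i)
    return starts


def merge_windows(starts):
    """Collapse consecutive window starts into (first_start, last_start) ranges,
    so overlapping windows count as one candidate site."""
    ranges = []
    for s in starts:
        if ranges and s == ranges[-1][1] + 1:
            ranges[-1][1] = s
        else:
            ranges.append([s, s])
    return [tuple(r) for r in ranges]


sites = merge_windows(candidate_cores("MKRLLIVLAKSEE"))
print(len(sites), sites)  # 1 [(2, 5)]
```

Counting merged sites per protein and plotting them against sequence length is one simple way to probe the size dependence reported above.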
Lignocellulose is the most abundant natural biopolymer on Earth and a potential raw material for the production of fuels and chemicals. However, only some organisms, such as bacteria and fungi, produce the enzymes needed to metabolize it. In this work, we detected genes encoding extracellular cellulases in the genomes of five species of Scenedesmus. These microalgae grow in freshwater and saltwater as well as in soils, displaying highly flexible metabolic properties. Comparison of the cellulase sequences with hydrolytic enzymes from other organisms, using multiple sequence alignments and phylogenetic trees, showed that these enzymes belong to glycosyl hydrolase families 1, 5, 9, and 10. In addition, most of them showed greater sequence similarity to enzymes from invertebrates, fungi, bacteria, and other microalgae than to plant cellulases, and 3D modeling showed that both the overall structures of the modeled proteins and the main amino acid residues involved in catalysis and substrate binding are well conserved in the Scenedesmus enzymes. We propose that these cellulase-producing phototrophic microorganisms could act as catalysts for the sunlight-fueled hydrolysis of cellulosic biomass.
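Sequence-similarity comparisons of this kind rest on identity between aligned sequences. A minimal Python sketch, assuming the sequences are already aligned (gaps written as `-`) and grouped by taxon; the fragments and group names are hypothetical, and alignment and phylogenetic inference themselves are not shown.

```python
def percent_identity(a, b, gap="-"):
    """Identity (%) over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def closest_group(query, references):
    """Return (group, best identity) for the reference group whose best
    member is most identical to the query."""
    best = {g: max(percent_identity(query, s) for s in seqs)
            for g, seqs in references.items()}
    group = max(best, key=best.get)
    return group, best[group]


# Hypothetical aligned fragments, for illustration only.
refs = {"fungi": ["ACDEFG"], "plants": ["AQDKFG"]}
print(closest_group("ACDEFA", refs))
```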
The recognition of Cannabis as a source of new compounds suitable for medical use has attracted strong interest from the scientific community, and substantial progress has been made in characterizing cannabinoids’ activity; however, a thorough description of their molecular mechanisms of action is still lacking. Highlighting their complex pharmacology, the list of cannabinoids’ interactors has expanded well beyond the canonical cannabinoid receptors. Among these, we focused our study on the glycine receptor (GlyR), an ion channel involved in modulating nervous system responses, including, of particular interest here, sensitivity to peripheral pain. Here, we report the use of computational methods to investigate possible binding modes between GlyR and Δ9-tetrahydrocannabinol (THC). After obtaining an initial THC binding pose from a biased molecular docking simulation and evaluating it with molecular dynamics simulations, we found a dynamic system with an identifiable representative binding mode characterized by specific interactions with two transmembrane residues (Phe293 and Ser296). Complementarily, we assessed the role of membrane cholesterol in this interaction and established its relevance for THC binding to GlyR. Lastly, restrained molecular dynamics simulations allowed us to refine the description of the binding mode and of the cholesterol effect. Altogether, our findings contribute to current knowledge of the GlyR-THC binding mode and offer a new starting point for future research on how cannabinoids in general, and THC in particular, modulate pain perception, in view of their possible clinical applications.
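Characterizing a representative binding mode by specific residue interactions typically involves measuring how often the ligand contacts each residue across the trajectory. The NumPy sketch below computes a per-residue contact occupancy from coordinate arrays. The 4.0 Å cutoff and the array layout are assumptions, and loading the coordinates of THC, Phe293, or Ser296 from a trajectory (with any MD analysis library) is left out.

```python
import numpy as np


def contact_occupancy(ligand_xyz, residue_xyz, cutoff=4.0):
    """Fraction of frames in which any ligand atom is within `cutoff` Å of any
    residue atom. ligand_xyz: (n_frames, n_ligand_atoms, 3);
    residue_xyz: (n_frames, n_residue_atoms, 3)."""
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    residue_xyz = np.asarray(residue_xyz, dtype=float)
    diff = ligand_xyz[:, :, None, :] - residue_xyz[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)            # (frames, lig atoms, res atoms)
    in_contact = (dist < cutoff).any(axis=(1, 2))   # one boolean per frame
    return float(in_contact.mean())


# Hypothetical two-frame example: one ligand atom, one residue atom.
ligand = np.zeros((2, 1, 3))
residue = np.array([[[3.0, 0.0, 0.0]], [[10.0, 0.0, 0.0]]])
print(contact_occupancy(ligand, residue))  # 0.5
```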
Order and disorder govern protein functions, but disorder is highly diverse, ranging from regions that are, and remain, fully disordered to conditional order. This diversity is still difficult to decipher, even though it is encoded in the amino acid sequence. Here, we developed an analytic Python package, named pyHCA, to estimate the foldability of a protein segment from its amino acid sequence alone, based on a measure of its density in regular secondary structures associated with hydrophobic clusters, as defined by the Hydrophobic Cluster Analysis (HCA) approach. The tool was designed by optimizing the separation of foldable segments from databases of disorder (DisProt) and order (SCOPe for soluble domains and OPM for transmembrane domains). It characterizes the ratio between order, embodied by regular secondary structures (either participating in the hydrophobic core of well-folded 3D structures or conditionally formed in intrinsically disordered regions), and disorder. We illustrated the relevance of pyHCA with several examples and applied it to the proteome sequences of 21 species, ranging from bacteria and archaea to unicellular and multicellular eukaryotes, for which structure models are provided in the AlphaFold2 databases. Cases of low-confidence scores related to disorder were distinguished from those of sequences that we identified as foldable but that are still excluded from accurate modeling by AlphaFold2 because of a lack of sequence homologs or compositional biases. Overall, our approach is complementary to AlphaFold2, providing guides to map structural innovations through evolutionary processes at proteome and gene scales.
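To give a feel for what a hydrophobic cluster is, here is a deliberately simplified, one-dimensional Python sketch. HCA proper identifies clusters on a two-dimensional helical net, and pyHCA's foldability score is not reproduced here. This version only assumes the hydrophobic set V, I, L, F, M, Y, W, treats proline as a cluster breaker, and splits clusters at four or more consecutive non-hydrophobic residues. The function names are hypothetical and are not pyHCA's API.

```python
HCA_HYDROPHOBIC = set("VILFMYW")  # assumed hydrophobic residue set


def hydrophobic_clusters(seq, max_gap=3):
    """1D simplification: group hydrophobic residues separated by at most
    `max_gap` non-hydrophobic residues; a proline always breaks a cluster.
    Returns inclusive (start, end) index pairs."""
    clusters = []
    start = last = None
    for i, aa in enumerate(seq):
        if aa == "P":
            if start is not None:
                clusters.append((start, last))
            start = last = None
        elif aa in HCA_HYDROPHOBIC:
            if start is None:
                start = i
            elif i - last - 1 > max_gap:  # 4+ non-hydrophobic residues in between
                clusters.append((start, last))
                start = i
            last = i
    if start is not None:
        clusters.append((start, last))
    return clusters


def cluster_coverage(seq, max_gap=3):
    """Crude proxy (not the pyHCA score): fraction of residues inside clusters."""
    covered = sum(end - start + 1 for start, end in hydrophobic_clusters(seq, max_gap))
    return covered / len(seq) if seq else 0.0


print(hydrophobic_clusters("AVLKKIPFW"))  # [(1, 5), (7, 8)]
```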
Degradation of solid polyethylene terephthalate (PET) by leaf branch compost cutinase (LCC) produces various PET-derived degradation intermediates (DIs) in addition to terephthalic acid (TPA), the recyclable terminal product of all PET degradation. Although LCC can also convert DIs into TPA in solution, in practice the TPA obtained through enzymatic degradation of PET is always contaminated by DIs. Here, we demonstrate that the primary reason DIs are not degraded into TPA in solution is the efficient binding of LCC to the surface of solid PET. Although such binding enhances the degradation of solid PET, it depletes the surrounding solution of enzyme that could otherwise have converted DIs into TPA. To retain a sub-population of enzyme in solution that would mainly degrade DIs, we introduced mutations that reduce the hydrophobicity of areas surrounding LCC’s active site, with the express intention of reducing LCC’s binding to solid PET. Despite the consequent reduction in invasion and degradation of solid PET, overall TPA production was ~3.6-fold higher, owing to the partitioning of enzyme between solid PET and the surrounding solution and the consequent increase in TPA production from DIs. Further, synergy between this mutated LCC (F125L/F243I LCC) and wild-type LCC resulted in even higher yields and TPA of nearly 100% purity.
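TPA purity and fold-change yields reduce to simple ratios. The Python sketch below assumes product concentrations are expressed in terephthalate-equivalent molar units; the numbers in the example are hypothetical and are not the reported measurements.

```python
def tpa_purity(tpa, intermediates):
    """Percent TPA among soluble terephthalate-containing products.
    Inputs are concentrations in terephthalate-equivalent molar units;
    `intermediates` could hold, e.g., MHET and BHET concentrations."""
    total = tpa + sum(intermediates)
    return 100.0 * tpa / total if total else 0.0


def fold_change(value, reference):
    """Ratio of a yield to a reference yield (e.g. mutant vs wild-type LCC)."""
    return value / reference


# Hypothetical concentrations (mM), for illustration only.
print(tpa_purity(9.0, [0.6, 0.4]))
print(fold_change(36.0, 10.0))
```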
Most biomolecules become functional and bioactive by forming protein complexes through interactions with ligands that are diverse in size, shape, and physicochemical properties. In the complex biological milieu, the interaction is ligand-specific, driven by molecular sensing and recognition of a binding interface localized within a protein structure. Mapping the interfaces of protein complexes is a highly sought-after area of research, as it delivers fundamental insights into proteomes and pathology and hence into therapeutic strategies. While X-ray crystallography and electron microscopy remain the gold standard for the structural elucidation of protein complexes, their artificial and static analytical conditions often produce non-native interfaces that may be negligible or non-existent in the biological environment. In recent years, mass spectrometry-coupled approaches, namely chemical crosslinking (CLMS) and hydrogen-deuterium exchange (HDMS), have become valuable complements to traditional techniques. These methods explicitly identify hot-spot residues and motifs embedded in binding interfaces, particularly when the interaction is predominantly dynamic, transient, and/or mediated by an intrinsically disordered domain. Here, we review the principal roles of CLMS and HDMS in protein structural biology, with particular emphasis on recent examples that explore biological interfaces. In addition, we describe recent studies that used these methods to expand our understanding of protein complex formation and related biological processes and to improve the prospects of structure-based drug design.
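Two routine checks behind the interface mapping described here can be sketched in Python. The first flags peptides whose deuterium uptake decreases upon complex formation (HDMS). The second tests whether a crosslinked residue pair is compatible with a structural model (CLMS). The 0.5 Da protection threshold and the 30 Å Cα-Cα limit are assumed values for illustration; appropriate cutoffs depend on the experiment and the crosslinker.

```python
import math


def protected_peptides(uptake_apo, uptake_complex, threshold=0.5):
    """Peptides whose deuterium uptake (Da) falls by more than `threshold`
    when the partner is bound; `threshold` is a hypothetical cutoff."""
    flagged = {}
    for peptide, apo in uptake_apo.items():
        if peptide in uptake_complex:
            delta = uptake_complex[peptide] - apo
            if delta < -threshold:
                flagged[peptide] = round(delta, 3)
    return flagged


def crosslink_satisfied(ca_a, ca_b, max_distance=30.0):
    """Whether a crosslinked residue pair fits a structure, using an assumed
    maximum Cα-Cα distance (Å) for the crosslinker."""
    return math.dist(ca_a, ca_b) <= max_distance


# Hypothetical uptake values (Da) for two peptides.
print(protected_peptides({"p1": 5.0, "p2": 3.0}, {"p1": 3.8, "p2": 2.9}))
```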
Cationic helical peptides play a crucial role in applications that exploit their antimicrobial and anticancer activity. The activity of these peptides correlates directly with their helicity. In this study, we performed extensive all-atom molecular dynamics simulations of 25 lysine-leucine co-polypeptide sequences with varying charge densities (λ) and sequence patterns. Our findings show that increasing the charge density on the peptide gradually decreases its helicity up to a critical charge density λc. Beyond λc, a complete helix-to-coil transition was observed. The decrease in helicity correlated with an increased number of water molecules in the first solvation shell, a larger solvent-exposed surface area, and a higher radius of gyration of the peptide.
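The helicity and charge-density quantities above can be computed from per-frame secondary-structure strings. This Python sketch assumes DSSP-style codes (H, G, and I counted as helical), defines λ hypothetically as the fraction of Lys residues, and picks λc as the smallest charge density whose helicity drops below an assumed coil threshold; the study's own definitions may differ.

```python
def charge_density(seq, charged="K"):
    """Hypothetical definition of λ: fraction of residues that are charged (Lys)."""
    return sum(aa in charged for aa in seq) / len(seq)


def helicity(dssp_frames, helix_codes="HGI"):
    """Mean fraction of residues in helical DSSP states across trajectory frames."""
    fractions = [sum(c in helix_codes for c in frame) / len(frame) for frame in dssp_frames]
    return sum(fractions) / len(fractions)


def critical_density(lambdas, helicities, coil_threshold=0.1):
    """Smallest λ whose helicity falls below an assumed coil threshold, else None."""
    for lam, h in sorted(zip(lambdas, helicities)):
        if h < coil_threshold:
            return lam
    return None


# Hypothetical per-sequence results, for illustration only.
print(critical_density([0.2, 0.4, 0.6], [0.8, 0.5, 0.05]))  # 0.6
```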
Various post-translational modifications, such as hyperphosphorylation, O-GlcNAcylation, and acetylation, have been implicated in inducing abnormal folding of tau protein. Recent in vitro studies revealed the possible involvement of N-glycosylation of tau in abnormal folding and tau aggregation. Hence, in this study, we performed microsecond-long all-atom molecular dynamics simulations to gain insights into the effects of N-glycosylation at residue Asn359, which forms part of the microtubule-binding region. Trajectory analysis of the simulations, coupled with essential dynamics and free energy landscape analysis, suggested that N-glycosylated tau tends to adopt a largely folded conformation with high β-sheet propensity, whereas unmodified tau exists in a largely extended form with very little β-sheet propensity. Residue interaction network analysis of the lowest-energy conformations further revealed that Phe378 and Lys353 are functionally important residues that help initiate folding, and that Phe378, Lys347, and Lys370 help maintain the stability of the protein in the folded state.
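Essential dynamics and free energy landscapes of this kind are commonly built by projecting aligned coordinates onto the top principal components and converting the 2D histogram into F = -kT ln P. A minimal NumPy sketch: the bin count and kT (≈2.494 kJ/mol near 300 K) are assumptions, and the input is assumed to be Cartesian coordinates already superimposed on a reference.

```python
import numpy as np


def pca_projection(coords, n_components=2):
    """coords: (n_frames, n_atoms * 3) superimposed Cartesian coordinates.
    Returns projections onto the top principal components."""
    centered = coords - coords.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest variance first
    return centered @ eigvecs[:, order]


def free_energy_landscape(pc1, pc2, bins=30, kT=2.494):
    """F = -kT ln P(pc1, pc2), shifted so the global minimum is 0.
    Empty bins get F = inf. kT is in kJ/mol (about 300 K)."""
    hist, xedges, yedges = np.histogram2d(pc1, pc2, bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        free = -kT * np.log(prob)
    free -= free[np.isfinite(free)].min()
    return free, xedges, yedges
```

The minima of `free` correspond to the most populated regions of PC space, from which representative (lowest-energy) conformations are usually extracted.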
Protein structures are stabilized by several types of chemical interactions between amino acids, which can compete with each other. This is the case for the chalcogen and hydrogen bonds formed by the thiol group of cysteine, which can form three hydrogen bonds, with one hydrogen acceptor and two hydrogen donors, and a chalcogen bond with a nucleophile along the extension of the C-S bond. A survey of the Protein Data Bank shows that hydrogen bonds are about 40-50 times more common than chalcogen bonds, suggesting that they are stronger and consequently prevail, though not always. It is also observed that a thiol group forming a chalcogen bond is frequently also involved, as a hydrogen donor, in a hydrogen bond.
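The geometric distinction between the two interactions can be made explicit. This Python sketch classifies a contact as a chalcogen bond when the nucleophile sits close to the sulfur and roughly along the extension of the C-S bond. The 3.32 Å distance (sum of the sulfur and oxygen van der Waals radii) and the 150° angle cutoff are assumptions, not the survey's criteria.

```python
import numpy as np


def angle_deg(a, b, c):
    """Angle a-b-c in degrees."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))


def is_chalcogen_bond(c_xyz, s_xyz, nu_xyz, max_dist=3.32, min_angle=150.0):
    """Nucleophile roughly along the extension of the C-S bond: S...Nu distance
    <= max_dist and C-S...Nu angle >= min_angle (both thresholds assumed)."""
    dist = float(np.linalg.norm(np.asarray(nu_xyz, float) - np.asarray(s_xyz, float)))
    return dist <= max_dist and angle_deg(c_xyz, s_xyz, nu_xyz) >= min_angle


# Hypothetical geometry: C-S along x, oxygen 3.0 Å beyond S on the same axis.
print(is_chalcogen_bond((-1.8, 0, 0), (0, 0, 0), (3.0, 0, 0)))  # True
```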
Arrestins are important scaffolding proteins expressed in all vertebrate animals. They regulate cell signaling events upon binding to active G-protein-coupled receptors (GPCRs) and trigger endocytosis of active GPCRs. While many of the functional sites on arrestins have been characterized, the question of how these sites interact remains unanswered. We used anisotropic network modelling (ANM) together with our covariance complement techniques to survey all available structures of the non-visual arrestins and map how structural changes and protein binding affect their structural dynamics. We found that activation and clathrin binding have a marked effect on arrestin dynamics, and that these dynamic changes are localized to a small number of distant functional sites. These sites include α-helix 1, the lariat loop, the nuclear localization domain, and the C-domain β-sheets on the C-loop side. Our techniques suggest that clathrin binding and/or GPCR activation of arrestin perturb the dynamics of these sites independently of structural changes.
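For readers unfamiliar with ANM, the model builds a Hessian from Cα contacts within a cutoff and obtains residue fluctuation covariances from its pseudo-inverse. A minimal NumPy sketch, assuming a 15 Å cutoff and a uniform spring constant; the authors' covariance complement techniques are not reproduced.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """3N x 3N ANM Hessian for N Cα positions with a uniform spring constant."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2   # off-diagonal super-element
            hess[3*i:3*i+3, 3*j:3*j+3] = block
            hess[3*j:3*j+3, 3*i:3*i+3] = block
            hess[3*i:3*i+3, 3*i:3*i+3] -= block    # diagonal = -sum of off-diagonals
            hess[3*j:3*j+3, 3*j:3*j+3] -= block
    return hess


def anm_covariance(hess, n_zero=6):
    """Pseudo-inverse of the Hessian, skipping the rigid-body modes;
    proportional to the covariance of residue fluctuations."""
    vals, vecs = np.linalg.eigh(hess)
    vals, vecs = vals[n_zero:], vecs[:, n_zero:]
    return (vecs / vals) @ vecs.T


def residue_fluctuations(cov):
    """Mean-square fluctuation per residue: trace of each diagonal 3x3 block."""
    n = cov.shape[0] // 3
    return np.array([np.trace(cov[3*i:3*i+3, 3*i:3*i+3]) for i in range(n)])
```

Comparing the covariance matrices, or the per-residue fluctuations derived from them, computed for two states (for example apo and clathrin-bound structures) is one simple way to see where dynamics change.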
Inversion from L- to D-stereochemistry endows peptides with improved bioactivity and enhanced resistance to many proteases and peptidases. To strengthen the biostability and bioavailability of peptide drugs, enzymatic epimerization has become an important way to incorporate D-amino acids into peptide backbones. Recently, the bifunctional thioesterase NocTE, which is responsible for the epimerization and hydrolysis of the C-terminal (p-hydroxyphenyl)glycine residue of the β-lactam antibiotic nocardicin A, has been shown to generate D-diastereomers exclusively. Unlike other epimerases, NocTE exhibits unique stereochemical selectivity. Herein, we investigated the catalytic mechanism of NocTE via molecular dynamics (MD) simulations and quantum mechanics/molecular mechanics (QM/MM) calculations. Structural analyses revealed two key water molecules around the reaction site that serve as proton mediators in epimerization. These structural characteristics led us to propose a substrate-assisted mechanism for the epimerization, in which multistep proton transfers are mediated by water molecules and the β-lactam ring, with a calculated free energy barrier of 20.3 kcal/mol. Subsequent hydrolysis of the D-configured substrate was energetically feasible, with an energy barrier of 14.3 kcal/mol. By comparison, the energy barrier for direct hydrolysis of the L-configured substrate was 24.0 kcal/mol. Our study provides mechanistic insights into the catalytic activities of the bifunctional thioesterase NocTE, uncovers further clues to the molecular basis of its stereochemical selectivity, and paves the way for the directed biosynthesis of novel peptide drugs with diverse stereostructural characteristics through rational enzyme design.
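To put the reported barriers in kinetic terms, the Eyring equation k = (k_B T / h) exp(-ΔG‡ / RT) converts each free energy barrier into an approximate rate constant. This Python sketch assumes 298.15 K and treats each barrier as a single elementary step, so only the relative magnitudes are meaningful.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J*s
R_KCAL = 1.987204e-3       # gas constant, kcal/(mol*K)


def eyring_rate(barrier_kcal, temp=298.15):
    """Transition-state-theory rate constant (1/s) for a free energy barrier in kcal/mol."""
    return K_B * temp / H_PLANCK * math.exp(-barrier_kcal / (R_KCAL * temp))


barriers = {
    "epimerization (L to D)": 20.3,
    "hydrolysis of D-substrate": 14.3,
    "direct hydrolysis of L-substrate": 24.0,
}
for step, barrier in barriers.items():
    print(f"{step}: {barrier} kcal/mol -> k ~ {eyring_rate(barrier):.2e} 1/s")
```

Under these assumptions, the 3.7 kcal/mol gap between epimerization (20.3) and direct hydrolysis of the L-substrate (24.0) corresponds to a rate ratio of roughly 500 at room temperature. This is consistent with epimerization outcompeting direct hydrolysis.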
Mutations are the cause of several diseases as well as the underlying force of evolution, so a thorough understanding of their biophysical consequences is essential. We present a computational framework for evaluating different levels of mutual information (MI) and their dependence on mutation. We used molecular dynamics trajectories of the third PDZ domain and several of its mutants. MI calculated from these trajectories shows that: (i) a multivariate Gaussian distribution of joint probabilities characterizes the MI between residue pairs with sufficient accuracy, and nonlinearities in joint probabilities, calculated with tensor Hermite polynomials up to fifth order, contribute insignificantly; (ii) changes in MI between residue pairs show characteristic patterns resulting from specific mutations; (iii) triple correlations, characterized by evaluating MI between triplets of residues, show that certain triplets are strongly affected by mutation; (iv) the susceptibility of residues to perturbation is obtained from MI and discussed in terms of linear response theory.
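For multivariate Gaussian fluctuations, MI has a closed form in terms of covariance determinants, I(A;B) = ½ ln(det C_A · det C_B / det C_AB). This NumPy sketch assumes a (3N × 3N) covariance of Cartesian residue fluctuations. For triplets it uses the total correlation (multi-information), which is one possible definition of multi-residue MI and may differ from the one used in the study. The Hermite-polynomial corrections are not included.

```python
import numpy as np


def gaussian_mi(cov, idx_a, idx_b):
    """MI (nats) between two groups of jointly Gaussian variables:
    I(A;B) = 0.5 * (ln det C_AA + ln det C_BB - ln det C_(AB)(AB))."""
    idx_ab = list(idx_a) + list(idx_b)
    logdet_a = np.linalg.slogdet(cov[np.ix_(idx_a, idx_a)])[1]
    logdet_b = np.linalg.slogdet(cov[np.ix_(idx_b, idx_b)])[1]
    logdet_ab = np.linalg.slogdet(cov[np.ix_(idx_ab, idx_ab)])[1]
    return 0.5 * (logdet_a + logdet_b - logdet_ab)


def residue_indices(i):
    """x, y, z columns of residue i in a (3N x 3N) fluctuation covariance."""
    return [3 * i, 3 * i + 1, 3 * i + 2]


def gaussian_total_correlation(cov, groups):
    """Multi-information among several groups (e.g. a residue triplet)."""
    all_idx = [k for g in groups for k in g]
    total = sum(np.linalg.slogdet(cov[np.ix_(g, g)])[1] for g in groups)
    return 0.5 * (total - np.linalg.slogdet(cov[np.ix_(all_idx, all_idx)])[1])
```

Applying `gaussian_mi` to covariances from wild-type and mutant trajectories and subtracting the results gives the per-pair MI changes discussed above.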
Revealing how proteins fold is a challenging subject, in terms of both discovery and description. Besides obtaining an accurate 3D structure of a protein's stable state, another big hurdle is discovering the structural flexibility that is an innate characteristic of the protein. Even when a large number of flexible conformations are known, the difficulty lies in how to describe them. A novel approach, the protein structure fingerprint, has been developed to expose comprehensive local folding variations and then construct folding conformations for an entire protein. The backbone of 5 amino acid residues was identified as a universal folden, and a set of Protein Folding Shape Codes (PFSC) was then derived to cover the complete folding space with an alphabetic description. Next, a database was created to collect all possible folding shapes of local folding variations for all permutations of 5 amino acids. The Protein Folding Variation Matrix (PFVM) then assembles all possible local folding variations along the sequence of a protein, and it has several prominent features. First, it shows fluctuations with certain folding patterns along the sequence, revealing how protein folding is related to the order of amino acids in the sequence. Second, all folding variations for an entire protein can be apprehended simultaneously, at a glance, within the PFVM. Third, all conformations can be determined from the local folding variations in the PFVM, so the total number of conformations is no longer ambiguous for any protein. Finally, the most probable folding conformation and its 3D structure can be obtained from the PFVM for protein structure prediction. Therefore, the protein structure fingerprint approach provides a significant means of investigating the protein folding problem.
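The windowing step behind the PFVM can be sketched in Python: a sequence of length L yields L - 4 overlapping 5-residue fragments, and there are 20^5 = 3,200,000 possible fragments. The fragment-to-shape-code database here is a hypothetical dictionary with placeholder codes, not the actual PFSC alphabet. The combination count is a naive upper bound that ignores compatibility between overlapping windows.

```python
WINDOW = 5                  # length of the 5-residue backbone unit ("folden")
N_FRAGMENTS = 20 ** WINDOW  # 3,200,000 possible 5-residue sequences


def windows(seq, k=WINDOW):
    """All overlapping k-residue fragments (len(seq) - k + 1 of them)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def variation_matrix(seq, shape_db):
    """Per-window list of shape codes stored for that fragment.
    `shape_db` is a hypothetical dict: fragment -> list of placeholder codes."""
    return [shape_db.get(fragment, []) for fragment in windows(seq)]


def count_combinations(matrix):
    """Naive upper bound on conformations: product of per-window variation counts.
    Windows missing from the database contribute a factor of 1 (an assumption)."""
    total = 1
    for codes in matrix:
        total *= max(len(codes), 1)
    return total


# Hypothetical database with placeholder codes, for illustration only.
db = {"ACDEF": ["x", "y"], "CDEFG": ["z"]}
print(variation_matrix("ACDEFG", db), count_combinations(variation_matrix("ACDEFG", db)))
```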
Myeloid cell leukemia-1 (MCL1), an anti-apoptotic BCL-2 family protein, plays a major role in controlling apoptosis as a regulator of mitochondrial permeability, and it is deregulated in various solid and hematological malignancies. The interaction of the executioner proteins Bak/Bax with anti-apoptotic MCL1, together with its cellular composition, determines whether the apoptotic or the survival pathway is followed. This study highlights the deleterious, MCL1-Bax-stabilizing effect of the V220F mutation on MCL1 structure, using computational protein-protein interaction predictions and molecular dynamics simulations. The single-point mutation V220F was selected because it resides in the hydrophobic core region of the conserved BH3 domain, the site of Bax binding. Molecular dynamics simulations showed increased stability of mutant MCL1 compared with native MCL1, both before and after Bax binding. Clusters from the free energy landscape revealed structural variation in the folding pattern, with an additional helix near the BH3 domain in the mutant structure. This loop-to-helix structural change in the mutant complex favored stable interaction within the complex and also induced a conformational change in Bax. Moreover, molecular mechanics-based binding free energy calculations confirmed the increased affinity of Bax for mutant MCL1. Residue-wise interaction network analysis identified the individual residues involved in Bax binding that are responsible for the mutation-induced changes in stability and interaction. In conclusion, the overall findings reveal that the V220F mutation in MCL1 is responsible for a conformational change leading to disruption of its biological functions, which might contribute to tumorigenesis. The mutation could potentially be used as a future diagnostic marker for cancers.
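Molecular mechanics-based binding free energies are usually estimated with an MM-PBSA/GBSA-style average over trajectory frames. The Python sketch assumes per-frame total energies for the complex, the receptor (MCL1), and the ligand (Bax) are already available from a single trajectory; the energies in the example are hypothetical.

```python
from statistics import mean, stdev


def binding_free_energy(g_complex, g_receptor, g_ligand):
    """Single-trajectory MM-PBSA/GBSA-style estimate (kcal/mol):
    dG_bind = mean over frames of G_complex - G_receptor - G_ligand.
    Returns (mean, standard deviation)."""
    per_frame = [c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand)]
    spread = stdev(per_frame) if len(per_frame) > 1 else 0.0
    return mean(per_frame), spread


def delta_delta_g(dg_mutant, dg_native):
    """Negative values mean the mutant complex binds more strongly."""
    return dg_mutant - dg_native


# Hypothetical per-frame energies (kcal/mol), for illustration only.
dg_native, _ = binding_free_energy([-100.0, -102.0], [-60.0, -61.0], [-10.0, -11.0])
dg_mutant, _ = binding_free_energy([-106.0, -107.0], [-61.0, -62.0], [-10.0, -10.0])
print(dg_native, dg_mutant, delta_delta_g(dg_mutant, dg_native))
```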
Dissimilatory sulfite reductase (DsrAB) is an ancient enzyme that has linked the global sulfur and carbon biogeochemical cycles since at least 3.47 Gya. While much has been learned about the phylogenetic distribution and diversity of DsrAB across environmental gradients, far less is known about the structural changes that occurred to maintain DsrAB function as the enzyme accompanied the diversification of sulfate/sulfite-reducing organisms (SRO) into new environments. Analyses of available crystal structures of DsrAB from Archaeoglobus fulgidus and Desulfovibrio vulgaris, representing early- and late-evolving lineages, respectively, show that certain features of DsrAB are structurally conserved, including active siro-heme binding motifs. Whether such structural features are conserved among DsrAB recovered from varied environments, including hot spring environments that host representatives of the earliest-evolving SRO lineage (e.g., MV2-Eury), is not known. To begin to close these gaps in our understanding of DsrAB evolution, structural models from MV2-Eury were generated and evolutionary sequence covariance analyses were conducted on a curated DsrAB database. Phylogenetically diverse DsrAB harbor many conserved functional residues, including those that ligate active siro-heme(s). However, evolutionary covariance analysis of monomeric DsrAB subunits revealed several False Positive Evolutionary Couplings (FPECs), corresponding to residues that have co-evolved despite being too far apart in the monomeric structure to allow direct contact. One set of FPECs corresponds to residues that form a structural path between the two active siro-heme moieties across the interface between heterodimers, suggesting the potential for allostery or electron transfer within the enzyme complex. Other FPECs correspond to structural loops and gaps that may have been selected to stabilize enzyme function in different environments.
These structural bioinformatics results suggest that DsrAB has maintained allosteric communication pathways between subunits as SRO diversified into new environments. The observations outlined here provide a framework for future biochemical and structural analyses of DsrAB to examine potential allosteric control of this enzyme.
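The FPEC idea can be sketched as a distance filter on ranked evolutionary couplings. Pairs whose residues lie within a contact cutoff in the monomer are explained by direct contact, while distant pairs are candidates for inter-subunit or allosteric explanations. This Python sketch assumes an 8 Å Cβ-Cβ cutoff and hypothetical residue coordinates.

```python
import numpy as np


def classify_couplings(couplings, coords, contact_cutoff=8.0):
    """Split ranked evolutionary couplings into monomer contacts and
    spatially distant ('false positive') couplings.
    couplings: list of (i, j) residue numbers; coords: dict residue -> (x, y, z).
    The 8 Å cutoff is an assumed contact definition."""
    contacts, distant = [], []
    for i, j in couplings:
        d = float(np.linalg.norm(np.asarray(coords[i], float) - np.asarray(coords[j], float)))
        (contacts if d <= contact_cutoff else distant).append((i, j, round(d, 2)))
    return contacts, distant


# Hypothetical coordinates (Å), for illustration only.
xyz = {1: (0.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0), 3: (20.0, 0.0, 0.0)}
print(classify_couplings([(1, 2), (1, 3)], xyz))
```

Distant pairs can then be re-measured in the dimeric (or tetrameric) assembly to see whether they line up across a subunit interface, as described for the path between the siro-heme sites.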