Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) has caused substantially more infections, deaths, and economic disruptions than the 2002-2003 SARS-CoV. The key to understanding SARS-CoV-2’s higher infectivity lies partly in its host receptor recognition mechanism. Experiments show that the human ACE2 protein, which serves as the primary receptor for both CoVs, binds to the receptor binding domain (RBD) of CoV-2’s spike protein more strongly than to SARS-CoV’s spike RBD. The molecular basis for this difference in binding affinity, however, remains unexplained by X-ray structures. To go beyond insights gained from X-ray structures and investigate the role of thermal fluctuations in structure, we employ all-atom molecular dynamics simulations. Microseconds-long simulations reveal that while CoV and CoV-2 spike-ACE2 interfaces have similar conformational binding modes, CoV-2 spike interacts with ACE2 via a larger combinatorial set of polar contacts and, on average, makes 45% more polar contacts. Correlation analysis and thermodynamic calculations indicate that these differences in the density and dynamics of polar contacts arise from differences in the spatial arrangements of interfacial residues and in the dynamical coupling between interfacial and non-interfacial residues. These results suggest that ongoing efforts to design spike-ACE2 peptide blockers will benefit from incorporating dynamical information as well as allosteric coupling effects.
The multi-domain bacterial S1 protein is the largest and most functionally important ribosomal protein of the 30S subunit, and it interacts with both mRNA and proteins. The family of ribosomal S1 proteins differs from classical proteins with tandem repeats in having a “bead-on-string” organization, where each repeat is folded into a globular domain. Based on our recent data, the study of evolutionary relationships among the bacterial phyla will provide evidence for one of the proposed theories of the evolutionary development of proteins with structural repeats: from multiple-repeat assemblies to single repeats, or vice versa. In this comparative analysis of 1333 S1 sequences identified in 24 different phyla, we demonstrate how such phyla can form independently or dependently during evolution. To our knowledge, this work is the first study of the evolutionary history of bacterial ribosomal S1 proteins. The collected and structured data can be useful to computational biologists as a resource for determining percent identity, amino acid composition and logo motifs, as well as dN/dS ratios in bacterial S1 proteins. The data obtained suggest that bacterial ribosomal S1 proteins evolved from multiple-repeat assemblies to a single repeat. The presented data are integrated into a server, which can be accessed at http://oka.protres.ru:4200.
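To make the "percent identity" quantity mentioned in the S1 abstract concrete, the following is a minimal sketch of one common definition: identical residues divided by the number of aligned columns in which neither sequence has a gap. The function name `percent_identity`, the gap symbol, and the choice of denominator are assumptions made for illustration; the abstract does not state which definition the server uses.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: columns containing a gap in either sequence are ignored,
    so the denominator is the number of gap-free aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # gap-free columns
    matches = 0   # gap-free columns with identical residues
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real S1 sequences.
    print(percent_identity("MKV-LA", "MRVGLA"))
```

Other denominators (alignment length, or the length of the shorter sequence) give different values for the same alignment, so the definition should be reported alongside any number.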
SARS-CoV-2 is neutralized by proteins that block receptor-binding sites on spikes that project from the viral envelope. In particular, substantial research investment has advanced monoclonal antibody therapies to the clinic where there are signs of partial efficacy in reducing viral burden and hospitalization. An alternative is to use the host entry receptor, ACE2, as a soluble decoy that broadly blocks SARS-associated coronaviruses with limited potential for viral escape. Here, we summarize efforts to engineer higher affinity variants of soluble ACE2 that rival the potency of affinity-matured antibodies. Strategies have also been used to increase the valency of ACE2 decoys for avid spike interactions and to improve pharmacokinetics via IgG fusions. Finally, the intrinsic catalytic activity of ACE2 for the turnover of the vasoconstrictor angiotensin II may directly address COVID-19 symptoms and protect against lung and cardiovascular injury, conferring dual mechanisms of action unachievable by monoclonal antibodies. Soluble ACE2 derivatives therefore have the potential to be next generation therapeutics for addressing the immediate needs of the current pandemic and possible future outbreaks.
Multi-domain proteins are not only formed through natural evolution but can also be generated by recombinant DNA technology. Because many fusion proteins can enhance the selectivity of cell targeting, these artificially produced molecules, called multi-specific biologics, are promising drug candidates, especially for immunotherapy. Moreover, the rational design of domain linkers in fusion proteins is becoming an essential step toward a quantitative understanding of the dynamics in these biopharmaceutics. We developed a computational framework to characterize the impacts of peptide linkers on the dynamics of multi-specific biologics. We constructed a benchmark containing six types of linkers that represent various lengths and degrees of flexibility and used them to connect two natural proteins as a test system. The microsecond dynamics of these proteins generated from Anton were projected onto a coarse-grained conformational space. The similarity of dynamics among different proteins in this low-dimensional space was further analyzed by a neural network model. Finally, hierarchical clustering was applied to place linkers into different subgroups based on the neural network classification results. The clustering results suggest that the length of linkers used to spatially separate different functional modules plays the most important role in regulating the dynamics of this fusion protein. Given the same number of amino acids, linker flexibility functions as a regulator of protein dynamics. In summary, we illustrated that a new computational strategy can be used to study the dynamics of multi-domain fusion proteins by a combination of long timescale molecular dynamics simulation, coarse-grained modeling, and artificial intelligence.
Cysteine (Cys) is the most reactive amino acid and participates in a wide range of biological functions. In-silico predictions complement experiments to meet the need for functional characterization. Algorithms that predict multiple Cys functions are scarce, in contrast to algorithms that predict a specific function. Here we present a deep neural network-based method for multiple Cys function prediction, available as a web server (DeepCys) (https://deepcys.herokuapp.com/). The DeepCys model was trained and tested on two independent datasets curated from protein crystal structures. The method requires three inputs for a given Cys, namely the PDB identifier (ID), chain ID and residue ID; it outputs the probabilities of four cysteine functions, namely disulphide, metal-binding, thioether and sulphenylation, and predicts the most probable Cys function. The algorithm exploits local and global protein properties such as sequence and secondary structure motifs, buried fractions, microenvironments and protein/enzyme class. DeepCys outperformed most of the multiple and specific Cys function algorithms. The method predicts the largest number of cysteine functions and, for the first time, explicitly predicts thioether function. The tool was used to elucidate cysteine functions in domains of unknown function (DUFs) belonging to cytochrome C oxidase subunit-II (COX2)-like transmembrane domains. Apart from the web server, a standalone program is also available on GitHub (https://github.com/vam-sin/deepcys).
The assignment of protein secondary structure elements (SSEs) underpins structural analysis and prediction. The backbone of a protein can be adequately represented by a pc-polyline that passes through the centers of its peptide planes. One salient feature of the pc-polyline representation is that the secondary structure of a protein becomes recognizable in a matrix whose elements are the pairwise distances between peptide plane centers. Thus a pc-polyline could in turn be used to assign SSEs. Using a convolutional neural network (CNN), we confirm here that a pc-polyline indeed contains enough information for the accurate assignment of six types of secondary structure elements: α-helix, β-sheet, β-bulge, 3_10-helix, turn and loop. Applications to three large data sets show that the assignments made by our CNN-based P2PSSE program agree very well with those made by DSSP and STRIDE, and quite well with those made by five other programs. The analyses of the assignments by P2PSSE and by other programs raise some general questions about the characterization of protein secondary structure. In particular, the analyses illustrate the difficulty of giving a quantitative and consistent definition for each of the six SSE types, especially for 3_10-helix, β-bulge, turn or loop, in terms of backbone H-bond patterns, backbone dihedral angles, Cα-polylines or pc-polylines. This difficulty suggests that the SSE space, though dominated by the regions for the six SSE types, is to a certain degree continuous.
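The pairwise distance matrix described in the pc-polyline abstract can be sketched as follows. This is an illustrative sketch only: the abstract does not define how a peptide plane center is computed, so the code assumes, as a stand-in, that each center is the midpoint of two consecutive Cα atoms. The function names are hypothetical and are not taken from P2PSSE.

```python
import math


def peptide_plane_centers(ca_coords):
    """Approximate peptide plane centers from consecutive C-alpha positions.

    Assumption: the center of the plane between residues i and i+1 is taken
    as the midpoint of their C-alpha atoms; the original method may use a
    different definition.
    """
    return [
        tuple((a + b) / 2.0 for a, b in zip(p, q))
        for p, q in zip(ca_coords, ca_coords[1:])
    ]


def distance_matrix(points):
    """Symmetric matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]


if __name__ == "__main__":
    # Toy coordinates in angstroms, not taken from a real structure.
    ca = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 2.0, 0.0)]
    centers = peptide_plane_centers(ca)
    for row in distance_matrix(centers):
        print(["%.3f" % d for d in row])
```

A matrix of this kind is a natural image-like input for a convolutional network, since regular secondary structure shows up as repeating bands near the diagonal.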
The mitochondrial F1FO-ATPase in the presence of the natural cofactor Mg2+ acts as the enzyme of life by synthesizing ATP, but it can also hydrolyze ATP to pump H+. Interestingly, Mg2+ can be replaced by Ca2+, but only to sustain ATP hydrolysis and not ATP synthesis. When Ca2+ inserts in F1, the torque generation built by the chemomechanical coupling between F1 and the rotating central stalk was reported as unable to drive the transmembrane H+ flux within FO. However, the failed H+ translocation is not consistent with the oligomycin-sensitivity of the Ca2+-dependent F1FO-ATP(hydrol)ase. New enzyme roles in mitochondrial energy transduction are suggested by recent advances. Accordingly, the structural F1FO-ATPase distortion driven by ATP hydrolysis sustained by Ca2+ is consistent with the permeability transition pore signal propagation pathway. The Ca2+-activated F1FO-ATPase, by forming the pore, may contribute to dissipate the transmembrane H+ gradient created by the same enzyme complex.
Interaction networks of human histone H1 subtypes were constructed to show the spectrum of their activities realized through protein-protein interactions. Histone H1 subtypes participate in over half a thousand interactions with nuclear and cytosolic proteins engaged in enzymatic activity and in the binding of nucleic acids and proteins. Small-scale networks created by H1 subtypes are similar in their topological parameters (p > 0.05), but hub proteins of the networks formed with subtypes H1.1 and H1.4 differ from those of subtypes H1.3 and H1.5 in closeness centrality, clustering coefficient and neighborhood connectivity (p < 0.05). The molecular functions and biological processes of the network hubs are related to RNA binding and ribosome biogenesis (subtypes H1.1 and H1.4), cell cycle and cell division (subtypes H1.3 and H1.5) and protein ubiquitination and degradation (subtype H1.2). This disparity between H1 subtypes is also manifested in the enriched GO terms of their interacting proteins. The residue propensity and secondary structures of the interacting surfaces, as well as the value of the equilibrium dissociation constant, indicate that H1 subtype interactions are transient in terms of stability and medium-strong in terms of binding strength. Histone H1 subtypes bind interacting partners in an intrinsic disorder-dependent mode, according to the coupled folding and binding and the mutual synergistic folding mechanisms. These results show that multifunctional H1 subtypes operate via protein interactions in the networks of crucial cellular processes and, therefore, confirm a new histone H1 paradigm relating to its functioning in protein-protein interaction networks.
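The three hub descriptors named in the H1 abstract (closeness centrality, clustering coefficient and neighborhood connectivity) have standard graph-theoretic definitions, sketched below for an undirected, unweighted network stored as a dictionary of neighbor sets. The node names and the toy network are hypothetical, and the abstract does not say which software or exact normalization was used, so this should be read as an illustration of the definitions rather than a reproduction of the analysis.

```python
from collections import deque


def closeness_centrality(graph, node):
    """(Number of reachable nodes) / (sum of shortest-path lengths to them)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search gives shortest paths in hops
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def clustering_coefficient(graph, node):
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def neighborhood_connectivity(graph, node):
    """Average degree of the neighbors of `node`."""
    nbrs = graph[node]
    if not nbrs:
        return 0.0
    return sum(len(graph[v]) for v in nbrs) / len(nbrs)


if __name__ == "__main__":
    # Hypothetical four-node network centered on a hub called "H1".
    net = {"H1": {"A", "B", "C"}, "A": {"H1", "B"}, "B": {"H1", "A"}, "C": {"H1"}}
    print(closeness_centrality(net, "H1"))
    print(clustering_coefficient(net, "H1"))
    print(neighborhood_connectivity(net, "H1"))
```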
Normal mode analysis is a fast and inexpensive approach that is widely used to gain insight into functional protein motions and, more recently, to create conformations for further computational studies. However, when the protein structure is unknown, the use of computational models is necessary. Here, we analyze the capacity of normal mode analysis in internal coordinate space to predict protein motion, intrinsic flexibility and atomic displacements using protein models instead of native structures, and the possibility of using it for model refinement. Our results show that normal mode analysis is quite insensitive to modelling errors, but that calculations are strictly reliable only for very accurate models. Our study also suggests that internal normal mode analysis is a more suitable tool for the improvement of structural models, and for integrating them with experimental data or in other computational techniques, such as protein docking or more refined molecular dynamics simulations.
Deep learning has emerged as a revolutionary technology for protein residue-residue contact prediction since the 2012 CASP10 competition. Considerable advancements in the predictive power of the deep learning-based contact predictions have been achieved since then. However, little effort has been put into interpreting the black-box deep learning methods. Algorithms that can interpret the relationship between predicted contact maps and the internal mechanism of the deep learning architectures are needed to explore the essential components of contact inference and improve their explainability. In this study, we present an attention-based convolutional neural network for protein contact prediction, which consists of two attention mechanism-based modules: sequence attention and regional attention. Our benchmark results on the CASP13 free-modeling (FM) targets demonstrate that the two attention modules added on top of existing typical deep learning models exhibit a complementary effect that contributes to predictive improvements. More importantly, the inclusion of the attention mechanism provides interpretable patterns that contain useful insights into the key fold-determining residues in proteins. We expect the attention-based model can provide a reliable and practically interpretable technique that helps break the current bottlenecks in explaining deep neural networks for contact prediction. The source code of our method is available at https://github.com/jianlin-cheng/InterpretContactMap.
NADPH:protochlorophyllide (Pchlide) oxidoreductase (POR) is a key enzyme of chlorophyll biosynthesis in angiosperms. It is one of few known photoenzymes, which catalyzes the light-activated trans-reduction of the C17-C18 double bond of Pchlide’s porphyrin ring. Due to the light requirement, dark-grown angiosperms cannot synthesize chlorophyll. No crystal structure of POR is available, so to improve understanding of the protein’s three-dimensional structure, its dimerization, and binding of ligands (both the cofactor NADPH and substrate Pchlide), we computationally investigated the sequence and structural relationships among homologous proteins identified through database searches. The results indicate that α4 and α7 helices of monomers form the interface of POR dimers. On the basis of conserved residues, we predicted 11 functionally important amino acids that play important roles in POR binding to NADPH. Structural comparison of available crystal structures revealed that they participate in formation of binding pockets that accommodate the Pchlide ligand, and that five atoms of the closed tetrapyrrole are involved in non-bonding interactions. However, we detected no clear pattern in the physico-chemical characteristics of the amino acids they interact with. Thus, we hypothesize that interactions of these atoms in the Pchlide porphyrin ring are important to hold the ligand within the POR binding site. Analysis of Pchlide binding in POR by molecular docking and PELE simulations revealed that the orientation of the nicotinamide group is important for Pchlide binding. These findings highlight the complexity of interactions of porphyrin-containing ligands with proteins, and we suggest that fit-inducing processes play important roles in POR-Pchlide interactions.
One way in which trichocyte keratin intermediate filament proteins (keratins) and keratin associated proteins (KAPs) differ from their epithelial equivalents is in their higher levels of cysteine residues. Interactions between these cysteine residues within a mammalian fiber, and the putative regular organization of interactions (i.e., types of disulfide bond) are likely important for defining fiber mechanical properties, and thus biological functionality of hairs. Here we extend a previous study of cysteine accessibility under different levels of exposure to reducing compounds to explore a finer set of levels associated with interactions between keratins and KAPs. We found that most of the cysteines in the KAPs were close to either the N- or C- terminal domains of these proteins. The most accessible cysteines in keratins were present in the head or tail domains indicating their function in readily forming intermolecular bonds with KAPs. Some of the more buried cysteines in keratins were discovered either close to or within the rod region in positions previously identified in human epithelial keratins as being involved in crosslinking between the heterodimers of the tetramer. Our present study therefore provides a deeper understanding of the accessibility of disulfides especially in keratins and thus proves that there is some specificity to the disulfide bond interactions leading to these intermolecular bonds stabilizing the fiber structure.
We have investigated the pressure- and temperature-induced conformational changes associated with the low complexity domain of hnRNP A1, an RNA-binding protein able to phase separate in response to cellular stress. Solution NMR spectra of the hnRNP A1 low-complexity domain fused with protein-G B1 domain were collected from 1 to 2,500 bar and from 268 K to 290 K. While the GB1 domain shows the typical pressure-induced and cold temperature-induced unfolding expected for small globular domains, the low-complexity domain of hnRNP A1 exhibits unusual pressure and temperature dependences. We observed that the low-complexity domain is pressure sensitive, undergoing a major conformational transition within the prescribed pressure range. Remarkably, this transition has the inverse temperature dependence of a typical folding-unfolding transition. Our results suggest the presence of a low-lying extended, and fully solvated state(s) of the low-complexity domain that may play a role in phase separation. This study highlights the exquisite sensitivity of solution NMR spectroscopy to observe subtle conformational changes and illustrates how pressure perturbation can be used to determine the properties of metastable conformational ensembles.
The loops of modular polyketide synthases (PKSs) serve diverse functions but are largely uncharacterized. They frequently contain amino acid repeats resulting from genetic events such as slipped-strand mispairing. Determining the tolerance of loops to amino acid changes would aid in understanding and engineering these multidomain molecule factories. Here, tandem repeats in the DNA encoding 949 modules within 129 cis-acyltransferase PKSs were catalogued, and the locations of the corresponding amino acids within the module were identified. The most frequently inserted interdomain loop corresponds with the updated module boundary immediately downstream of the ketosynthase (KS), while the loops bordering the dehydratase (DH) were nearly intolerant to such insertions. An analysis of the loops bordering the acyl carrier protein (ACP) reveals they are relatively short (14±6 residues), that they resist large increases in length, and that ACP may rely on acyltransferase (AT) accessing a conformation like that observed through electron microscopy of the pikromycin PKS. From the 949 modules, no repetitive sequence loop insertions are located within ACP, and only 2 reside within KS, indicating the sensitivity of these domains to alteration.
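As an illustration of what cataloguing tandem repeats in DNA can involve, the following is a minimal sketch of a greedy scanner for exact, back-to-back repeats. It is not the procedure used in the PKS study, which the abstract does not describe; the function name, the unit-length limits and the minimum copy number are all assumptions, and real repeat finders also tolerate mismatches and indels.

```python
def find_tandem_repeats(seq, min_unit=3, max_unit=12, min_copies=2):
    """Return (start, unit, copies) for exact tandem repeats in `seq`.

    Greedy left-to-right scan: at each position the unit length that covers
    the longest repeated stretch is kept, and scanning resumes after it.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for a unit of this length
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                span = copies * k
                if best is None or span > best[2] * len(best[1]):
                    best = (i, unit, copies)
        if best is not None:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip past the reported repeat
        else:
            i += 1
    return hits


if __name__ == "__main__":
    # Toy sequence: three copies of a hypothetical 6-nt unit flanked by
    # unrelated bases.
    print(find_tandem_repeats("ATG" + "GCCGAC" * 3 + "TTA"))
```

Restricting `min_unit` to multiples of three would limit hits to in-frame insertions, which is the case most relevant to amino acid repeats in loops.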
The polyene polyketides amphotericin B (AMB) and nystatin (NYS) are important antifungal drugs. Thioesterases (TEs), located in the last module of the PKS, control the release of polyketides by cyclization or hydrolysis. Intrigued by the tiny structural difference between AMB and NYS, as well as the high sequence identity between AMB TE and NYS TE, we constructed four systems to study the structural characteristics, catalytic mechanism, and product release of AMB TE and NYS TE with combined MD simulations and QM/MM calculations. The results indicate that, compared with AMB TE, NYS TE shows higher specificity for its natural substrate, and R26 and D186 are proposed to play key roles in substrate recognition. The energy barriers of macrocyclization in the AMB-TE-Amb and AMB-TE-Nys systems were calculated to be 14.0 and 22.7 kcal/mol, while in the NYS-TE-Nys and NYS-TE-Amb systems the barriers were 17.5 and 25.7 kcal/mol, suggesting that cyclization with the natural substrates is more favorable than with the exchanged substrates. Finally, the binding free energies obtained with the MM-PBSA.py program suggest that it is easier for natural products to leave the TE enzymes after cyclization. Key residues for the departure of the polyketide product from the active site were also highlighted. We provide a catalytic overview of AMB TE and NYS TE, including substrate recognition, catalytic mechanism and product release. These findings will improve the understanding of polyene polyketide TEs and aid in broadening the substrate flexibility of polyketide TEs.
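To give a feel for what the barrier differences reported in the TE abstract imply, the sketch below converts a difference between two barriers into a ratio of rate constants using the transition-state-theory relation k_a / k_b = exp((E_b - E_a) / RT), with equal prefactors assumed. The temperature of 298.15 K, the treatment of the computed barriers as activation free energies, and the function name are assumptions for illustration; the abstract does not report rate ratios.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def relative_rate(barrier_a, barrier_b, temperature=298.15):
    """Rate of process A relative to process B from their barriers.

    Barriers are in kcal/mol and temperature in kelvin. Assumes both
    processes share the same pre-exponential factor, so only the barrier
    difference matters.
    """
    return math.exp((barrier_b - barrier_a) / (R_KCAL * temperature))


if __name__ == "__main__":
    # Barriers quoted in the abstract (kcal/mol): natural vs exchanged substrate.
    print("AMB TE, Amb vs Nys: %.2e" % relative_rate(14.0, 22.7))
    print("NYS TE, Nys vs Amb: %.2e" % relative_rate(17.5, 25.7))
```

Under these assumptions, barrier differences of 8 to 9 kcal/mol correspond to rate differences of roughly six orders of magnitude at room temperature, which is why differences of this size are read as strong substrate preferences.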
As a key cellular sensor, the TRPV1 channel undergoes a gating transition from a closed state to an open state in response to many physical and chemical stimuli. This transition is regulated by small-molecule ligands including lipids and various agonists/antagonists, but the underlying molecular mechanisms remain obscure. Thanks to the recent revolution in cryo-electron microscopy, a growing list of new structures of TRPV1 and other TRPV channels has been solved in complex with various ligands including lipids. Toward elucidating how ligand binding correlates with TRPV1 gating, we have performed extensive molecular dynamics simulations (with a cumulative time of 20 μs), starting from high-resolution structures of TRPV1 in both the closed and open states. By comparing the open and closed state ensembles, we have identified state-dependent binding sites for small-molecule ligands in general and lipids in particular. We further use machine learning to predict top ligand-binding sites as important features to classify the closed vs open states. The predicted binding sites are thoroughly validated by matching homologous sites in all structures of TRPV channels bound to lipids and other ligands, and by comparison with previous functional/mutational studies of ligand binding in TRPV1. Taken together, this study has integrated rich structural, dynamic, and functional data to inform the future design of small-molecule drugs targeting TRPV1.
Inorganic pyrophosphatases (PPases) catalyze the hydrolysis of pyrophosphate to phosphates. PPases play essential roles in growth and development, and are found in all kingdoms of life. Humans possess two PPases, PPA1 and PPA2. PPA1 is present in all tissues, acting largely as a housekeeping enzyme. Besides pyrophosphate hydrolysis, PPA1 can also directly dephosphorylate phosphorylated JNK1. Upregulated expression of PPA1 has been linked to many human malignant tumors. PPA1 knockdown induces apoptosis and decreases proliferation. PPA1 is emerging as a potential prognostic biomarker and target for anti-cancer drug development. In spite of the biological and physiopathological importance of PPA1, there is no detailed study of the structure and catalytic mechanisms of PPases of mammalian origin. Here we report the crystal structure of human PPA1 at a resolution of 2.4 Å. We also carried out modeling studies of PPA1 in complex with JNK1-derived phosphopeptides. The monomeric protein fold of PPA1 is similar to those found in other family I PPases. PPA1 forms a dimeric structure that should be conserved in animal and fungal PPases. Analysis of the PPA1 structure and comparison with available structures of PPases from lower organisms suggest that PPA1 has a largely pre-organized and relatively rigid active site for pyrophosphate hydrolysis. Results from the modeling study indicate that the active site of PPA1 has the potential to accommodate double-phosphorylated peptides derived from JNK1. In short, the results of this study provide new insights into the mechanisms of human PPA1 and a basis for structure-based anti-cancer drug development using PPA1 as the target.
In vertebrates, the mineralocorticoid receptor (MR) is a steroid-activated nuclear receptor (NR) that plays essential roles in water-electrolyte balance and blood pressure homeostasis. It belongs to the group of oxo-steroidian NRs, together with the glucocorticoid (GR), progesterone (PR), and androgen (AR) receptors. Classically, these oxo-steroidian NRs homodimerize and bind to specific genomic sequences to activate gene expression. NRs are multi-domain proteins, and dimerization is mediated by both the DNA binding (DBD) and ligand binding (LBD) domains, with the latter thought to provide the largest dimerization interface. However, at the structural level, the LBD dimerization of oxo-steroidian receptors has remained largely a matter of debate. This is linked to the difficulty of expressing, purifying and crystallizing these receptors. As a result, there is currently no consensus on a common homodimer assembly across the 4 receptors, i.e. GR, PR, AR and MR, despite their sequence homology. Examining the available MR LBD crystals and using widely used tools such as PISA, PRISM and EPPIC, as well as the MM/PBSA method, we have determined that an interface mediated by helices H9 and H10 of the LBD as well as by the F domain presents the features of a biological protein-protein interaction surface. This interface, which has been observed in both GR alpha and MR crystals, stands out among other contacts and provides, for the first time, a homodimer architecture that is common to both oxo-steroidian receptors.