The c subunits, which constitute the c-ring of the F1FO-ATPase, could be the main components of the mitochondrial permeability transition pore (mPTP). A well-known modulator of mPTP formation and opening is cyclophilin D (CyPD), a peptidyl-prolyl cis-trans isomerase. The loop that connects the two α-helices of the c subunit hairpin contains a unique proline residue (Pro40) that could be a biological target of CyPD. Indeed, proline cis-trans isomerization might provide the switch that interconverts the open and closed states of the pore by pulling out the c-ring lipid plug.
The heterologous overexpression states of prion proteins play a critical role in understanding the mechanisms of prion-related diseases. We report herein the identification of soluble monomer and complex states for a baker’s yeast prion, Sup35, when expressed in E. coli. Two peaks are apparent when His-tagged Sup35 is eluted with imidazole from a Ni2+ affinity column. Peak I contains Sup35 in both monomer and aggregated states. The Sup35 aggregate in this peak, abbreviated as C-aggregate, includes a non-fibril complex comprising Sup35 aggregate-HSP90-DnaK, ATP synthase β unit (chain D), 30S ribosome subunit, and OmpF. The purified monomer and C-aggregate remain stable for an extended period of time. Peak II also contains Sup35 in both monomer and aggregated (abbreviated as S-aggregate) states, but here the aggregated states are caused by the formation of inter-Sup35 disulfide bonds. This study demonstrates that further assembly of the Sup35 non-fibril C-aggregate can be interrupted by the chaperone repertoire system in E. coli.
Petroleum-based plastics are durable and accumulate in all ecological niches. Knowledge of their enzymatic degradation is sparse. Today, fewer than 50 verified plastics-active enzymes are known. The first examples of enzymes acting on the polymers polyethylene terephthalate (PET) and polyurethane (PUR) have been reported together with detailed biochemical and structural descriptions. Further, very few polyamide (PA) oligomer-active enzymes are known. In this paper, the currently known enzymes acting on the synthetic polymers PET and PUR are briefly summarized, and their published activity data are collected and integrated into a comprehensive open access database. The Plastics-Active Enzymes Database (PAZy) represents an inventory of known and experimentally verified plastics-active enzymes. Almost 3000 homologues of PET-active enzymes were identified by profile hidden Markov models. Over 2000 homologues of PUR-active enzymes were identified by BLAST. Based on multiple sequence alignments, conservation analysis identified the most conserved amino acids, and sequence motifs for PET- and PUR-active enzymes were derived.
Functional regulation via conformational dynamics is well known in structured proteins, but less well characterized in intrinsically disordered proteins and their complexes. Using NMR spectroscopy we have identified a dynamic regulatory mechanism in the human insulin-like growth factor (IGF) system involving the central, intrinsically disordered linker domain of human IGF-binding protein-2 (hIGFBP2). The bioavailability of IGFs is regulated by the proteolysis of IGF-binding proteins. In the case of hIGFBP2, the linker domain (L-hIGFBP2) retains its intrinsic disorder upon binding IGF-1 but its dynamics are significantly altered, both in the IGF binding region and distantly located protease cleavage sites. The increase in flexibility of the linker domain upon IGF-1 binding may explain the IGF-dependent modulation of proteolysis of IGFBP2 in this domain. As IGF homeostasis is important for cell growth and function, and its dysregulation is a key contributor to several cancers, our findings open up new avenues for the design of IGFBP analogs inhibiting IGF-dependent tumors.
Nucleotide metabolism is a fundamental process in all organisms. Two families of nucleoside phosphorylases (NPs) that catalyze the phosphorolytic cleavage of the glycosidic bond in nucleosides have been found: the trimeric or hexameric NP-I family and the dimeric NP-II family. Recent studies revealed another class of NP protein in E. coli, named pyrimidine/purine nucleoside phosphorylase (ppnP), which can catalyze the phosphorolysis of diverse nucleosides and yield D-ribose 1-phosphate and the respective free bases. Here, we solve the crystal structures of ppnP from E. coli and three other species. Our studies reveal that the structure of ppnP belongs to the Rlmc-like cupin fold and adopts a rigid dimeric conformation. Detailed analysis revealed a potential nucleoside binding pocket rich in hydrophobic residues. The residues involved in dimer and pocket formation are all well conserved in bacteria. Since the cupin fold is a large superfamily in the biosynthesis of natural products, our studies provide a structural basis for understanding NP proteins and for their directed evolution.
The structure of a protein plays a pivotal role in determining its function. Often, the protein surface’s shape and curvature dictate its nature of interaction with other proteins and biomolecules. However, marked by corrugations and roughness, a protein’s surface representation poses significant challenges for its curvature-based characterization. In the present study, we employ unsupervised machine learning to segment the protein surface into patches. To measure the surface curvature of a patch, we present an algebraic sphere fitting method that is fast, accurate, and robust. Moreover, we use local curvatures to show the existence of “shape complementarity” in protein-protein, antigen-antibody, and protein-ligand interfaces. We believe that the current approach could help understand the relationship between protein structure and its biological function and can be used to find binding partners of a given protein.
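The abstract above names an algebraic sphere fitting method but does not spell it out. As an illustration only, the following is a minimal sketch of one common algebraic (linear least-squares) sphere fit, assuming a patch is given as a list of 3D points; the function names `solve_linear` and `fit_sphere` are hypothetical and this is not necessarily the authors' formulation. The sphere equation is rewritten as x² + y² + z² = 2ax + 2by + 2cz + d with d = r² − a² − b² − c², which is linear in the unknowns (a, b, c, d).

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point set (for example, coplanar points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_sphere(points):
    """Algebraic least-squares sphere fit; returns (center, radius).

    Builds the normal equations (A^T A) p = A^T f with rows
    A_i = [2x, 2y, 2z, 1] and f_i = x^2 + y^2 + z^2.
    """
    ata = [[0.0] * 4 for _ in range(4)]
    atf = [0.0] * 4
    for x, y, z in points:
        row = (2.0 * x, 2.0 * y, 2.0 * z, 1.0)
        f = x * x + y * y + z * z
        for i in range(4):
            atf[i] += row[i] * f
            for j in range(4):
                ata[i][j] += row[i] * row[j]
    a, b, c, d = solve_linear(ata, atf)
    radius = math.sqrt(d + a * a + b * b + c * c)
    return (a, b, c), radius


# Six points on a sphere centered at (1, 2, 3) with radius 5.
patch = [(6, 2, 3), (-4, 2, 3), (1, 7, 3), (1, -3, 3), (1, 2, 8), (1, 2, -2)]
center, radius = fit_sphere(patch)
curvature = 1.0 / radius  # unsigned curvature of the fitted sphere
```

The fit yields only an unsigned curvature (1/radius); deciding whether a patch is convex or concave would need extra information, such as the position of the fitted center relative to the protein interior, which this sketch does not attempt.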
Multimeric protein complexes are molecular apparatuses that regulate biological systems and often determine their fate. Among proteins forming such molecular assemblies, amyloid proteins have drawn attention for over half a century, since amyloid fibril formation by these proteins is thought to be a common pathogenic cause of neurodegenerative diseases. This process is triggered by the accumulation of fibril-like aggregates, but the microscopic mechanisms remain mostly elusive because experimental methodologies are technically limited in observing each of the diverse aggregate species individually in aqueous solution. We addressed this problem by employing atomistic molecular dynamics simulations of the paradigmatic amyloid protein, amyloid-β (1-42) (Aβ42). Seven different dimeric forms of oligomeric Aβ42 fibril-like aggregates in aqueous solution, ranging from tetramer to decamer, were considered. We found additive effects of the size of these fibril-like aggregates on their thermodynamic stability and clarified the kinetic suppression of protomer-protomer dissociation reactions at and beyond the point of pentamer dimer formation. This observation was obtained for the specific combination of Aβ42 protomer structure and physicochemical conditions examined here, although it is worth recalling that several amyloid fibrils take dimeric forms of their protomers. We thus conclude that the stable formation of a fibril-like protomer dimer should be involved in a turning point at which rapid growth of amyloid fibrils is triggered.
RNA binding proteins (RBPs) regulate many important cellular processes through their interactions with RNA molecules. RBPs are critical for post-transcriptional mechanisms keeping gene regulation in a fine equilibrium. Conversely, dysregulation of RBPs and RNA metabolism pathways is an established hallmark of tumorigenesis. Human nucleolin (NCL) is a multifunctional RBP that interacts with different types of RNA molecules, in part through its four RNA binding domains (RBDs). Particularly, NCL interacts directly with microRNAs (miRNAs) and is involved in their aberrant processing linked with many cancers, including breast cancer. Nonetheless, molecular details of the NCL-miRNA interaction remain obscure. In this study, we used an in silico approach to characterize how NCL targets miRNAs and whether this specificity is imposed by a definite RBD-interface. Here, we present structural models of NCL-RBDs and miRNAs, as well as predict scenarios of NCL-miRNA interactions generated using docking algorithms. Our study suggests a predominant role of NCL RBDs 3 and 4 (RBD3-4) in miRNA binding. We provide detailed analyses of specific motifs/residues at the NCL-substrate interface in both these RBDs and miRNAs. Finally, we propose that the evolutionary emergence of more than two RBDs in NCL in higher organisms coincides with its additional role(s) in miRNA processing. Our study shows that RBD3-4 display sequence/structural determinants to specifically recognize miRNA precursor molecules. Moreover, the insights from this study can ultimately support the design of novel antineoplastic drugs aimed at regulating NCL-dependent biological pathways with a causal role in tumorigenesis.
Lipid transporters play an important role in most if not all organisms, ranging from bacteria to humans. For example, in Mycobacterium tuberculosis, the trehalose monomycolate transporter MmpL3 is involved in cell wall biosynthesis, while in humans, cholesterol transporters are involved in normal cell function as well as in disease. Here, using structural and bioinformatics information, we propose that there are proteins that also contain “MmpL3-like” (MMPL) transmembrane (TM) domains in many protozoa, including Trypanosoma cruzi, as well as in the bacterium Staphylococcus aureus, where the fatty acid transporter FarE has the same set of “active-site” residues as those found in the mycobacterial MmpL3s, and in T. cruzi. We also show that there are strong sequence and predicted structural similarities between the TM proton-translocation domain seen in the X-ray structures of mycobacterial MmpL3s and several human as well as fungal lipid transporters, leading to the proposal that there are similar proteins in apicomplexan parasites, and in plants. The animal, fungal, apicomplexan and plant proteins have larger extra-membrane domains than are found in the bacterial MmpL3, but they have a similar TM domain architecture, with the introduction of a (catalytically essential) Phe>His residue change, and a Ser/Thr H-bond network, involved in H+-transport. Overall, the results are of interest since they show that MMPL-family proteins are present in essentially all life-forms: archaea, bacteria, protozoa, fungi, plants and animals and, where known, they are involved in “lipid” (glycolipid, phospholipid, sphingolipid, fatty acid, cholesterol, ergosterol) transport, powered by transmembrane molecular pumps having similar structures.
In Azospirillum brasilense, an extra-cytoplasmic function sigma factor (RpoE10) shows the characteristic 119 amino acid long C-terminal extension found in ECF41-type sigma factors, which possesses three conserved motifs (WLPEP, DGGGR, and NPDKV), one in the linker region between the sigma2 and sigma4 domains, and the other two in the SnoaL_2 domain of the C-terminal extension. Here, we describe the roles of the two conserved motifs in the SnoaL_2 domain of RpoE10 in the inhibition and activation of its activity, respectively. Truncation of the distal part of the C-terminal sequence of RpoE10 (including the NPDKV motif but excluding the DGGGR motif) results in activation of its promoter, suggesting autoregulation. Further truncation of the C-terminal sequence up to its proximal part, including the NPDKV and DGGGR motifs, abolished promoter activation. Replacement of the NPDKV motif with NAAAV in RpoE10 increased its ability to activate its promoter, whereas replacement of the DGGGR motif led to reduced promoter activation. We have explored the dynamic modulation of the sigma2-sigma4 domains and the relevant molecular interactions mediated by the two conserved motifs of the SnoaL_2 domain using molecular dynamics simulation. The analysis enabled us to explain that the NPDKV motif located distally in the C-terminus negatively impacts transcriptional activation. In contrast, the DGGGR motif found proximally in the C-terminal extension is required to activate RpoE1
Chaperonin Hsp60, as a protein found in all organisms, is of great interest in medicine, since it is present in many tissues and can be used both as a drug and as an object of targeted therapy. Hence, Hsp60 deserves a fundamental comparative analysis to assess its evolutionary characteristics. It was found that the percent identity of Hsp60 amino acid sequences both within and between phyla was not high enough to identify Hsp60s as highly conserved proteins. In turn, their amino acid composition remained relatively constant. At the same time, the analysis of the nucleotide sequences showed that GC content in the Hsp60 genes was comparable to or greater than the genomic values, which may indicate a high resistance to mutations due to tight control of the nucleotide composition by DNA repair systems. Natural selection plays a dominant role in the evolution of Hsp60 genes. The degree of mutational pressure affecting the Hsp60 genes is quite low, and its direction does not depend on taxonomy. Interestingly, for the Hsp60 genes from Chordata, Arthropoda, and Proteobacteria the exact direction of mutational pressure could not be determined. However, upon further division into classes, it was found that the direction of the mutational pressure for Hsp60 genes from Fish differs from that for other chordates. The direction of the mutational pressure affects the synonymous codon usage bias. The number of high and low represented codons increases with increasing GC content, which can improve codon usage.
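The comparison of gene GC content with genomic values described above rests on a simple calculation. The sketch below shows overall GC content and, as an added assumption not stated in the abstract, GC content at third codon positions (GC3), a quantity often used in codon usage analyses. The function names are hypothetical, and the input is assumed to be an in-frame coding sequence containing only A, C, G, and T.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (0 to 1)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(1 for base in seq if base in "GC") / len(seq)


def gc3_content(cds):
    """GC fraction at third codon positions of an in-frame coding sequence.

    Trailing bases that do not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third_positions = cds[2:usable:3]
    if not third_positions:
        raise ValueError("sequence shorter than one codon")
    return gc_content(third_positions)


# Toy example: two codons, ATG and GCC.
toy_cds = "ATGGCC"
overall = gc_content(toy_cds)   # 4 of 6 bases are G or C
third = gc3_content(toy_cds)    # third positions are G and C
```

Comparing `gc_content` of a gene against the same function applied to the whole genome sequence gives the gene-versus-genome contrast the abstract refers to.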
Scoring docking solutions is a difficult task, and many methods have been developed for this purpose. In docking, only a handful of the hundreds of thousands of models generated by docking algorithms are acceptable, which causes difficulties when developing scoring functions. Today’s best scoring functions can significantly increase the number of acceptable top-ranked models but still fail for most targets. Here, we examine the possibility of utilising predicted residues on a protein-protein interface to score docking models generated during the scan stage of a docking algorithm. Many methods have been developed to infer the portions of a protein surface that interact with another protein, but most have not been benchmarked using docking algorithms. Different interface prediction methods are systematically tested for scoring >300,000 low-resolution, rigid-body, template-free docking decoys. Overall, we find that BIPSPI is the best method to identify interface amino acids and score docking solutions. Further, using BIPSPI provides better docking results than state-of-the-art scoring functions, with >12% of first-ranked docking models being acceptable. Additional experiments indicate that precision is a highly important metric when estimating interface prediction quality for the purpose of producing docking constraints. We also discuss several limitations for the adoption of interface predictions as constraints in a docking protocol.
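The idea of scoring docking decoys with predicted interface residues can be illustrated with a toy ranking scheme. The sketch below is an assumption-laden example, not the scoring used in the study and not the BIPSPI API: each decoy is represented by the set of residue identifiers it places at the interface, the predictor output is a hypothetical dictionary of per-residue interface probabilities, and a decoy's score is the mean predicted probability over its interface residues.

```python
def interface_score(decoy_interface, predicted_prob):
    """Mean predicted interface probability over the residues a decoy puts in contact.

    Residues without a prediction count as probability 0.
    """
    if not decoy_interface:
        return 0.0
    total = sum(predicted_prob.get(res, 0.0) for res in decoy_interface)
    return total / len(decoy_interface)


def rank_decoys(decoys, predicted_prob):
    """Return decoy names sorted from best to worst score."""
    return sorted(
        decoys,
        key=lambda name: interface_score(decoys[name], predicted_prob),
        reverse=True,
    )


# Hypothetical per-residue predictions and two hypothetical decoys.
predicted = {"A10": 0.9, "A11": 0.8, "A50": 0.1}
decoys = {
    "decoy_1": {"A10", "A11"},  # interface overlaps the predicted patch
    "decoy_2": {"A11", "A50"},  # interface partly off the predicted patch
}
ranking = rank_decoys(decoys, predicted)
```

Averaging rather than summing keeps large interfaces from being favored merely for their size; other normalizations are equally plausible and would need benchmarking.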
Amyloid beta (Aβ) peptides, a major contributor to Alzheimer’s disease, occur in differing lengths, each of which forms a multitude of assembly types. The most toxic soluble oligomers are formed by Aβ42, some of which have antiparallel β-sheets. Previously, our group proposed molecular models of Aβ42 hexamers in which the C-terminal third of the peptide (S3) forms an antiparallel 6-stranded β-barrel that is surrounded by an antiparallel barrel formed by the more polar N-terminal (S1) and middle (S2) portions. These hexamers were proposed to act as seeds from which dodecamers, octadecamers, both smooth and beaded annular protofibrils, and transmembrane channels form. Since then, numerous aspects of our models have been supported by experimental findings. Recently, NMR-based structures have been proposed for Aβ42 tetramers and octamers, and NMR studies have been reported for oligomers composed of ~32 monomers. Here we propose a range of concentric β-barrel models and compare their dimensions to image-averaged electron micrographs of both beaded annular protofibrils (bAPFs) and smooth annular protofibrils (sAPFs) of Aβ42. The smaller oligomers have 6, 8, 12, 16, and 18 monomers. These beads string together to form necklace-like bAPFs. These gradually morph into sAPFs in which an S3 β-barrel is shielded on one or both sides by β-barrels formed from S1 and S2 segments.
G-protein-coupled receptors (GPCRs) are the largest family of human membrane proteins and represent the primary targets of about one third of currently marketed drugs. Despite their critical importance, experimental structures have been determined for only a limited portion of GPCRs and functional mechanisms of GPCRs remain poorly understood. Here, we have constructed novel sequence coevolutionary models of the A and B classes of GPCRs and compared them with residue contact frequency maps generated with available experimental structures. Significant portions of structural residue contacts were successfully detected in the sequence-based covariational models. “Exception” residue contacts, predicted from sequence coevolutionary models but absent from available structures, added missing links that were important for GPCR activation and allosteric modulation. Moreover, we identified distinct residue contacts involving different sets of functional motifs for GPCR activation, such as the Na+ pocket, CWxP, DRY, PIF and NPxxY motifs in class A and the HETx and PxxG motifs in class B. Finally, we systematically uncovered critical residue contacts tuned by allosteric modulation in the two classes of GPCRs, including those from the activation motifs and particularly the extracellular and intracellular loops in class A GPCRs. These findings provide a promising framework for rational design of ligands to regulate GPCR activation and allosteric modulation.
NMR studies can provide unique information about protein conformations in solution. In CASP14, three reference structures provided by solution NMR methods were available (T1027, T1029, and T1055), as well as a fourth data set of NMR-derived contacts for an integral membrane protein (T1088). For the three targets with NMR-based structures, the best prediction results ranged from very good (GDT_TS = 0.90, for T1055) to poor (GDT_TS = 0.47, for T1029). We explored the basis of these results by comparing all CASP14 prediction models against experimental NMR data. For T1027, the NMR data reveal extensive internal dynamics, presenting a unique challenge for protein structure prediction. The analysis of T1029 motivated exploration of a novel method of “inverse structure determination”, in which an AF2 model was used to guide NMR data analysis. NMR data provided to CASP predictor groups for target T1088, a 238-residue integral membrane porin, were also used to assess several NMR-assisted prediction methods. Most groups involved in this exercise generated similar beta-barrel models, with good agreement with the experimental data. However, as was also observed in CASP13, some pure prediction groups that did not use the NMR data generated structures for T1088 that better fit the NMR data than the models generated using these experimental data. These results demonstrate the remarkable power of modern methods to predict structures of proteins with accuracies rivaling solution NMR structures, and that it is now possible to reliably use prediction models to guide and complement experimental NMR data analysis.
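For readers unfamiliar with the GDT_TS values quoted above, the following sketch shows a simplified version of the score: the average, over distance cutoffs of 1, 2, 4, and 8 Å, of the fraction of residues whose Cα atom in the model lies within the cutoff of its position in the reference. It assumes the per-residue distances come from a single precomputed superposition; the full GDT procedure searches many superpositions to maximize each fraction, so this simplification can only underestimate the true score. The function name is hypothetical, and the result is on the 0 to 1 scale used in the abstract.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS from per-residue model-to-reference distances in angstroms.

    Assumes one fixed superposition; returns a value between 0 and 1.
    """
    if not distances:
        raise ValueError("no residue distances given")
    n = len(distances)
    fractions = [sum(1 for d in distances if d <= cutoff) / n for cutoff in cutoffs]
    return sum(fractions) / len(cutoffs)


# Four residues with deviations of 0.5, 1.5, 3.0 and 9.0 angstroms:
# fractions within 1, 2, 4, 8 angstroms are 1/4, 2/4, 3/4, 3/4.
score = gdt_ts([0.5, 1.5, 3.0, 9.0])  # (0.25 + 0.5 + 0.75 + 0.75) / 4 = 0.5625
```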
Protein allergens are a health risk associated with the consumption of soybeans. To understand the mechanism of allergenicity, T cell epitopes of 7 soybean allergens were predicted and screened by their ability to induce the cytokine interleukin 4. The relationships among amino acid composition, properties, allergenicity and pepsin hydrolysis sites were analyzed. Among the 138 T cell epitopes identified, YIKDVFRVIPSEVLS, KDVFRVIPSEVLSNS, DVFRVIPSEVLSNSY of Gly m 6.0501 (P04347), and AKADALFKAIEAYLL, ADALFKAIEAYLLAH of Gly m 4.0101 (P26987) were the most likely epitope candidates. In the T cell epitope pattern, the frequencies of amino acids Q, D, E, P and G decreased, while those of F, I, N, V, K and H increased. Hydrophobic residues at positions p1 and p2 and positively charged residues at position p13 might contribute to allergenicity. Most of the epitopes could be hydrolyzed by pepsin into small polypeptides of at most 12 residues, and the digestion-resistant epitope regions contained I, V, S, N, and Q residues. The T cell epitope EEQRQQEGVIVELSK from Gly m 5.03 (P25974) showed resistance to pepsin hydrolysis and would cause a higher Th2 cell response. This research provides a basis for the development of hypoallergenic soybean products in the soybean industry as well as for the design of immunotherapy for protein allergy.
Predicting the quaternary structure of protein complexes is an important problem. Inter-chain residue-residue contact prediction can provide useful information to guide the ab initio reconstruction of quaternary structures. However, few methods have been developed to build quaternary structures from predicted inter-chain contacts. Here, we introduce a gradient descent optimization algorithm (GD) to build quaternary structures of protein dimers utilizing inter-chain contacts as distance restraints. We evaluate GD on several datasets of homodimers and heterodimers using true or predicted contacts. GD consistently performs better than a simulated annealing method and a Markov Chain Monte Carlo simulation method. Using true inter-chain contacts as input, GD can reconstruct high-quality structural models for homodimers and heterodimers with average TM-scores ranging from 0.92 to 0.99 and average interface root mean square distances (I-RMSD) from 0.72 Å to 1.64 Å. On a dataset of 115 homodimers, using predicted inter-chain contacts as input, the average TM-score of the structural models built by GD is 0.76. For 46% of the homodimers, high-quality structural models with TM-score >= 0.9 are reconstructed from predicted contacts. There is a strong correlation between the quality of the reconstructed models and the precision and recall of predicted contacts. If the precision or recall of predicted contacts is >20%, GD can reconstruct good models for most homodimers, indicating that only a moderate precision or recall of inter-chain contact prediction is needed to build good structural models for most homodimers. Moreover, the accuracy of reconstructed models positively correlates with the contact density in dimers.
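To make the idea of using inter-chain contacts as distance restraints concrete, here is a deliberately reduced toy sketch, not the GD algorithm of the study. It assumes both chains are rigid lists of 3D coordinates, optimizes only a translation of the second chain (a real docking search would also optimize rotation), and penalizes each restrained residue pair whose distance exceeds an assumed 8 Å contact cutoff. All function names, the cutoff, the learning rate, and the step count are illustrative assumptions.

```python
import math


def restraint_loss_and_grad(chain_a, chain_b, contacts, shift, cutoff=8.0):
    """Mean squared violation of contact restraints and its gradient w.r.t. shift.

    A pair (i, j) is violated when |b_j + shift - a_i| exceeds the cutoff.
    """
    loss = 0.0
    grad = [0.0, 0.0, 0.0]
    for i, j in contacts:
        diff = [chain_b[j][k] + shift[k] - chain_a[i][k] for k in range(3)]
        dist = math.sqrt(sum(v * v for v in diff))
        excess = dist - cutoff
        if excess > 0.0:
            loss += excess * excess
            for k in range(3):
                # d(excess^2)/d(shift_k) = 2 * excess * diff_k / dist
                grad[k] += 2.0 * excess * diff[k] / dist
    n = len(contacts)
    return loss / n, [g / n for g in grad]


def dock_by_gradient_descent(chain_a, chain_b, contacts, cutoff=8.0, lr=0.2, steps=2000):
    """Translate chain_b by plain gradient descent until restraints are satisfied."""
    shift = [0.0, 0.0, 0.0]
    for _ in range(steps):
        loss, grad = restraint_loss_and_grad(chain_a, chain_b, contacts, shift, cutoff)
        if loss < 1e-9:
            break
        shift = [shift[k] - lr * grad[k] for k in range(3)]
    final_loss, _ = restraint_loss_and_grad(chain_a, chain_b, contacts, shift, cutoff)
    return shift, final_loss


# Toy dimer: chain B starts 30 angstroms away from chain A along x.
chain_a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
chain_b = [(30.0, 0.0, 0.0), (34.0, 0.0, 0.0), (30.0, 4.0, 0.0)]
contacts = [(0, 0), (1, 1), (2, 2)]  # residue index pairs (chain A, chain B)
shift, final_loss = dock_by_gradient_descent(chain_a, chain_b, contacts)
```

Because the loss is zero for any pose that satisfies all restraints, this toy objective does not prevent the chains from overlapping; a usable protocol would add a clash term and rotational degrees of freedom.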
CASP (Critical Assessment of Structure Prediction) conducts community experiments to determine the state of the art in computing protein structure from amino acid sequence. The process relies on the experimental community providing information about not yet public or about to be solved structures, for use as targets. For some targets, the experimental structure is not solved in time for use in CASP. Calculated structure accuracy improved dramatically in this round, implying that models should now be much more useful for resolving many sorts of experimental difficulty. To test this, selected models for seven unsolved targets were provided to the experimental groups. These models were from the AlphaFold2 group, who overall submitted the most accurate predictions in CASP14. Four targets were solved with the aid of the models, and, additionally, the structure of an already solved target was improved. An a posteriori analysis showed that in some cases models from other groups would also be effective. This paper provides accounts of the successful application of models to structure determination, including molecular replacement for X-ray crystallography, backbone tracing and sequence positioning in a cryo-EM structure, and correction of local features. The results suggest that in future there will be greatly increased synergy between computational and experimental approaches to structure determination.