The loops of modular polyketide synthases (PKSs) serve diverse functions but are largely uncharacterized. They frequently contain amino acid repeats resulting from genetic events such as slipped-strand mispairing. Determining the tolerance of loops to amino acid changes would aid in understanding and engineering these multidomain molecule factories. Here, tandem repeats in the DNA encoding 949 modules within 129 cis-acyltransferase PKSs were catalogued, and the locations of the corresponding amino acids within the module were identified. The most frequently inserted interdomain loop corresponds with the updated module boundary immediately downstream of the ketosynthase (KS), while the loops bordering the dehydratase (DH) were nearly intolerant to such insertions. An analysis of the loops bordering the acyl carrier protein (ACP) reveals they are relatively short (14±6 residues), that they resist large increases in length, and that ACP may rely on acyltransferase (AT) accessing a conformation like that observed through electron microscopy of the pikromycin PKS. From the 949 modules, no repetitive sequence loop insertions are located within ACP, and only 2 reside within KS, indicating the sensitivity of these domains to alteration.
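The abstract does not say how the tandem repeats were catalogued. As a rough illustration only, the sketch below scans a DNA string for perfect, back-to-back repeats. The function name, the unit-length bounds, and the perfect-match criterion are all assumptions made for this example; dedicated repeat finders also tolerate mismatches and indels.

```python
def find_tandem_repeats(seq, min_unit=3, max_unit=30, min_copies=2):
    """Return (start, unit, copies) for perfect tandem repeats in a DNA string.

    Hypothetical helper for illustration; not the method used in the study.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None
        for unit_len in range(min_unit, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:
                break  # ran off the end of the sequence
            copies = 1
            # Count how many times the unit repeats immediately downstream.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                span = copies * unit_len
                if best is None or span > len(best[1]) * best[2]:
                    best = (i, unit, copies)
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip past the reported repeat
        else:
            i += 1
    return hits


example = "ATG" + "GCAGCT" * 3 + "TAA"
repeats = find_tandem_repeats(example)
```

When the start position and unit length are both multiples of three, the repeat is in frame, and dividing the start by three gives the zero-based index of the first affected residue.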
Polyene polyketides amphotericin B (AMB) and nystatin (NYS) are important antifungal drugs. Thioesterases (TEs), located in the last module of a PKS, control the release of polyketides by cyclization or hydrolysis. Intrigued by the small structural difference between AMB and NYS, as well as the high sequence identity between AMB TE and NYS TE, we constructed four systems to study the structural characteristics, catalytic mechanism, and product release of AMB TE and NYS TE with combined MD simulations and QM/MM calculations. The results indicated that, compared with AMB TE, NYS TE shows higher specificity for its natural substrate, and R26 and D186 were proposed to play a key role in substrate recognition. The energy barriers of macrocyclization in the AMB-TE-Amb and AMB-TE-Nys systems were calculated to be 14.0 and 22.7 kcal/mol, while in the NYS-TE-Nys and NYS-TE-Amb systems they were 17.5 and 25.7 kcal/mol, suggesting that cyclization of each TE with its natural substrate is more favorable than with the exchanged substrate. Finally, the binding free energies obtained with the MM-PBSA.py program suggested that it was easier for natural products to leave the TE enzymes after cyclization, and residues key to the departure of the polyketide product from the active site were highlighted. We provide a catalytic overview of AMB TE and NYS TE, including substrate recognition, catalytic mechanism, and product release. These results will improve the understanding of polyene polyketide TEs and aid in broadening the substrate flexibility of polyketide TEs.
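To get a feel for what the quoted barrier differences would mean kinetically, the sketch below applies the Eyring equation from transition state theory. It assumes that the reported energy barriers can be treated as activation free energies and that the temperature is 298.15 K; neither assumption is stated in the abstract, so the numbers are indicative only.

```python
import math

R_KCAL = 1.987204e-3    # gas constant, kcal/(mol K)
K_B = 1.380649e-23      # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J s


def eyring_rate(barrier_kcal, temperature=298.15):
    """Rate constant (1/s) from the Eyring equation, transmission coefficient 1."""
    prefactor = K_B * temperature / H_PLANCK
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))


def rate_ratio(barrier_a, barrier_b, temperature=298.15):
    """How many times faster the reaction with barrier_a is than with barrier_b."""
    return math.exp((barrier_b - barrier_a) / (R_KCAL * temperature))


# Barriers quoted in the abstract (kcal/mol); temperature is an assumption.
amb_te_ratio = rate_ratio(14.0, 22.7)   # AMB TE: natural vs exchanged substrate
nys_te_ratio = rate_ratio(17.5, 25.7)   # NYS TE: natural vs exchanged substrate
```

Under these assumptions, a gap of 8 to 9 kcal/mol corresponds to a rate difference of roughly six orders of magnitude in favor of the natural substrate.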
As a key cellular sensor, the TRPV1 channel undergoes a gating transition from a closed state to an open state in response to many physical and chemical stimuli. This transition is regulated by small-molecule ligands including lipids and various agonists/antagonists, but the underlying molecular mechanisms remain obscure. Thanks to the recent revolution in cryo-electron microscopy, a growing list of new structures of TRPV1 and other TRPV channels has been solved in complex with various ligands including lipids. Toward elucidating how ligand binding correlates with TRPV1 gating, we have performed extensive molecular dynamics simulations (with a cumulative time of 20 μs), starting from high-resolution structures of TRPV1 in both the closed and open states. By comparing the open and closed state ensembles, we have identified state-dependent binding sites for small-molecule ligands in general and lipids in particular. We further use machine learning to predict top ligand-binding sites as important features to classify the closed vs open states. The predicted binding sites are thoroughly validated by matching homologous sites in all structures of TRPV channels bound to lipids and other ligands, and by comparison with previous functional/mutational studies of ligand binding in TRPV1. Taken together, this study has integrated rich structural, dynamic, and functional data to inform future design of small-molecule drugs targeting TRPV1.
Inorganic pyrophosphatases (PPases) catalyze the hydrolysis of pyrophosphate to phosphates. PPases play essential roles in growth and development, and are found in all kingdoms of life. Humans possess two PPases, PPA1 and PPA2. PPA1 is present in all tissues, acting largely as a housekeeping enzyme. Besides pyrophosphate hydrolysis, PPA1 can also directly dephosphorylate phosphorylated JNK1. Upregulated expression of PPA1 has been linked to many human malignant tumors. PPA1 knockdown induces apoptosis and decreases proliferation. PPA1 is emerging as a potential prognostic biomarker and target for anti-cancer drug development. In spite of the biological and physiopathological importance of PPA1, there is no detailed study on the structure and catalytic mechanisms of PPases of mammalian origin. Here we report the crystal structure of human PPA1 at a resolution of 2.4 Å. We also carried out modeling studies of PPA1 in complex with JNK1-derived phosphopeptides. The monomeric protein fold of PPA1 is similar to those found in other family I PPases. PPA1 forms a dimeric structure that should be conserved in animal and fungal PPases. Analysis of the PPA1 structure and comparison with available structures of PPases from lower organisms suggest that PPA1 has a largely pre-organized and relatively rigid active site for pyrophosphate hydrolysis. Results from the modeling study indicate that the active site of PPA1 has the potential to accommodate double-phosphorylated peptides derived from JNK1. In short, results from the study provide new insights into the mechanisms of human PPA1 and a basis for structure-based anti-cancer drug development using PPA1 as the target.
In vertebrates, the mineralocorticoid receptor (MR) is a steroid-activated nuclear receptor (NR) that plays essential roles in water-electrolyte balance and blood pressure homeostasis. It belongs to the group of oxo-steroidian NRs, together with the glucocorticoid (GR), progesterone (PR), and androgen (AR) receptors. Classically, these oxo-steroidian NRs homodimerize and bind to specific genomic sequences to activate gene expression. NRs are multi-domain proteins, and dimerization is mediated by both the DNA-binding (DBD) and ligand-binding (LBD) domains, with the latter thought to provide the largest dimerization interface. However, at the structural level, the LBD dimerization of oxo-steroidian receptors has remained largely a matter of debate. This is linked to the difficulty of expressing, purifying and crystallizing these receptors. As a result, there is currently no consensus on a common homodimer assembly across the 4 receptors, i.e. GR, PR, AR and MR, despite their sequence homology. Examining the available MR LBD crystals and using widely used tools such as PISA, PRISM and EPPIC, and the MM/PBSA method, we have determined that an interface mediated by the helices H9 and H10 of the LBD as well as by the F domain presents the features of a biological protein-protein interaction surface. This interface, which has been observed in both GR alpha and MR crystals, stands out among the other contacts and provides for the first time a homodimer architecture that is common to both oxo-steroidian receptors.
Molecular dynamics (MD) simulations are a popular method of studying protein structure and function, but are unable to reliably sample all relevant conformational space in reasonable computational timescales. A range of enhanced sampling methods are available that can improve conformational sampling, but these do not offer a complete solution. We present here a proof-of-principle method of combining MD simulation with machine learning to explore protein conformational space. An autoencoder is used to map snapshots from MD simulations onto the conformational landscape defined by a 2D-RMSD matrix, and we show that we can predict, with useful accuracy, conformations that are not present in the training data. This method offers a new approach to the prediction of new low energy/physically realistic structures of conformationally dynamic proteins and allows an alternative approach to enhanced sampling of MD simulations.
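As a small illustration of the landscape the autoencoder is mapped onto, the sketch below builds an all-against-all RMSD matrix from a list of snapshots. It assumes that "2D-RMSD matrix" means the pairwise RMSD between frames and that the frames have already been superimposed on a common reference. The function names are hypothetical, and no fitting step or autoencoder is included.

```python
import math


def rmsd(frame_a, frame_b):
    """RMSD between two pre-aligned frames given as lists of (x, y, z) tuples."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and contain the same atoms")
    squared = sum(
        (ca - cb) ** 2
        for atom_a, atom_b in zip(frame_a, frame_b)
        for ca, cb in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(frame_a))


def rmsd_matrix(frames):
    """Symmetric matrix of pairwise RMSD values with a zero diagonal."""
    n = len(frames)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(frames[i], frames[j])
    return matrix


# Three toy two-atom "snapshots"; the third is shifted by 1 unit along z.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
]
matrix = rmsd_matrix(frames)
```

Each row of such a matrix describes one snapshot by its distance to every other snapshot, which is one plausible way to define coordinates on a conformational landscape.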
The Protein Data Bank (PDB) file format remains a popular format used and supported by many software packages to represent coordinates of macromolecular structures. It suffers, however, from drawbacks such as error-prone manual editing. Because of that, various software toolkits have been developed to facilitate its editing and manipulation, but, to date, there is no online tool available for this purpose. Here we present PDB-Tools Web, a flexible online service for manipulating PDB files. It offers a rich and user-friendly graphical user interface that allows users to mix and match more than 40 individual tools from the pdb-tools suite. These can be combined in a few clicks to build complex pipelines, which can be saved and uploaded. The resulting processed PDB files can be visualized online and downloaded. The web server is freely available at https://wenmr.science.uu.nl/pdbtools.
A novel virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19) worldwide, appeared in 2019. Currently, we do not have a medication that treats the disease. One of the reasons for the absence of treatment is the scarcity of detailed scientific knowledge of the members of the Coronaviridae family, including the Middle East Respiratory Syndrome Coronavirus (MERS-CoV). Structural studies of the MERS-CoV proteins in the current literature are extremely limited. We present here a detailed characterization of the structural properties of the MERS-CoV macro domain in aqueous solution at the atomic level with dynamics. For this study, we conducted extensive replica exchange molecular dynamics simulations linked to generative neural networks, and we use the resulting trajectories for structural analysis. We perform structural clustering based on the radius of gyration and end-to-end distance of the MERS-CoV macro domain in aqueous solution with dynamics at the atomic level. We also report and analyze the residue-level intrinsic disorder features, flexibility and secondary structure. Furthermore, we study the propensities of this macro domain for protein-protein interactions and for RNA and DNA binding. Results are in agreement with available nuclear magnetic resonance spectroscopy findings and present more detailed insights into the structural properties of the MERS-CoV macro domain. Overall, this work further shows that neural networks can be used as an exploratory tool for studies of the molecular conformational space of the CoV family at the nano level.
To greatly expand the druggable genome, fast and accurate predictions of cryptic sites for small-molecule binding in target proteins are in high demand. In this study, we have developed a fast and simple conformational sampling scheme guided by normal modes solved from coarse-grained elastic models, followed by atomistic backbone refinement and sidechain repacking. Despite the observations of complex and diverse conformational changes associated with ligand binding, we found that simply sampling along each of the lowest 30 modes is near optimal for adequately restructuring cryptic sites so they can be detected by existing pocket-finding programs like fpocket and ConCavity. We further trained machine-learning protocols to optimize the combination of the sampling-enhanced pocket scores with other dynamic and conservation scores, which only slightly improved the performance. As assessed on a training set of 84 known cryptic sites and a test set of 14 proteins, our method achieved high prediction accuracy (with area under the receiver operating characteristic curve > 0.8), comparable to the CryptoSite server. Compared with CryptoSite and other methods based on extensive molecular dynamics simulation, our method is much faster (1-2 hours for an average-size protein) and simpler (using only pocket scores), so it is suitable for high-throughput processing of large datasets of protein structures at the genome scale.
The structure of heterotetrameric sarcosine oxidase (HSO) contains a highly complex system composed of a large cavity and tunnels, which are essential for the reaction and migration of the reactants, products, and intermediates. A previous molecular dynamics (MD) simulation of HSO identified the regions containing the water channels from the density distribution of water. The simulation is consistent with the selective transport hypothesis for the migration of the iminium intermediate of the enzyme reaction, 5-oxazolidinone (5-OXA), whereby tunnel T3 is the exit pathway of 5-OXA. In the present study, the potential of mean force (PMF) for the transport of 5-OXA through tunnels T1, T2, and T3 was calculated using umbrella sampling (US) MD simulations and the weighted histogram analysis method. The maximum errors of the calculated PMF were estimated by repeating the US simulations using different sets of initial positions. The PMF profiles for the three tunnels support the notion that tunnel T3 is the exit pathway of 5-OXA and that 5-OXA tends to stay in the middle of the tunnel. The PMF profile for the transport of glycine through tunnel T3 was also calculated to investigate where 5-OXA is converted into glycine and to explain how glycine is released to the outside of HSO.
The FastDesign protocol in the molecular modeling program Rosetta iterates between sequence optimization and structure refinement to stabilize de novo designed protein structures and complexes. FastDesign has been used previously to design novel protein folds and assemblies with important applications in research and medicine. To promote sampling of alternative conformations and sequences, FastDesign includes stages where the energy landscape is smoothed by reducing repulsive forces. Here, we discover that this process disfavors larger amino acids in the protein core because the protein compresses in the early stages of refinement. By testing alternative ramping strategies for the repulsive weight, we arrive at a scheme that produces lower-energy designs with more native-like sequence composition in the protein core. We further validate the protocol by designing and experimentally characterizing over 4000 proteins and show that the new protocol produces more stable proteins.
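The relation behind a PMF profile can be sketched with a plain Boltzmann inversion, F(x) = -kT ln P(x), applied to a histogram of positions along a tunnel. This is deliberately simpler than the umbrella sampling and weighted histogram analysis named in the abstract: it assumes a single unbiased histogram, and the function name, bin counts, and 300 K temperature are made up for illustration.

```python
import math

KB_KCAL = 1.987204e-3  # Boltzmann constant per mole, kcal/(mol K)


def pmf_from_counts(counts, temperature=300.0):
    """Boltzmann-invert a histogram into a free energy profile (kcal/mol).

    The minimum of the profile is shifted to zero; empty bins get +inf.
    Assumes unbiased sampling, so this is NOT a replacement for WHAM.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram is empty")
    kt = KB_KCAL * temperature
    free = [-kt * math.log(c / total) if c > 0 else math.inf for c in counts]
    lowest = min(free)
    return [f - lowest for f in free]


# Hypothetical counts along a reaction coordinate (one value per bin).
profile = pmf_from_counts([100, 10, 0])
```

With biased umbrella windows, the per-window histograms must first be unbiased and combined, which is the job of the weighted histogram analysis method; the inversion step at the end is the same.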
Predicting the range of substrates accepted by an enzyme from its amino acid sequence is challenging. Although sequence- and structure-based annotation approaches are often accurate for predicting broad categories of substrate specificity, they generally cannot predict which specific molecules will be accepted as substrates for a given enzyme, particularly within a class of closely related molecules. Combining targeted experimental activity data with structural modeling, ligand docking, and physicochemical properties of proteins and ligands with various machine learning models provides complementary information that can lead to accurate predictions of substrate scope for related enzymes. Here we describe such an approach that can predict the substrate scope of bacterial nitrilases, which catalyze the hydrolysis of nitrile compounds to the corresponding carboxylic acids and ammonia. Each of the four machine learning models (linear regression, random forest, gradient-boosted decision trees, and support vector machines) performed similarly (average ROC = 0.9, average accuracy = ~82%) for predicting substrate scope for this dataset. The approach is intended to be highly modular with respect to physicochemical property calculations and software used for docking and modeling.
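The two metrics quoted above can be computed from predicted scores and true labels as sketched below. This assumes that "ROC" refers to the area under the ROC curve, which the abstract does not state explicitly. The labels, scores, and 0.5 threshold are invented for the example, and the AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a random positive
    scores higher than a random negative (ties count as one half)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at the given score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for l, p in zip(labels, predictions) if l == p)
    return correct / len(labels)


# Hypothetical enzyme-substrate pairs: 1 = accepted substrate, 0 = not accepted.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
```

The AUC is independent of any threshold, while accuracy depends on where the cutoff is placed, which is one reason the two are usually reported together.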
Natural products and natural product-derived compounds have been widely used as pharmaceuticals for many years, and the search for new natural products that may have interesting activity is ongoing. Abyssomicins are natural product molecules that have antibiotic activity via inhibition of the folate synthesis pathway in microbiota. These compounds also appear to undergo a required [4+2] cycloaddition in their biosynthetic pathway. Here we report the structure of an FAD-dependent reductase, AbsH3, from the biosynthetic gene cluster of novel abyssomicins found in Streptomyces sp. LC-6-2.
The focal adhesion kinase (FAK) and the proline-rich tyrosine kinase 2-beta (PYK2) are implicated in cancer progression and metastasis and represent promising biomarkers and targets for cancer therapy. FAK and PYK2 are recruited to Focal Adhesions (FAs) via interactions between their Focal Adhesion Targeting (FAT) domains and conserved segments (LD motifs) on the proteins Paxillin, Leupaxin and Hic-5. A promising new approach for the inhibition of FAK and PYK2 targets interactions of the FAK domains with proteins that promote localization at Focal Adhesions. Advances toward this goal include the development of surface plasmon resonance, HSQC-NMR and fluorescence polarization assays for the identification of fragments or compounds interfering with the FAK-Paxillin interaction. We have recently validated this strategy, showing that Paxillin-mimicking polypeptides with 2-3 LD motifs displace FAK from FAs and block kinase-dependent and -independent functions of FAK, including downstream integrin signalling and FA localization of the protein p130Cas. In the present work we use all-atom molecular dynamics simulations to study the recognition of peptides with the Paxillin and Leupaxin LD motifs by the FAK-FAT and PYK2-FAT domains. Our simulations and free-energy analysis interpret experimental data on binding of Paxillin and Leupaxin LD motifs at FAK-FAT and PYK2-FAT binding sites, and assess the roles of consensus LD regions and flanking residues. Our results can assist in the design of effective inhibitory peptides of the FAK-FAT:Paxillin and PYK2-FAT:Leupaxin complexes and in the construction of pharmacophore models for the discovery of potential small-molecule inhibitors of the focal adhesion-based functions of FAK-FAT and PYK2-FAT.
Isoflavonoids are a group of flavonoids that play pivotal roles in the survival of land plants. Chalcone synthase (CHS), the first enzyme of the isoflavonoid biosynthetic pathway, catalyzes the formation of a common isoflavonoid precursor. We have previously reported that an isozyme of soybean CHS (termed GmCHS1) is a key component of the isoflavonoid metabolon, a protein complex that enhances the efficiency of isoflavonoid production. Here, we determined the crystal structure of GmCHS1 as a first step toward understanding the metabolon structure, as well as to better understand the catalytic mechanism of GmCHS1.
This paper reports on the results of research aimed at translating biometric 3D face recognition concepts and algorithms into the field of protein biophysics in order to precisely and rapidly classify morphological features of protein surfaces. Both human faces and protein surfaces are free-forms, and some descriptors used in differential geometry can be used to describe them by applying the principles of feature extraction developed for computer vision and pattern recognition. The first part of this study focused on building the protein dataset using a simulation tool and performing feature extraction using novel geometrical descriptors. The second part tested the method on two examples: the first involved a classification of tubulin isotypes, and the second compared tubulin with the FtsZ protein, which is its bacterial analogue. An additional test involved several unrelated proteins. Different classification methodologies were used: a classic approach with a Support Vector Machine (SVM) classifier and unsupervised learning with a k-means approach. The best result was obtained with SVM and the radial basis function (RBF) kernel. The results are significant and competitive with state-of-the-art protein classification methods. This opens a new area for protein structure analysis.
Structural characterization of alternatively folded and partially disordered protein conformations remains challenging. Outer surface protein A (OspA) is a pivotal protein in infection by Borrelia, the etiological agent of Lyme disease. OspA exists in equilibrium with intermediate conformations, in which the central and the C-terminal regions of the protein have lower stabilities than the N-terminal region. Here, we characterize pressure- and temperature-stabilized intermediates of OspA by nuclear magnetic resonance spectroscopy combined with paramagnetic relaxation enhancement (PRE). We found that the C-terminal region of the intermediate was partially disordered; however, it retained weak specific contact with the N-terminal region, owing to a twist of the central β-sheet and increased flexibility in the polypeptide chain. The disordered C-terminal region of the pressure-stabilized intermediate was more compact than that of the temperature-stabilized form. Further, molecular dynamics simulation demonstrated that temperature-induced disordering of the β-sheet was initiated at the C-terminal region and continued through to the central region. An ensemble of simulation snapshots qualitatively described the PRE data from the intermediate and indicated that the intermediate structures of OspA may expose tick receptor-binding sites more readily than does the basic folded conformation.
Expansins have the remarkable ability to loosen plant cell walls and cellulose material without showing catalytic activity and therefore have potential applications in biomass degradation. To support the study of sequence-structure-function relationships and the search for novel expansins, the Expansin Engineering Database (ExED, https://exed.biocatnet.de) collected sequence and structure data on expansins from Bacteria, Fungi, and Viridiplantae, and expansin-like homologues such as carbohydrate binding modules, glycoside hydrolases, loosenins, swollenins, cerato-platanins, and EXPNs. Based on global sequence alignment and protein sequence network analysis, the sequences are highly diverse. However, many similarities were found between the expansin domains. Newly created profile hidden Markov models of the two expansin domains enable standard numbering schemes, comprehensive conservation analyses, and genome annotation. Conserved key amino acids in the expansin domains were identified, a refined classification of expansins and carbohydrate binding modules was proposed, and new sequence motifs facilitate the search of novel candidate genes and the engineering of expansins.