PROTEINS: Structure, Function, and Bioinformatics - Authorea

zPoseScore model for accurate and robust protein-ligand docking pose scoring in CASP1...

Liangzhen Zheng

and 12 more

April 17, 2023

We introduce a deep learning-based ligand pose scoring model called zPoseScore for predicting protein-ligand complexes in the 15th Critical Assessment of Protein Structure Prediction (CASP15). Our contributions are three-fold: firstly, we generate six training and evaluation datasets by employing advanced data augmentation and sampling methods. Secondly, we redesign the “zFormer” module, inspired by AlphaFold2’s Evoformer, to efficiently describe protein-ligand interactions. This module enables the extraction of protein-ligand paired features that lead to accurate predictions. Lastly, we develop the zPoseScore framework with zFormer for scoring and ranking ligand poses, allowing for atomic-level protein-ligand feature encoding and fusion to output refined ligand poses and ligand per-atom deviations. Our results demonstrate excellent performance on various testing datasets, achieving Pearson’s correlation R = 0.783 and 0.659 for ranking docking decoys generated based on experimental and predicted protein structures of CASF-2016 protein-ligand complexes. Additionally, we obtain an averaged lDDT = 0.558 of AIchemy_LIG2 in CASP15 for de novo protein-ligand complex structure predictions. Detailed analysis shows that accurate ligand binding site prediction and side-chain orientation are crucial for achieving better prediction performance. Our proposed model is one of the most accurate protein-ligand pose prediction models and could serve as a valuable tool in small molecule drug discovery.

Improved Multimer Prediction using Massive Sampling with AlphaFold in CASP15

Björn Wallner

April 17, 2023

AlphaFold has transformed structure prediction by enabling highly accurate predictions on par with experimentally determined structures. For difficult cases, in particular multimers, there is still room for improvement. Important for the success of AlphaFold is its ability to assess its own predictions. The basic idea for the Wallner group in CASP15 was to exploit the excellent ranking score in AlphaFold by massive sampling. To this end, we ran AlphaFold using six different settings, with and without templates, and with an increased number of recycles using both multimer v1 and v2 weights. In all cases, the dropout layers were enabled at inference to sample the uncertainty and increase the diversity of the generated models. A median of 4,810 models per target was generated and almost all (35/38) received a ranking_confidence >0.7. Compared to other groups in CASP15, Wallner obtained the highest sum of Z-scores based on the DockQ score, 40.8 compared to 26.3 for the second highest, much higher than -0.2 achieved by the AlphaFold baseline method, NBIS-AF2-multimer. The improvement over the baseline is substantial with the mean DockQ increasing from 0.43 to 0.56, with several targets showing a DockQ score increase by +0.6 units. This is remarkable, considering that Wallner and NBIS-AF2-multimer were using identical input data. The reason for the success can be attributed to the diversified sampling using dropout with different settings and, in particular, the use of multimer v1, which seems to be much more susceptible to sampling compared to v2. The method is available here: http://wallnerlab.org/AFsample/.

Estimation of Model Accuracy in CASP15 Using the ModFOLDdock Server

Liam McGuffin

and 4 more

April 03, 2023

In CASP15 there was a greater emphasis on multimeric modelling than in previous experiments, with assembly structures nearly doubling in number (41 up from 22) since the previous round. CASP15 also included a new estimation of model accuracy (EMA) category in recognition of the importance of objective quality assessment for quaternary structure models. ModFOLDdock is a multimeric model quality assessment server developed by the McGuffin group at the University of Reading, which brings together a range of single-model, clustering and deep learning methods to form a consensus of approaches. For CASP15 three variants of ModFOLDdock were developed to optimise for the different facets of the quality estimation problem. The standard ModFOLDdock variant produced predicted scores optimised for positive linear correlations with the observed scores. The ModFOLDdockR variant produced predicted scores optimised for ranking, i.e., the top-ranked models have highest accuracy. In addition, the ModFOLDdockS variant used a quasi-single model approach to score each model on an individual basis. The scores from all three variants achieved strongly positive Pearson correlation coefficients with the CASP observed scores (oligo-lDDT) in excess of 0.70, which were maintained across both homomeric and heteromeric model populations. In addition, at least one of the ModFOLDdock variants was consistently ranked in the top two methods across all three EMA categories. Specifically, for overall global fold prediction accuracy, ModFOLDdock placed second and ModFOLDdockR placed third; for overall interface quality prediction accuracy ModFOLDdockR, ModFOLDdock and ModFOLDdockS were placed above all other predictor methods, and ModFOLDdockR and ModFOLDdockS were placed second and third respectively for individual residue confidence scores. The ModFOLDdock server is available at: [https://www.reading.ac.uk/bioinf/ModFOLDdock/](https://www.reading.ac.uk/bioinf/ModFOLDdock/). 
ModFOLDdock is also available as part of the MultiFOLD docker package: [https://hub.docker.com/r/mcguffin/multifold](https://hub.docker.com/r/mcguffin/multifold)

Leucine tunes hydropathy of class A GPCRs

Christian Baumann

and 1 more

March 17, 2023

Leucine and Isoleucine are two amino acids that differ only by the positioning of one methyl group. This small difference has however important consequences in α-helices, as the β-branching of Ile results in helix destabilization. We set out to investigate whether there are general trends for the occurrences of Leu and Ile residues in structures and sequences of class A GPCRs (G protein-coupled receptors). GPCRs are integral membrane proteins in which α-helices span the plasma membrane seven times and which play a crucial role in signal transmission into the cell. We found that Leu side chains are generally present in less densely packed regions and are more protein-surface exposed than Ile side chains. We explored whether this difference might be attributed to different functions of the two amino acids and tested if Leu adjusts the hydrophobicity of the transmembrane domain based on the Wimley-White whole-residue hydrophobicity scales. In class A GPCRs, Leu decreases the variation in hydropathy between receptors and Leu content correlates positively with hydropathy calculated without Leu. Both measures indicate that hydropathy is tuned by Leu. To test this idea further, we generated protein sequences with random amino acid compositions using a simple numerical model, in which hydropathy was tuned by adjusting the number of Leu residues. The model was able to replicate the observations made with class A GPCR sequences. We speculate that Leu tunes the hydropathy of the transmembrane domain of class A GPCRs to facilitate correct insertion into membranes and/or for stability within them.

Combining Pairwise Structural Similarity and Deep Learning Interface Contact Predicti...

Jianlin Cheng

and 4 more

March 13, 2023

Estimating the accuracy of quaternary structural models of protein complexes and assemblies (EMA) is important for predicting quaternary structures and applying them to studying protein function and interaction. The pairwise similarity between structural models has proven useful for estimating the quality of protein tertiary structural models, but it has rarely been applied to predicting the quality of quaternary structural models. Moreover, the pairwise similarity approach often fails when many structural models are of low quality and similar to each other. To address the gap, we developed a hybrid method (MULTICOM_qa) combining a pairwise similarity score (PSS) and an interface contact probability score (ICPS) based on the deep learning inter-chain contact prediction for estimating protein complex model accuracy. It blindly participated in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 and ranked first out of 24 predictors in estimating the global accuracy of assembly models. The average per-target correlation coefficient between the model quality scores predicted by MULTICOM_qa and the true quality scores of the models of CASP15 assembly targets is 0.66. The average per-target ranking loss in using the predicted quality scores to rank the models is 0.14. It was able to select good models for most targets. Moreover, several key factors (i.e., target difficulty, model sampling difficulty, skewness of model quality, and similarity between good/bad models) for EMA are identified and analyzed. The results demonstrate that combining the multi-model method (PSS) with the complementary single-model method (ICPS) is a promising approach to EMA.
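The two summary metrics quoted in this abstract (average per-target correlation and average per-target ranking loss) can be illustrated with a minimal sketch. It assumes the usual EMA definitions: Pearson correlation between predicted and true scores within one target, and ranking loss as the true score of the best available model minus the true score of the model ranked first by the predictor. The function names and the toy data layout are hypothetical and are not taken from MULTICOM_qa.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ranking_loss(predicted, true):
    """True score of the best model minus true score of the top-ranked model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]


def evaluate(targets):
    """Average both metrics over targets.

    `targets` maps a target name to (predicted_scores, true_scores);
    this layout is an assumption made for the example.
    """
    corr = mean(pearson(p, t) for p, t in targets.values())
    loss = mean(ranking_loss(p, t) for p, t in targets.values())
    return corr, loss
```

A ranking loss of 0 means the predictor placed the best model first for that target; averaging per target keeps targets with many models from dominating the summary.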

To split or not to split: CASP15 targets and their processing into tertiary structure...

Andriy Kryshtafovych

and 1 more

March 13, 2023

Processing of CASP15 targets into evaluation units (EUs) and assigning them to evolutionary-based prediction classes is presented in this study. The targets were first split into structural domains based on compactness and similarity to other proteins. Models were then evaluated against these domains and their combinations. The domains were joined into larger EUs if predictors’ performance on the combined units was similar to that on individual domains. Alternatively, if most predictors performed better on the individual domains, then they were retained as EUs. As a result, 112 evaluation units were created from 77 tertiary structure prediction targets. The EUs were assigned to four prediction classes roughly corresponding to target difficulty categories in previous CASPs: TBM (template-based modeling, easy or hard), FM (free modeling), and the TBM/FM overlap category. More than a third of CASP15 EUs were attributed to the historically most challenging FM class, where homology or structural analogy to proteins of known fold cannot be detected.

Lipid exchange in crystal-confined Fatty Acid Binding Proteins: X-ray evidence and Mo...

H. Ariel Alvarez

and 5 more

March 13, 2023

A document by Eduardo Howard.

Molecular binding of different classes of organophosphates to methyl parathion hydrol...

Thanyada Rungrotmongkol

and 5 more

March 08, 2023

Methyl parathion hydrolase (MPH) is an enzyme of the metallo-β-lactamase superfamily, which hydrolyses a wide range of organophosphates (OP). Recently, MPH has attracted attention as a promising enzymatic bioremediator. The crystal structure of MPH enzyme shows a dimeric form, with each subunit containing a binuclear metal ion center. MPH also demonstrates metal ion-dependent selectivity patterns. The origins of these patterns remain unclear but are linked to open questions about the more general role of metal ions in functional evolution and divergence within enzyme superfamilies. We aimed to investigate and compare the binding of different OP pesticides to MPH with cobalt(II) metal ions. In this study, MPH was modelled from Ochrobactrum sp. with two different classes of OP pesticides bound, including phosphomonoester (methyl paraoxon and dichlorvos) and S-substituted thiophosphotriester (profenofos). The docked structure for each substrate, optimized by DFT calculation, was selected and subjected to atomistic molecular dynamics simulations for 500 ns. It was found that alpha metal ions did not coordinate with all the pesticides. Rather, the pesticides coordinated with less buried beta metal ions. It was also observed that the coordination of beta metal ions was perturbed to accommodate the bulky pesticides. The binding free energy calculations and structure-based pharmacophore model revealed that all three substrates could bind well at the active site. However, profenofos exhibits stronger binding affinity to MPH in comparison to the other two substrates. Therefore, the in silico analysis presented here could be informative for increasing enzyme stability and activity.

Rate limiting step of the allosteric activation of the bacterial adhesin FimH investi...

Gianluca Interlandi

January 27, 2023

The bacterial adhesin FimH is a model for the study of protein allostery because its structure has been resolved in multiple configurations, including the active and the inactive state. FimH consists of a pilin domain (PD) that anchors it to the rest of the fimbria and an allosterically regulated lectin domain (LD) that binds mannose on the surface of infected cells. Under normal conditions, the two domains are docked to each other and LD binds mannose weakly. However, in the presence of tensile force generated by shear, the domains separate and conformational changes propagate across LD, resulting in a stronger bond to mannose. Recently, the crystallographic structure of a variant of FimH has been resolved, called FimH FocH, where PD contains 10 mutations near the inter-domain interface. Although the X-ray structures of FimH and FimH FocH are almost identical, experimental evidence shows that FimH FocH is activated even in the absence of shear. Here, molecular dynamics simulations combined with the Jarzynski equality were used to investigate the discrepancy between the crystallographic structures and the functional assays. The results indicate that the free energy barrier of the unbinding process between LD and PD is drastically reduced in FimH FocH. Rupture of an inter-domain hydrogen bond involving R166 constitutes a rate limiting step of the domain separation process and occurs more readily in FimH FocH than FimH. In conclusion, the mutations in FimH FocH shift the equilibrium towards an equal occupancy of bound and unbound states for LD and PD by reducing a rate limiting step.
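For reference, the Jarzynski equality relates an equilibrium free energy difference to the work measured over repeated non-equilibrium runs, such as pulling simulations. In its standard form, where the symbols below are the conventional ones and are not taken from the abstract:

$$
e^{-\beta \Delta F} = \left\langle e^{-\beta W} \right\rangle, \qquad \beta = \frac{1}{k_B T}
$$

Here $\Delta F$ is the free energy difference between the initial and final states, $W$ is the work performed in a single run, and the angle brackets denote an average over many runs started from equilibrium. How the pulling protocol was set up in this study is not stated in the abstract.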

Fine Tuning Rigid Body Docking Results Using the Dreiding Force Field: A Computationa...

Burak Erman

and 1 more

January 20, 2023

This paper aims to understand the binding strategies of a nanobody-protein pair by studying known complexes. Rigid body protein-ligand docking programs produce several complexes, called decoys, which are good candidates with high scores of shape complementarity, electrostatic interactions, desolvation, buried surface area, and Lennard-Jones potentials. It is not known which decoy represents the true structure. We studied thirty-seven nanobody-protein complexes from the Single Domain Antibody Database, sd-Ab DB, [http://www.sdab-db.ca/](http://www.sdab-db.ca/). For each structure, a large number of decoys are generated using the Fast Fourier Transform algorithm of the software ZDOCK. The decoys were ranked according to their target protein-nanobody interaction energies, calculated by using the Dreiding Force Field, with rank 1 having the lowest interaction energy. Out of thirty-six PDB structures, twenty-five true structures were predicted as rank 1. Eleven of the remaining structures required Ångstrom size rigid body translations of the nanobody relative to the protein to match the given PDB structure. After the translation the Dreiding interaction (DI) energies of all complexes decreased and became rank 1. In one case, rigid body rotations as well as translations of the nanobody were required for matching the crystal structure. We used a Monte Carlo algorithm that randomly translates and rotates the nanobody of a decoy and calculates the DI energy. Results show that rigid body translations and the DI energy are sufficient for determining the correct binding location and pose of ZDOCK created decoys. A survey of the sd-Ab DB showed that each nanobody makes at least one salt bridge with its partner protein, indicating that salt bridge formation is an essential strategy in nanobody-protein recognition. 
Based on the analysis of the thirty-six crystal structures and evidence from existing literature, we propose a set of principles that could be used in the design of nanobodies.

Conformational changes in the AdeB transmembrane efflux pump by amphiphilic peptide M...

Mohammad Reza Shakibaie

and 4 more

December 20, 2022

No report exists on the role of Mastoparan B (MP-B) as an RND efflux pump inhibitor in multi-drug resistant (MDR) Acinetobacter baumannii. Here, we performed a series of in-silico experiments to predict the inhibition of the AdeB efflux pump by MP-B as a drug target agent. For this reason, an MDR strain of A. baumannii was subjected to minimum inhibitory concentration (MIC) testing against 12 antibiotics as well as MP-B. Expression of the adeB gene in the absence and presence of sub-MIC of MP-B was studied by qRT-PCR. It was found that MP-B had potent antimicrobial activity (MIC=1 µg/ml) associated with a 20-fold decrease in adeB expression at sub-MIC of MP-B. The stereochemical analysis using several automated servers confirmed that the AdeB is an inner membrane of the RND tripartite complex system with helix-turn-helix conformation and a pore rich in Phe, Ala, and Lys residues. The best model that showed high accuracy (Z=1.2, C=1.41, TM=0.99, and RMSD=4.4) was selected for docking purposes using the Site Map tool, and the correct protein-peptide complexes were simulated in the BioLiP platform. The molecular docking via AutoDock/Vina suggested that MP-B forms H-bonds with amino acid residues of the AdeB helix-5 and caused a shift in the dihedral angle by distances of 9.0 Å, 9.3 Å, and 9.6 Å, respectively. This shift was detected by the AlphaFold 2 tool and influenced the overall druggability of the protein. From the results, we concluded that MP-B can be a good candidate for inhibition of the bacterial efflux pump.

Metastable Alpha-rich and Beta-rich Conformations of Small Aβ42 Peptide Oligomers

Philippe Derreumaux

and 2 more

December 15, 2022

Probing the structures of amyloid-beta (Aβ) peptides in the early steps of aggregation is extremely difficult experimentally and computationally. Yet, this knowledge is extremely important as small oligomers are the most toxic species. Experiments and simulations on Aβ42 monomer point to random coil conformations with either transient helical or β-strand content. Our current conformational description of small Aβ42 oligomers is funneled toward amorphous aggregates with some β-sheet content and rare excited states with well-ordered assemblies of β-sheets. In this study, we emphasize another view based on metastable α-helix bundle oligomers spanning the C-terminus residues which are predicted by the machine-learning AlphaFold2 method and supported indirectly by low-resolution experimental data on many amyloid polypeptides. This finding has consequences in designing drugs to reduce aggregation and toxicity.

SARS-CoV-2 neutralizing antibody epitopes are overlapping and highly mutated which ra...

V. Stalin Raj

and 3 more

November 11, 2022

The rapid adaptation of SARS-CoV-2 within the host species and the increased viral transmission triggered the evolution of different SARS-CoV-2 variants. Though numerous monoclonal antibodies (mAbs) have been identified as prophylactic therapy for SARS-CoV-2, the ongoing surge in the number of SARS-CoV-2 infections shows the importance of understanding the mutations in the spike and developing novel vaccine strategies to target all variants. Here, we report the map of 74 experimentally validated SARS-CoV-2 neutralizing mAb binding epitopes of all variants. The majority (87.84%) of the potent neutralizing epitopes are localized to the receptor-binding domain (RBD) and overlap with each other, whereas limited (12.16%) epitopes are found in the N-terminal domain (NTD). Notably, 69 out of 74 mAb targets have at least one mutation at the epitope sites. The potent epitopes found in the RBD show higher mutations (4-10 aa) compared to lower or modest neutralizing antibodies, suggesting that these epitopes might co-evolve with the immune pressure. The current study shows the importance of determining the critical mutations at the antibody recognition epitopes, leading to the development of broadly reactive immunogens targeting multiple SARS-CoV-2 variants. Further, vaccines inducing both humoral and cell-mediated immune responses might prevent the escape of SARS-CoV-2 variants from neutralizing antibodies.

Convolutional ProteinUnetLM competitive with LSTM-based protein secondary structure p...

Katarzyna Stąpor

and 3 more

October 21, 2022

Protein secondary structure (SS) prediction plays an important role in the characterization of general protein structure and function. In recent years, a new generation of algorithms for SS prediction based on embeddings from protein language models (pLMs) has emerged. These algorithms reach state-of-the-art accuracy without the need for time-consuming multiple sequence alignment (MSA) calculations. LSTM-based SPOT-1D-LM and NetSurfP-3.0 are the latest examples of such predictors. We present the ProteinUnetLM model using a convolutional Attention U-Net architecture that provides prediction quality and inference times at least as good as the best LSTM-based models for 8-class SS prediction (SS8). Additionally, we address the issue of the heavily imbalanced nature of the SS8 problem by extending the loss function with the Matthews correlation coefficient (MCC), and by proper assessment using the previously introduced adjusted geometric mean metric (AGM). ProteinUnetLM achieved better AGM and sequence overlap score (SOV) than LSTM-based predictors, especially for the rare structures 3₁₀-helix (G), beta-bridge (B), and high curvature loop (S). It is also competitive on challenging datasets without homologs, free-modeling targets, and chameleon sequences. Moreover, ProteinUnetLM outperformed its previous MSA-based version ProteinUnet2, and provided better AGM than AlphaFold2 for 1/3 of proteins from the CASP14 dataset, proving its potential for making a significant step forward in the domain. To facilitate the usage of our solution by protein scientists, we provide an easy-to-use web interface under [https://biolib.com/SUT/ProteinUnetLM/](https://biolib.com/SUT/ProteinUnetLM/).
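To see why MCC is a natural choice for heavily imbalanced classes, the sketch below computes the standard binary MCC from confusion counts, as one would for a single rare SS8 class treated one-vs-rest. It only illustrates the metric. The differentiable form used inside the ProteinUnetLM loss is not given in the abstract, and the function name and the convention of returning 0 for an undefined MCC are assumptions.

```python
import math


def binary_mcc(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here), e.g. when only one class is ever predicted.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# A rare class: one positive residue out of ten.
rare_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10

accuracy = sum(t == p for t, p in zip(rare_true, always_negative)) / len(rare_true)
mcc = binary_mcc(rare_true, always_negative)
```

In this toy case a predictor that never predicts the rare class still reaches 90% accuracy, while its MCC is 0, which is the behaviour that makes MCC informative for rare structure classes.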

Distribution and solvent-exposure of Hsp70 chaperone binding sites across the E. coli...

Silvia Cavagnero

and 2 more

October 17, 2022

Many proteins must interact with molecular chaperones to achieve their native folded state in the cell. Yet, how chaperone binding and binding-site characteristics affect the folding process is poorly understood. The ubiquitous Hsp70 chaperone system prevents client-protein aggregation by holding unfolded conformations or by unfolding misfolded states. Hsp70 binding sites of client proteins comprise a nonpolar core surrounded by positively charged residues. However, a detailed analysis of Hsp70 binding sites on a proteome-wide scale is still lacking. Further, it is not known whether proteins undergo some degree of folding while chaperone bound. Here, we begin to address the above questions by identifying Hsp70 binding sites in 2,258 E. coli proteins. We find that most proteins bear at least one Hsp70 binding site and that the number of Hsp70 binding sites is directly proportional to protein size. Aggregation propensity upon release from the ribosome correlates with number of Hsp70 binding sites only in the case of large proteins. Interestingly, Hsp70 binding sites are more solvent-exposed than other nonpolar sites, in protein native states. Our findings show that the majority of E. coli proteins are systematically enabled to interact with Hsp70 even if this interaction only takes place during a fraction of the protein lifetime. In addition, our data suggest that some conformational sampling may take place within Hsp70-bound states, due to the solvent exposure of some chaperone binding sites in native proteins. In all, we propose that Hsp70-chaperone-binding traits have evolved to favor Hsp70-assisted protein folding devoid of aggregation.

An in silico prediction of interaction models of influenza A virus PA and human C14or...

Kadir TURAN

and 1 more

August 18, 2022

The human C14orf166 protein, also known as RTRAF, shows positive modulatory activity on the cellular RNA polymerase II enzyme. This protein is a component of the tRNA-splicing ligase complex and is involved in RNA metabolism. It also functions in the nucleo-cytoplasmic transport of RNA molecules. The C14orf166 protein has been reported to be associated with some types of cancer. It has been shown that the C14orf166 protein binds to the influenza A virus RNA polymerase PA subunit and has a stimulating effect on viral replication. In this study, candidate interactor proteins for influenza A virus PA protein were screened with a Y2H assay using HEK293 Matchmaker cDNA. The C14orf166 protein fragments in different sizes were found to interact with the PA. The three-dimensional structures of the viral PA and C14orf166 proteins interacting with the PA were generated using the I-TASSER algorithm. The interaction models between these proteins were predicted with the ClusPro protein docking algorithm and analyzed with PyMol software. The results revealed that the carboxy-terminal end of the C14orf166 protein is involved in this interaction, and it is highly possible that it binds to the carboxy-terminal of the PA protein. Although amino acid residues in the interaction area of the PA protein with the C14orf166 showed distribution from 450th to 700th position, the intense interaction region was revealed to be at amino acid positions 610 to 630.

Molecular insight into cellulose degradation by the phototrophic green alga Scenedesm...

Julieta Barchiesi

and 4 more

August 12, 2022

Lignocellulose is the most abundant natural biopolymer on earth and a potential raw material for the production of fuels and chemicals. However, only some organisms such as bacteria and fungi produce the necessary enzymes to metabolize it. In this work we detected the presence of extracellular cellulases in the genome of five species of Scenedesmus. These microalgae grow in both freshwater and saltwater regions as well as in soils, displaying highly flexible metabolic properties. The comparison of sequences of the different cellulases with hydrolytic enzymes from other organisms by means of multi-sequence alignments and phylogenetic trees showed that these enzymes belong to the families of glycosyl hydrolases 1, 5, 9 and 10. In addition, most of these presented a greater similarity of sequence with enzymes from invertebrates, fungi, bacteria and other microalgae than with cellulases from plants. The 3D modeling data obtained showed that both the main structures of the modeled proteins and the main amino acid residues implicated in catalysis and substrate binding are well conserved in Scenedesmus enzymes. We propose that these cellulase-producing phototrophic microorganisms could act as catalysts for the hydrolysis of cellulosic biomass fueled by sunlight.

Molecular Determinants of Tetrahydrocannabinol Binding to the Glycine Receptor

Lautaro D. Alvarez

and 1 more

August 09, 2022

The recognition of Cannabis as a source of new compounds suitable for medical use has attracted strong research interest from the scientific community, and substantial progress has accumulated regarding cannabinoids’ activity; however, a thorough description of their molecular mechanisms of action remains a task to complete. Highlighting their complex pharmacology, the list of cannabinoids’ interactors has vastly expanded beyond the canonical cannabinoid receptors. Among those, we have focused our study on the glycine receptor (GlyR), an ion channel involved in the modulation of nervous system responses, including, of particular interest to us, sensitivity to peripheral pain. Here, we report the use of computational methods to investigate possible binding modes between the GlyR and Δ9-tetrahydrocannabinol (THC). After obtaining a first pose for the THC binding from a biased molecular docking simulation and subsequently evaluating it by molecular dynamics simulations, we found a dynamic system with an identifiable representative binding mode characterized by the specific interaction with two transmembrane residues (Phe293 and Ser296). Complementarily, we assessed the role of membrane cholesterol in this interaction and positively established its relevance for THC binding to GlyR. Lastly, the use of restrained molecular dynamics simulations allowed us to refine the description of the binding mode and of the cholesterol effect. Altogether, our findings contribute to the current knowledge about the GlyR-THC mode of binding and propose a new starting point for future research on how cannabinoids in general, and THC in particular, modulate pain perception in view of its possible clinical applications.