Marc Lensink

and 112 more

We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homo-dimers, 3 homo-trimers, 13 hetero-dimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatics servers, submitted models for each target. A total of 21941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their 5 best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% for the targets compared to 8% two years earlier, a remarkable improvement resulting from the wide use of the AlphaFold2 and AlphaFold-Multimer software. Creative use was made of the deep learning inference engines affording the sampling of a much larger number of models and enriching the multiple sequence alignments with sequences from various sources. Wide use was also made of the AlphaFold confidence metrics to rank models, permitting top performing groups to exceed the results of the public AlphaFold-Multimer version used as a yard stick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.

Jian Liu

and 4 more

Substantial progresses in protein structure prediction have been made by utilizing deep-learning and residue-residue distance prediction since CASP13. Inspired by the advances, we improve our CASP14 MULTICOM protein structure prediction system in three main aspects: (1) a new deep learning based protein inter-residue distance predictor (DeepDist) to improve template-free (ab initio) tertiary structure prediction, (2) an enhanced template-based tertiary structure prediction method, and (3) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, MULTICOM predictor was ranked 7th out of 146 predictors in protein tertiary structure prediction and ranked 3rd out of 136 predictors in inter-domain structure predic-tion. The results of MULTICOM demonstrate that the template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. The performance of template-free tertiary structure prediction largely depends on the accuracy of distance pre-dictions that is closely related to the quality of multiple sequence alignments. The structural model quality assessment works reasonably well on targets for which a sufficient number of good models can be predicted, but may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed.

Chen Chen

and 3 more

Deep learning has emerged as a revolutionary technology for protein residue-residue contact prediction since the 2012 CASP10 competition. Considerable advancements in the predictive power of the deep learning-based contact predictions have been achieved since then. However, little effort has been put into interpreting the black-box deep learning methods. Algorithms that can interpret the relationship between predicted contact maps and the internal mechanism of the deep learning architectures are needed to explore the essential components of contact inference and improve their explainability. In this study, we present an attention-based convolutional neural network for protein contact prediction, which consists of two attention mechanism-based modules: sequence attention and regional attention. Our benchmark results on the CASP13 free-modeling (FM) targets demonstrate that the two attention modules added on top of existing typical deep learning models exhibit a complementary effect that contributes to predictive improvements. More importantly, the inclusion of the attention mechanism provides interpretable patterns that contain useful insights into the key fold-determining residues in proteins. We expect the attention-based model can provide a reliable and practically interpretable technique that helps break the current bottlenecks in explaining deep neural networks for contact prediction. The source code of our method is available at https://github.com/jianlin-cheng/InterpretContactMap.