Abstract
Structural characterization of protein interactions is essential for our ability to understand and modulate physiological processes. Computational approaches to modeling of protein complexes provide structural information that far exceeds capabilities of the existing experimental techniques. Protein structure prediction in general, and prediction of protein interactions in particular, has been revolutionized by the rapid progress in Deep Learning techniques. The work of Schweke et al. presents a community-wide study of an important problem of distinguishing physiological protein-protein complexes/interfaces (experimentally determined or modeled) from non-physiological ones. The authors designed and generated a large benchmark set of physiological and non-physiological homodimeric complexes, and evaluated a large set of scoring functions, as well as AlphaFold predictions, on their ability to discriminate the non-physiological interfaces. The problem of separating physiological interfaces from non-physiological ones is very difficult, largely due to the lack of a clear distinction between the two categories in a crowded environment inside a living cell. Still, the ability to identify key physiologically significant interfaces in the variety of possible configurations of a protein-protein complex is important. The study presents a major data resource and methodological development in this important direction for molecular and cellular biology.
Structural characterization of protein-protein interactions is essential for our ability to understand and modulate molecular mechanisms of life processes. Computational approaches are increasingly important for studies of protein interactions, providing structural information that far exceeds capabilities of the existing experimental techniques. Beyond generating structural data, the ability to model protein interactions leads to true understanding of the molecular mechanisms.
Prediction of protein interactions, as a term, has a dual meaning. One is predicting that the proteins do interact, i.e., predicting the existence of the interaction, or predicting the interactors, often in the context of reconstructing the networks of protein interactions in physiological processes. The other is predicting the mode of interaction, given the existence of the interaction established by other means, experimental or computational. Protein docking (prediction of the structure of protein complexes) [1], as a branch of life sciences, has traditionally focused primarily on the second problem. Docking algorithms have not been specifically designed for, or capable of, distinguishing interacting from non-interacting proteins, or physiological from non-physiological interactions/interfaces. Typical docking scores, resulting from the global search of the relative positions of the proteins within a complex, are correlated with the energy of interaction (otherwise they would not be able to make correct predictions of the mode of interaction). However, this correlation has been too loose to distinguish weakly interacting (non-interacting) from strongly interacting molecular pairs based on the docking scores alone. Re-evaluation of the global search docking predictions by more accurate, and consequently more computationally expensive, scoring functions is essential [2]. While improving docking predictions, scoring also has the potential to distinguish physiological from non-physiological interfaces, whether predicted or experimentally determined. Combining structure-based methodologies with complementary approaches, such as those based on protein cellular co-localization and sequence analysis, should improve our ability to correctly characterize protein interactions.
Protein structure prediction in general has been revolutionized by the rapid progress of Deep Learning techniques. Most notable is the spectacular success of DeepMind's AlphaFold, which provides high-quality predictions of the tertiary structure [3]. Protein-protein docking methodology was originally based primarily on energy-based considerations, including the concept of steric fit [4], and was later extended to comparative modeling [5]. A major development in the field has been the recent emergence of Deep Learning based approaches [6, 7] that extend the prediction success from the tertiary to the quaternary structure. An important activity in the protein docking/scoring field is the community-wide Critical Assessment efforts [8], which provide platforms for the objective blind assessment of the predictive approaches. The progress in structural modeling of macromolecular complexes is propelling growing interest in larger systems, up to the level of a whole cell [9-11]. Such modeling addresses protein interactions in vivo, in a crowded cellular environment.
In this issue of Proteomics, the paper by Schweke et al. [12], centered around the 3DBioInfo Elixir Community, presents a community-wide study of the important problem of distinguishing physiological protein-protein complexes/interfaces (experimentally determined or modeled) from non-physiological ones. The authors designed and generated a large benchmark set of physiological and non-physiological homodimeric complexes, and evaluated a large set of scoring functions, as well as AlphaFold predictions, on their ability to discriminate the non-physiological interfaces.
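To illustrate what evaluating a scoring function on such a benchmark can look like, the sketch below computes the area under the ROC curve (AUC) for one score over a set of labeled interfaces. This is a minimal illustration only: the choice of AUC as the metric, the function name `roc_auc`, and the toy scores and labels are assumptions made here, not details taken from the study.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the ROC AUC of a score, using the pairwise-ranking definition.

    labels: 1 = physiological interface, 0 = non-physiological interface.
    The AUC equals the fraction of (physiological, non-physiological) pairs
    in which the physiological interface gets the higher score; ties count
    as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one interface of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy benchmark: six interfaces scored by one scoring function.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to a score that ranks the two classes no better than chance, and 1.0 to perfect separation. The quadratic pairwise loop is chosen for transparency; for a large benchmark a rank-based computation would be preferable.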
The problem of separating physiological from non-physiological interfaces (sometimes called "biological" vs. "non-biological", respectively) is very difficult. The core reason for this difficulty is the lack of a clear, "iron-clad" distinction between these two categories in a living cell, where co-localized proteins extensively interact with each other in a crowded environment in a variety of interaction modes, heavily dominated by transient interactions/encounters. Still, the ability to identify key physiologically significant interfaces in the variety of possible configurations of a protein-protein complex is important. The study presents a major data resource and methodological development in this important direction of molecular and cellular biology.
A major challenge for the future is developing the ability to differentiate physiological from non-physiological protein-protein hetero complexes. Due to the diversity and versatility of protein interactions, this problem is inherently more difficult than that of the homo oligomers. A solution may require accounting for various aspects of macromolecular interactions in the crowded cellular environment and significant involvement of Deep Learning techniques.