Abstract
Structural characterization of protein interactions is essential for our
ability to understand and modulate physiological processes.
Computational approaches to modeling of protein complexes provide
structural information that far exceeds capabilities of the existing
experimental techniques. Protein structure prediction in general, and
prediction of protein interactions in particular, has been
revolutionized by the rapid progress in Deep Learning techniques. The
work of Schweke et al. presents a community-wide study of the important
problem of distinguishing physiological protein-protein
complexes/interfaces (experimentally determined or modeled) from
non-physiological ones. The authors designed and generated a large
benchmark set of physiological and non-physiological homodimeric
complexes, and evaluated a large set of scoring functions, as well as
AlphaFold predictions, on their ability to discriminate the
non-physiological interfaces. The problem of separating physiological
interfaces from non-physiological ones is very difficult, largely due to
the lack of a clear distinction between the two categories in a crowded
environment inside a living cell. Still, the ability to identify key
physiologically significant interfaces in the variety of possible
configurations of a protein-protein complex is important. The study
presents a major data resource and methodological development in this
important direction for molecular and cellular biology.
Structural characterization of protein-protein interactions is essential
for our ability to understand and modulate molecular mechanisms of life
processes. Computational approaches are increasingly important for
studies of protein interactions, providing structural information that
far exceeds capabilities of the existing experimental techniques. Beyond
generating structural data, the ability to model protein interactions
leads to true understanding of the molecular mechanisms.
Prediction of protein interactions, as a term, has a dual meaning. One is
predicting that the proteins do interact, i.e., predicting the existence of interaction, or predicting the interactors, often in
the context of reconstructing the networks of protein interactions in
physiological processes. The other is predicting the mode of
interaction, given the existence of the interaction established by other
means, experimental or computational. Protein docking (prediction of the
structure of protein complexes) [1], as a branch of life sciences,
has traditionally focused primarily on the second problem. Docking
algorithms have not been specifically designed for or capable of
distinguishing interacting and non-interacting proteins, or
physiological and non-physiological interactions/interfaces. Typical
docking scores, resulting from the global search of the relative
positions of the proteins within a complex, are correlated with the
energy of interaction (otherwise they would not be able to make correct
predictions of the mode of interaction). However, such correlation has
been too loose for distinguishing weakly interacting (non-interacting)
from strongly interacting molecular pairs, based on the docking scores
alone. Re-evaluation of the global search docking predictions by more
accurate, and consequently more computationally expensive, scoring
functions is essential [2]. While improving docking predictions,
scoring also has the potential to distinguish physiological and
non-physiological predicted or experimentally determined interfaces.
Combining structure-based methodologies with complementary approaches
based on protein cellular co-localization, sequence analysis, and
similar data should improve our ability to correctly characterize protein
interactions.
Protein structure prediction in general has been revolutionized by the
rapid progress of Deep Learning techniques. Most notable is the
spectacular success of DeepMind’s AlphaFold, which provides high-quality
prediction of the tertiary structure [3]. Protein-protein
docking methodology was originally based primarily on energy-based
considerations, including the concept of steric fit [4], and was later
extended to comparative modeling [5]. A major development in the
field has been the recent emergence of the Deep Learning based
approaches [6, 7] that extend the prediction success from the
tertiary to the quaternary structure. The community-wide Critical
Assessment efforts [8] are an important activity in the protein
docking/scoring field, providing platforms for the objective blind
assessment of the predictive approaches. The progress in structural
modeling of macromolecular complexes is propelling growing interest in
larger systems, up to the level of a whole cell [9-11]. Such
modeling addresses protein interactions in vivo, in a crowded
cellular environment.
In this issue of Proteomics, the paper by Schweke et al.
[12], centered around the 3DBioInfo Elixir Community, presents a
community-wide study of the important problem of distinguishing
physiological protein-protein complexes/interfaces (experimentally
determined or modeled) from non-physiological ones. The authors designed
and generated a large benchmark set of physiological and
non-physiological homodimeric complexes, and evaluated a large set of
scoring functions, as well as AlphaFold predictions, on their ability to
discriminate the non-physiological interfaces.
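As an illustration of what "ability to discriminate" can mean in practice, the sketch below computes the area under the ROC curve (AUC) for a single scoring function on a labeled set of interfaces. This is a generic metric, not necessarily the one used by Schweke et al.; the function name `roc_auc`, the toy scores, and the labels are hypothetical, and the sketch assumes that a higher score indicates a more likely physiological interface.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen physiological interface
    (label 1) receives a higher score than a randomly chosen
    non-physiological one (label 0); tied scores count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one interface of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six interfaces scored by one scoring function.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]  # 1 = physiological, 0 = non-physiological

print(f"AUC = {roc_auc(scores, labels):.3f}")
```

An AUC near 1 would indicate that the scoring function ranks physiological interfaces consistently above non-physiological ones, whereas a value near 0.5 would indicate no discrimination.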
The problem of separating physiological from non-physiological
interfaces (sometimes called "biological" and "non-biological",
respectively) is very difficult. The core reason for this difficulty is
the lack of a clear, "iron-clad" distinction between these two
categories in a living cell, where co-localized proteins extensively
interact with each other in a crowded environment in a variety of
interaction modes, heavily dominated by transient
interactions/encounters. Still, the ability to identify key
physiologically significant interfaces in the variety of possible
configurations of a protein-protein complex is important. The study
presents a major data resource and methodological development in this
important direction of molecular and cellular biology.
A major challenge for the future is developing the ability to
differentiate physiological from non-physiological protein-protein hetero complexes. Due to the diversity and versatility of
protein interactions, this problem is inherently more difficult than
that of the homo-oligomers. A solution may require accounting for
various aspects of macromolecular interactions in the crowded cellular
environment and significant involvement of Deep Learning techniques.