Discussion
The cell is a busy environment in which thousands of molecules constantly interact and, through information exchange, define the cellular metabolic state. Among all contributors to cellular homeostasis, proteins are both the most active and the most abundant [50]; understanding their interactions and delineating their information-sharing mechanisms is therefore essential for a detailed comprehension of cellular function. This understanding also provides a first step towards the rational development of therapeutic agents against many incapacitating or deadly diseases [51]. Despite advances in experimental structure determination, most known protein-protein interactions still lack an atomic structure. NMR spectroscopy and X-ray crystallography, both of which are high-resolution techniques, struggle to meet high-throughput demand, while low-resolution methods such as small-angle X-ray scattering and cryo-electron microscopy provide excessively coarse data. Molecular docking, or computational structure prediction, was first developed to complement experimental results but has since grown into a lively and independent research field [52].
Elucidating the organization and structural architecture of the CCAN is crucial for understanding the function and assembly of the kinetochore. CENP-H, -I, -K and -M, among other subunits of the CCAN, have previously been reported to form a stable complex based on reconstitution experiments and proteomic analyses [37, 53, 54, 55, 56, 57]. Our study presents, for the first time, a computationally modeled high-quality structure of the human CENP-HIKM complex (Figure 6), alongside a detailed report of the inter- and intra-residue interactions. A previously reported computational model of hs CENP-I suggests that it adopts an α-solenoid fold resembling that of β-importin [37, 58, 59]. The hs CENP-I N-terminal domain (residues 57-281) was also reported to be sufficient for binding hs CENP-H and hs CENP-K, while hs CENP-M binds to the C-terminal domain. Contiguity between CENP-H, -I, and -K was hypothesized on the basis of proteomic analysis of precipitates from cell lysates, two-hybrid interaction data, and phenotypic similarities resulting from depletion of individual subunits [13, 60]. Additional analyses suggest that the revealed complex interaction represents the evolutionarily conserved assembly mechanism of the CENP-HIK complex [14].
Structures of biologically essential proteins are consistently in high demand, especially those of large proteins and of members of complex systems. For numerous reasons, however, it is not always feasible to generate high-resolution structures experimentally using NMR, cryo-electron microscopy or X-ray crystallography. Among the challenges are poor crystal diffraction, high aggregation and low protein stability [61]. In such situations, in silico molecular modeling can provide a high-quality alternative to experimental research. De novo prediction of protein structure from amino acid sequence alone has been shown to be one of the most challenging problems in computational biology [32]. Recent advances in the field have revealed that some accurately predicted long-range contacts may permit correct topology-level structural modeling [62], and that direct evolutionary coupling analysis (DCA) may, for most multiple sequence alignments, generate an appreciable number of native long-range contacts for protein-protein interactions and for proteins with a large number of homologous sequences [63, 64]. We have therefore employed contact prediction and contact-assisted protein folding to model the 3D structure of each subunit of the hs CENP-HIK complex (Figure 1, Supplementary Figures S1 and S2).
Significant progress has been made in generating potential protein-protein interaction networks using mass spectrometry, yeast two-hybrid assays [65] and high-throughput proteomics studies [66, 67]. Atomic-level details obtained by X-ray crystallography are frequently required for the mechanistic interpretation of observed interactions. However, most biologically relevant interactions occur in transient protein complexes, which makes experimental determination of their structures difficult even when the structures of the interacting partners are known. Computational docking approaches have therefore been designed to predict the structures of protein complexes with an accuracy similar to that of X-ray crystallography [68, 69]. Protein-protein docking protocols usually generate a substantial number of models with well-defined atomic positions, but currently available scoring functions lack the predictive accuracy needed to discriminate reliably between models, and the models closest to the native structure are often not easily detected through computational tools alone [69]. In this study, our selection of near-native models was therefore guided by the architectural similarity of each generated model to the fungal and yeast orthologs of the protein complex, which has previously been reported to be evolutionarily conserved (Figure 5).
The main cellular functions, such as DNA replication, transcription, translation, and protein folding and turnover, are directed by large macromolecular complexes such as proteasomes, chaperonins, ribosomes and polymerases. The mechanisms of action of these macromolecules are often dynamic and require large, collective conformational changes [70]. Normal mode analysis describes the flexible states accessible to a protein around an equilibrium position, based on the physics of small oscillations. When a macromolecule in a minimum-energy conformation is slightly perturbed, a restoring force acts to return the system to its equilibrium state [71]. Vibrational energy is divided equally among the modes of the system, so that every mode carries the same energy and the average amplitude of oscillation of any given mode scales as the inverse of its frequency. Thus, higher-frequency modes, whose displacements are energetically more costly, typically describe fast, small-amplitude local motions involving relatively few atoms, while lower-frequency modes describe slow, large-scale conformational changes involving larger numbers of atoms [72]. Coarse-grained models combined with normal mode analysis have proven to be a popular and powerful alternative for simulating the collective motions of macromolecular complexes at extended timescales. In addition to conformational sampling and visualization of motion dynamics (Supplementary Figures S4 and S5), the normal mode analysis results also suggest that the hypothetical protein model assumes a stable conformation (Figure 7).
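The inverse relationship between mode frequency and thermal amplitude can be illustrated with a minimal sketch. The example is a toy one-dimensional chain of unit-mass beads joined by identical springs, not the coarse-grained model used in this study; the function names, the spring constant and the choice kT = 1 are assumptions made purely for illustration.

```python
import numpy as np


def chain_hessian(n_beads, k=1.0):
    """Hessian of a free 1D chain of unit-mass beads joined by springs of stiffness k."""
    h = np.zeros((n_beads, n_beads))
    for i in range(n_beads - 1):
        # Each spring couples bead i and bead i + 1.
        h[i, i] += k
        h[i + 1, i + 1] += k
        h[i, i + 1] -= k
        h[i + 1, i] -= k
    return h


def normal_modes(hessian, tol=1e-8):
    """Return angular frequencies and mode vectors, dropping zero-frequency rigid-body modes."""
    eigvals, eigvecs = np.linalg.eigh(hessian)  # eigenvalues come back in ascending order
    keep = eigvals > tol
    return np.sqrt(eigvals[keep]), eigvecs[:, keep]


def thermal_amplitudes(freqs, kT=1.0):
    """RMS amplitude of each mode under equipartition: sqrt(kT) / omega."""
    return np.sqrt(kT) / freqs


# Hypothetical toy system: 10 beads, so 9 vibrational modes plus 1 translation.
freqs, modes = normal_modes(chain_hessian(10))
amps = thermal_amplitudes(freqs)

for omega, amp in zip(freqs, amps):
    print(f"omega = {omega:.3f}  rms amplitude = {amp:.3f}")
```

With unit masses the eigenvalues of the Hessian are the squared angular frequencies, so the product of amplitude and frequency is the same for every mode, and the lowest-frequency mode carries the largest displacement.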
An essential prerequisite for normal biological function is the ability of a protein to establish highly selective interactions with its macromolecular partners. Sequence mutations that alter protein interactions may abolish function completely or perturb it significantly [73]. One way to evaluate the effect of a mutation on protein binding affinity is to quantify it experimentally. However, while site-directed mutagenesis is fast and inexpensive, FRET, isothermal titration calorimetry, surface plasmon resonance and other methods used to measure binding affinity can be costly and time-consuming [74]. We have therefore used computational approaches to predict changes in binding affinity upon mutation (Tables 1-6, Supplementary Tables 1-6), and the predictions show great consistency with results from previously reported experimental mutagenesis studies. Our visualization of interatomic interactions also provided insights into the molecular nature of the studied interactions and into the functional and structural impact of each mutation (Tables 7 and 8, Supplementary Figures S8-S10).
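For readers unfamiliar with how a change in binding affinity is expressed, the following minimal sketch converts dissociation constants into a binding free energy difference. It assumes the common convention ddG = dG(mutant) - dG(wild type) with dG = RT ln(Kd), under which positive values indicate weaker binding; prediction tools may use the opposite sign. The function names and Kd values are hypothetical placeholders and are not results from this study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Binding free energy in kcal/mol from a dissociation constant: dG = RT ln(Kd / 1 M)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature * math.log(kd_molar)


def ddg_upon_mutation(kd_wild_type, kd_mutant, temperature=298.15):
    """ddG = dG(mutant) - dG(wild type); positive means the mutation weakens binding."""
    return (binding_free_energy(kd_mutant, temperature)
            - binding_free_energy(kd_wild_type, temperature))


# Placeholder values only: a mutation that lowers affinity tenfold.
ddg = ddg_upon_mutation(kd_wild_type=1e-9, kd_mutant=1e-8)
print(f"ddG = {ddg:.2f} kcal/mol")
```

Under these assumptions a tenfold loss in affinity at 298.15 K corresponds to RT ln(10), roughly 1.4 kcal/mol.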