Remote homology detection of insulin and IR
Popular sequence alignment methods such as BLAST are not well suited to detecting sequence conservation between coral and human proteins, because conservation is low, in the 20-30% sequence identity range. The method of choice for retrieving coral homologs of human proteins was therefore HHblits, a hidden Markov model (HMM) based alignment approach (Remmert et al., 2011) that builds on the HMM-HMM comparison method introduced by Johannes Soeding in 2005. Unlike traditional profile HMM methods, which compare a profile HMM with a single sequence, HHblits represents both query and template as HMMs. The query HMM is built from position-specific amino acid distributions. The search for homologs is thus an HMM-HMM alignment, which makes the method extremely sensitive. HHblits has been shown in many instances to outperform traditional profile HMM approaches such as HMMER3 in identifying and aligning remote homologs (Remmert et al., 2011). Given the 700 million years of evolution separating corals and humans, this sensitivity was instrumental for the present study. The results of the HHblits searches of the pdam genome for homologs of human insulin (UniProt ID P01308) and the human insulin receptor, IR (UniProt ID P06213), are provided in the Supplementary Materials, in files named after the UniProt IDs.
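To illustrate why comparing two profiles can be more sensitive than comparing a profile with a single sequence, the following minimal sketch builds position-specific amino acid distributions from a small alignment and scores a pair of columns by their co-emission probability. It is a simplification under stated assumptions, not the HHblits implementation: the function names `column_profile` and `column_similarity`, the uniform pseudocount, and the toy alignments are all hypothetical, and the real method additionally uses background frequencies and insert/delete states.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(msa, pseudocount=1.0):
    """Return one amino acid probability distribution per alignment column.

    `msa` is a list of equal-length aligned sequences; gap characters and
    any other non-standard symbols are ignored when counting.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        # A uniform pseudocount keeps unseen residues at non-zero probability.
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def column_similarity(p, q):
    """Probability that two profile columns emit the same residue."""
    return sum(p[aa] * q[aa] for aa in AMINO_ACIDS)


# Toy alignments (hypothetical data, for illustration only).
query_profile = column_profile(["CGSHLV", "CGSHLI", "CGAHLV"])
template_profile = column_profile(["CGTHMV", "CGSHLV", "CGSHMI"])

for i, (p, q) in enumerate(zip(query_profile, template_profile)):
    print(f"column {i}: co-emission probability = {column_similarity(p, q):.3f}")
```

Columns that are conserved in both families (here the leading cysteine and glycine) score higher than variable columns, even when the individual sequences differ at other positions.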
Figure 1 shows the sequence alignment of human insulin with the pdam protein pdam_00013976. Similarly, IR aligned with pdam_00006633 with high confidence (data not shown). In both cases, the alignment covers a large fraction of the sequence: 1164 of 1382 amino acids for IR and 101 of 110 for insulin. Manual inspection of the alignments also shows clear matching of similar residues despite the low overall sequence identity. Such alignments could not be obtained with regular profile HMM sequence alignments (data not shown). To validate the alignment, we extracted the insulin residues known to be involved in binding and found that they map to regions of high-confidence alignment. The residue motifs identified as important for receptor binding by cryo-electron microscopy (Uchikawa et al., 2019) are shown in different colors in Figure 1. These functionally important motifs overlap well with the regions of high-confidence alignment (9 = highest, 0 = lowest). We did the same for the insulin receptor and again found that the ligand-binding residues overlap with regions of high-confidence alignment (data not shown). We also found that the human disulfide bond Cys647-Cys860, which connects FnIII-2a and FnIII-3, is conserved in the coral sequence. We had previously identified this disulfide bond as important for receptor activation, acting as a signaling bridge (Ye et al., 2017), a hypothesis that was validated by the recent structural analysis (Uchikawa et al., 2019).
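The coverage figures quoted above, and the kind of overlap check between binding motifs and alignment confidence, can be sketched as follows. This is an illustrative sketch only: the function names, the confidence string, and the motif column indices are hypothetical placeholders and are not taken from the actual alignments or from Figure 1.

```python
def coverage_fraction(aligned_residues, sequence_length):
    """Fraction of the sequence that is covered by the alignment."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return aligned_residues / sequence_length


def motif_confidence(confidence, motif_columns):
    """Mean per-column confidence (0-9) inside and outside a set of columns.

    `confidence` is a string of digits, one per aligned column;
    `motif_columns` is a set of 0-based column indices.
    """
    scores = [int(c) for c in confidence]
    inside = [s for i, s in enumerate(scores) if i in motif_columns]
    outside = [s for i, s in enumerate(scores) if i not in motif_columns]
    if not inside or not outside:
        raise ValueError("need at least one column inside and one outside")
    return sum(inside) / len(inside), sum(outside) / len(outside)


# Coverage values reported in the text.
print(f"IR:      {100 * coverage_fraction(1164, 1382):.1f}% of residues aligned")
print(f"insulin: {100 * coverage_fraction(101, 110):.1f}% of residues aligned")

# Hypothetical confidence string and motif columns, for illustration only.
toy_confidence = "0134689998765310"
toy_motif = {5, 6, 7, 8, 9}
inside, outside = motif_confidence(toy_confidence, toy_motif)
print(f"mean confidence inside motif: {inside:.1f}, outside: {outside:.1f}")
```

With the reported numbers, the alignments cover about 84.2% of IR and 91.8% of insulin. A markedly higher mean confidence inside the motif columns than outside would be the quantitative counterpart of the overlap described above.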