3.2 Three types of hotspots: weak, moderate and strong hotspots
The difference in hotspot residue preference between the PPI and PPepI datasets is not very evident from the overall frequency distribution (Fig. 1). We therefore clustered the data obtained from residue scanning. The clustering suggested between 3 and 5 possible clusters. Upon manual examination of the trends, it was considered reasonable to divide the hotspot residues into approximately three types. The differences between hotspot residues were most pronounced across the following three approximate ∆∆G ranges, which we refer to as weak hotspots (loss in ∆∆G of 2-10 kcal/mol), moderate hotspots (∆∆G of 10-20 kcal/mol) and strong hotspots (∆∆G >20 kcal/mol).
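The three ranges above can be expressed as a simple threshold rule. The following is a minimal sketch, not the procedure used in this work: the function name `hotspot_type` is hypothetical, and because the stated ranges share the value 10 kcal/mol, the sketch assumes that exactly 10 is assigned to the moderate type. A value of exactly 20 falls in the moderate type, since the strong type is defined as >20.

```python
from typing import Optional

# Range boundaries in kcal/mol, taken from the definitions in the text.
WEAK_MIN = 2.0
MODERATE_MIN = 10.0   # assumption: exactly 10 is counted as moderate
STRONG_MIN = 20.0     # strong is strictly greater than 20

def hotspot_type(ddg_loss: float) -> Optional[str]:
    """Classify a ddG loss (kcal/mol) as weak, moderate or strong.

    Returns None for values below 2 kcal/mol, which are not
    treated as hotspots under the ranges given in the text.
    """
    if ddg_loss > STRONG_MIN:
        return "strong"
    if ddg_loss >= MODERATE_MIN:
        return "moderate"
    if ddg_loss >= WEAK_MIN:
        return "weak"
    return None
```

For example, `hotspot_type(5.0)` gives `"weak"` and `hotspot_type(20.1)` gives `"strong"`.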
Out of 3732 hotspots, the great majority, 68.7% (2565), belong to the weak hotspot type. In the PPI dataset, Gln, Leu and Tyr are the most preferred residues, followed by Asn, Val, Lys, Glu, Ser and Pro, which also have a substantial presence at the PPI interface. In contrast, in PPepI, Leu and Tyr are the most preferred hotspot residues, with Leu making an overwhelming contribution to the distribution. Val, Thr, Pro and Ile also occur at high frequencies (Fig. 2). Thus, among weak hotspots in PPI, polar residues occur most often, followed by hydrophobic residues, with a minor fraction of charged residues also present. In PPepI, on the other hand, hydrophobic residues are preferred over polar residues. A somewhat similar trend was observed for anchor residues in the PPI category, even though very few data points fall in the weak type: the frequency of Gln is the highest, followed by Asn and Lys. In PPepI, the paucity of data precluded any reliable predictions.
The data for the moderate type (∆∆G of 10-20 kcal/mol) are shown in Fig. 3. About 25.4% of the hotspots (949) belong to the moderate type. In contrast to the weak type, Arg is overwhelmingly present in PPI (~18%), followed by Tyr (~12%), while Lys and Leu also have sizable frequencies (about 10%) in the distribution. Thus, the distribution in the PPI category is dominated by charged and polar residues, with a minor fraction of hydrophobic residues also present. In contrast, the PPepI distribution shows a substantial presence of polar (Tyr), hydrophobic (Leu, Ile) and charged (Arg) residues. Among the anchor residues in PPI, Leu is dominant, followed by Arg, Tyr and Gln. In PPepI, however, the highest frequencies were observed only for the hydrophobic residues Leu, Ile, Val and Phe.
Out of 3732 hotspots, only 5.8% (218) belong to the strong hotspot type. In PPI, the strong type is dominated by Arg, which is the single most frequent residue at ~42%. In the PPepI category, Arg and Trp are the dominant residues, with frequencies of ~26% and ~20%, respectively (Fig. 4). For anchor residues, a similar trend was observed in PPI, with Arg predominant. In PPepI, Arg and Trp are the preferred anchor residues: besides Arg, the bulky hydrophobic side chain of Trp also serves as a suitable anchor in the PPepI category.
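The counts reported for the three types can be checked against the total with a few lines of arithmetic. This is only a consistency sketch; the variable names are illustrative, and the numbers are those quoted in the text.

```python
# Counts per hotspot type as reported in the text.
TOTAL_HOTSPOTS = 3732
counts = {"weak": 2565, "moderate": 949, "strong": 218}

# The three types should partition the full set of hotspots.
assert sum(counts.values()) == TOTAL_HOTSPOTS

# Percentage share of each type, rounded to one decimal place.
shares = {name: round(100 * n / TOTAL_HOTSPOTS, 1) for name, n in counts.items()}
print(shares)
```

The three counts sum to 3732, so the weak, moderate and strong types together account for every hotspot in the dataset.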
Thus, going from the weak to the strong hotspot type, the gap between the PPI and PPepI categories tends to close. In the weak type, the differences are prominent: polar residues, followed by hydrophobic residues and a minor fraction of charged residues, dominate in PPI, whereas hydrophobic residues followed by polar residues dominate in PPepI. In the moderate type, the nature of the interactions in PPI shifts towards the polar side, with charged and polar residues dominating, while hotspots in PPepI are represented by all three residue classes: polar, hydrophobic and charged. Finally, in the strong type, Arg alone dominates the distribution in PPI, and in PPepI both Arg and Trp are overwhelmingly present (Table 3).
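The per-type residue preferences summarised above amount to a percentage frequency of each residue within a dataset and hotspot type. The sketch below shows one way such a tabulation could be computed. The function `residue_frequencies` and the record layout (dataset, hotspot type, residue) are assumptions made for illustration, and the `toy_records` list is made-up input rather than data from this study.

```python
from collections import Counter, defaultdict

def residue_frequencies(records):
    """Percentage frequency of each residue per (dataset, hotspot type).

    records: iterable of (dataset, hotspot_type, residue) tuples.
    Returns {(dataset, hotspot_type): {residue: percent}}.
    """
    counts = defaultdict(Counter)
    for dataset, htype, residue in records:
        counts[(dataset, htype)][residue] += 1

    freqs = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # most_common() orders residues from most to least frequent.
        freqs[key] = {res: 100.0 * n / total for res, n in counter.most_common()}
    return freqs

# Made-up records for illustration only; not results from the paper.
toy_records = [
    ("PPI", "strong", "Arg"),
    ("PPI", "strong", "Arg"),
    ("PPI", "strong", "Tyr"),
    ("PPI", "strong", "Lys"),
    ("PPepI", "strong", "Trp"),
    ("PPepI", "strong", "Arg"),
]
toy_freqs = residue_frequencies(toy_records)
```

Keeping the dataset and hotspot type together as the grouping key means the PPI and PPepI distributions for a given type can be compared directly, as is done in the figures.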