3.3.1 Spike Protein
CoV-2 attaches to host receptors through its S proteins. These glycoproteins protrude from the surface of the virus and give it a crown-like appearance. The S protein has a molecular weight of 141,178 Da (about 141 kDa) and consists of 1273 amino acids [21]. Genome sequencing has shown that the S protein of CoV-2 is 75% similar to the SARS-CoV S protein. The ectodomain of the S protein has two subunits, S1 (residues 13-685) and S2 (residues 686-1273). Together they form a clove-like shape in which three S1 subunits join to perform receptor binding, while the stem is an S2 trimer that performs membrane fusion [22]. The S1 subunit of CoV-2 is 70% and the S2 subunit is 99% similar to the corresponding subunits of the SARS-CoV S protein [18]. The S protein is also responsible for the formation of attachments between infected and non-infected cells and is thus involved in the spread of the virus [23]. Angiotensin-Converting Enzyme 2 (ACE2), which is present on the membrane of host cells, is the receptor of the CoV-2 S protein.
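The subunit boundaries quoted above can be expressed as a simple lookup. The following is a minimal Python sketch, assuming only the residue ranges given in this section (S1: 13-685, S2: 686-1273); the names `SUBUNITS` and `subunit_of` are hypothetical and introduced purely for illustration.

```python
# Residue ranges as stated in the text (1-based, inclusive).
SUBUNITS = {
    "S1": (13, 685),
    "S2": (686, 1273),
}

def subunit_of(position):
    """Return the name of the subunit containing a residue position, or None."""
    for name, (start, end) in SUBUNITS.items():
        if start <= position <= end:
            return name
    return None  # position lies outside both ranges, e.g. before residue 13

print(subunit_of(685))  # S1
print(subunit_of(686))  # S2
```

Keeping the ranges in one dictionary means that a revised boundary only has to be changed in a single place.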
Fusion of the viral membrane with the host membrane requires a structural transformation of the S protein. The binding of SARS-CoV-2 to the host is reported to be 10 to 20 times more stable, which is attributed to the Receptor Binding Domain (RBD) contained in the S1 subunit of the S protein [24]. The RBD recognizes and attaches to ACE2 receptors present on host cells. This enhanced receptor-spike interaction may be due to a polymorphism at 501T that promotes infectivity [25]. The RBD of CoV-2 and the ACE2 receptor of the host cell interact through hydrogen bonds as well as salt bridges. Seventeen RBD residues interact with 20 ACE2 residues; of these, K417 of the RBD forms a salt bridge with D30 of ACE2, and the rest form hydrogen bonds with their respective residues [26,27]. The virus appears to be more susceptible to cleavage by host cell proteases because of the unique extended loop formed by the S1 and S2 subunits of CoV-2. The S1 subunit helps the S2 subunit achieve a stable configuration during binding to host receptors by shedding and destabilizing itself. In its downward configuration, the RBD lies near the central pocket of the S protein. The S2 subunit consists of a fusion peptide (FP) and a proteolytic site with a central FP, along with two heptad repeats before the transmembrane domain [28].
The S protein shows the greatest number of mutations among all structural proteins. Some of the most frequently reported mutations in the S protein are L18F, A222V, N439K, S477N, N501Y, D614G and P681H (Figure 3). The stretch from amino acid position 222 to 681, which lies within the S1 subunit, was found to be the most variable part of the S sequence. These frequent mutations mainly affect the NTD and RBD regions of the S1 subunit. We detected approximately 4725 different mutations distributed along the whole coding region. Some of the most frequently seen mutations are listed in Table 1 along with the amino acid changes and their reported accession IDs. D614G is the most frequent mutation of the S protein, with a frequency of 266513; it has already been described as more infectious than other mutant strains [29]. The D614G mutant increases infectivity because it makes virus entry into host cells more efficient than in the original strain and also reduces shedding of the S1 domain [30]. According to a recent study, the D614G mutant increases replication of the virus in the epithelial lining of the human lungs and airways, mainly the upper respiratory tract, by enhancing the stability, infectivity, and load of the virus [31,32]. The A222V mutation in spike has been found at high frequency across Europe, where it is prevalent in many countries. However, this rise has not been linked to any effect of A222V on transmissibility [33].
The S1 subunit is divided into an N-terminal domain (NTD) and a receptor binding domain (RBD, residues 319-541), which contains the receptor binding motif (RBM, residues 437-508). The RBD harbors a higher frequency of certain mutations than other domains. The S477N mutation, located in the RBM, has been detected in 16914 genomes. The other most common variants in the RBM are N439K (5725), N501Y (4362), and Y453F (968) (Table 3). Among all the RBD residues that bind ACE2, N501 showed the most variation (N501Y (4362), N501T (48), N501S (8), N501R (1), N501I (1), N501G (1)). N501Y (Table 2) results in a tighter interaction of the CoV-2 RBD with ACE2 [34]. We found that the N501Y mutation has a stabilizing effect on the S protein (ΔΔG: 0.535 kcal/mol) (Figure 2) when computed with the DynaMut server [16]. The L5F mutation, located in the signal peptide, has been detected in 3813 genomes. The previously reported RBD mutations A348V (7), V367F (155), and A419S (11), which show high antigenicity, were also seen at notable frequencies [35].
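To make the domain assignments and the N501 tally above easier to reproduce, the following Python sketch parses mutation labels and places them relative to the RBD and RBM. It is an illustration only, not the pipeline used in this study: the domain boundaries and genome counts are copied from the text, while the helper names `parse_mutation` and `region_of` are hypothetical.

```python
import re

# Domain boundaries as stated in the text (1-based, inclusive).
RBD = (319, 541)
RBM = (437, 508)

def parse_mutation(label):
    """Split a label such as 'N501Y' into (reference, position, variant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt

def region_of(label):
    """Classify a mutation as 'RBM', 'RBD (outside RBM)', or 'outside RBD'."""
    _, pos, _ = parse_mutation(label)
    if RBM[0] <= pos <= RBM[1]:
        return "RBM"
    if RBD[0] <= pos <= RBD[1]:
        return "RBD (outside RBM)"
    return "outside RBD"

# Genome counts for substitutions at N501, copied from the text.
n501_counts = {"N501Y": 4362, "N501T": 48, "N501S": 8,
               "N501R": 1, "N501I": 1, "N501G": 1}

total = sum(n501_counts.values())
share_n501y = n501_counts["N501Y"] / total

for label in ["S477N", "N439K", "V367F", "L5F"]:
    print(label, "->", region_of(label))
print(f"N501Y share of N501 substitutions: {share_n501y:.1%}")
```

With the counts listed above, the six N501 substitutions sum to 4421 genomes, of which N501Y accounts for about 98.7%.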
Figure 1. SARS-CoV-2 genome organization and mutations in structural proteins. Mutation frequencies are shown.
In CoV-2, the lysine at position 417 interacts strongly with the aspartic acid at position 30 of the ACE2 receptor. This bond strengthens the interaction between the virus and the host receptor [36]. However, we observed three mutations at position 417 in which lysine is replaced by asparagine, arginine, or proline, with frequencies of 218, 2, and 1, respectively. This suggests that the K417N mutation may change the receptor binding pattern of the RBD and affect its infectivity (Table 2).
Numerous mutations have been detected throughout the ACE2-binding domain of the S protein, among which N501Y is present at the highest frequency (4362). However, the effect of the majority of these mutations on binding affinity is unknown. Q493K and N501T in the RBD might alter the binding affinity between S and ACE2 [34]. Similarly, E484K, present in the RBD, evades antibodies. We detected E484R in only two genomes, both from the UK (supplementary file S1), which needs further characterization. In a more recent study, the individual substitution Tyr449Asn abolished S2M11 antibody-mediated neutralization, whereas the Leu455Phe variant decreased neutralization potency [37]. The G446V variant in the RBD has been detected in 46 genomes. Antibody combinations that target different neutralizing epitopes might be a useful strategy.
The transmembrane domain harbors the P1263L mutation in 911 genome sequences (Figure 3). Variants have also been detected in heptad repeat 1 (HR1) of the S2 subunit. Among them, S982A is the most frequent (3670), followed by D936Y (1239) and S939F (555). Three variants (D1163Y, G1167V, V1176F) were also detected at high frequency (1903, 1753, and 933, respectively) in the heptad repeat 2 (HR2) region (residues 1163-1213) of the S protein. Alanine is considered a strong helix former, whereas valine is a poor one. We can therefore assume that replacing a helical alanine with valine may change the secondary structure. Substitutions at alanine residues, namely A879S (456), A892V (45), and A930V (128), were seen at notable frequencies and may result in the formation of a beta-sheet instead of a helix [38].
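The heptad repeat counts above can be ranked per region with a few lines of Python. This is a minimal sketch using only the counts reported in this section; the dictionary layout and the helper name `summarize` are assumptions made for illustration.

```python
# Genome counts copied from the text, grouped by the region they are reported in.
heptad_variants = {
    "HR1": {"S982A": 3670, "D936Y": 1239, "S939F": 555},
    "HR2": {"D1163Y": 1903, "G1167V": 1753, "V1176F": 933},
}

def summarize(variants):
    """Return (most frequent variant, summed count) for one region."""
    top = max(variants, key=variants.get)
    # The sum counts observations; a genome carrying two of the
    # variants would be counted twice.
    return top, sum(variants.values())

summary = {region: summarize(v) for region, v in heptad_variants.items()}
for region, (top, count_sum) in summary.items():
    print(f"{region}: most frequent {top}, summed count {count_sum}")
```

The summed counts are observation totals rather than numbers of distinct genomes, since the text does not state whether any genome carries more than one of these variants.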
Some studies have reported four unique inserts in the CoV-2 S1 subunit. Insert 1 lies in the N-terminal domain, while inserts 2 and 3 are associated with the CTD. Insert 4 lies at the junction of the two domains of the S1 subunit. Inserts 1, 2, and 3 are similar in configuration to HIV-1 gp120, while insert 4 is similar to Gag proteins [39]. Apart from the 501T polymorphism, certain other amino acids in the RBD of the CoV-2 S protein favor binding to the ACE2 receptor. The residues Gln493, Asn501, Leu455, Phe486, and Ser494 are reported to boost ACE2 binding [25].