3.3.1 Spike Protein
CoV-2 attaches to host receptors through its S proteins. These
glycoproteins project from the surface of the virus, giving it a
crown-like appearance. The S protein has a molecular weight of 141178 Da
(about 141 kDa) and consists of 1273 amino acids [21]. Genome sequencing
has shown that the S protein of CoV-2 is 75% similar to the SARS-CoV S
protein. The ectodomain of the S protein has two subunits, S1 (residues
13-685) and S2 (residues 686-1273). Together they give the spike a
clove-like shape: three S1 subunits join to form the head that performs
receptor binding, and the trimeric S2 stem performs membrane fusion
[22]. The S1 subunit of CoV-2 is 70% and the S2 subunit 99% similar to
the corresponding subunits of SARS-CoV [18]. The S protein also mediates
attachments between infected and non-infected cells and is thus involved
in the spread of the virus [23]. Angiotensin-Converting Enzyme 2 (ACE2),
present on the membrane of host cells, is the receptor of the CoV-2 S
protein.
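As a quick arithmetic check on the molecular weight, dividing the reported mass by the number of residues gives the average mass contributed by each residue. The minimal Python sketch below assumes a rule-of-thumb average residue mass of roughly 110 Da (an assumption, not a value from this study); a result close to that figure is consistent with the reported value of 141178 being in daltons rather than kilodaltons.

```python
# Sanity check: average mass per residue implied by the reported S protein mass.
S_PROTEIN_MASS_DA = 141178  # reported molecular weight, read as daltons
S_PROTEIN_LENGTH = 1273     # number of amino acids

# Rule-of-thumb average residue mass; an assumption used only for comparison.
TYPICAL_RESIDUE_MASS_DA = 110.0


def mean_residue_mass(mass_da: float, n_residues: int) -> float:
    """Return the average mass contributed by each residue, in daltons."""
    return mass_da / n_residues


avg = mean_residue_mass(S_PROTEIN_MASS_DA, S_PROTEIN_LENGTH)
print(f"average residue mass: {avg:.1f} Da")
print(f"rule-of-thumb value:  {TYPICAL_RESIDUE_MASS_DA:.1f} Da")
print(f"total mass: {S_PROTEIN_MASS_DA / 1000:.1f} kDa")

# Read as kDa instead, each residue would weigh over 100,000 Da,
# which is far outside the mass range of any amino acid.
```

The computed average (about 110.9 Da) sits right next to the rule-of-thumb value, so the total corresponds to roughly 141 kDa.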
Fusion of the viral membrane with the host membrane involves a
structural transformation of the S protein. The S protein of SARS-CoV-2
binds ACE2 with 10- to 20-fold higher affinity than that of SARS-CoV,
and this binding is mediated by the Receptor Binding Domain (RBD) of the
S1 subunit [24]. The RBD recognizes and attaches to ACE2 receptors on
host cells. This enhanced receptor-spike interaction may be due to a
polymorphism at position 501 (501T) that promotes infectivity [25]. The
CoV-2 RBD interacts with the host ACE2 receptor through hydrogen bonds
and salt bridges. Seventeen RBD residues interact with 20 ACE2 residues;
among them, K417 of the RBD forms a salt bridge with D30 of ACE2, while
the rest form hydrogen bonds with their respective partner residues
[26,27]. The virus appears to be more susceptible to cleavage by host
cell proteases because of a unique extended loop at the junction of the
S1 and S2 subunits of CoV-2. The S1 subunit helps the S2 subunit reach a
stable configuration during binding to host receptors by shedding and
destabilizing itself. In its "down" configuration, the RBD lies near the
central pocket of the S protein. The S2 subunit contains a fusion
peptide (FP) and a proteolytic cleavage site, followed by two heptad
repeats before the transmembrane domain [28].
The S protein carries the largest number of mutations among the
structural proteins. Some of the most frequent mutations reported in the
S protein are L18F, A222V, N439K, S477N, N501Y, D614G and P681H (Figure
3). Residues 222 to 681, which lie within the S1 subunit, form the most
variable part of the S sequence. These frequent mutations mainly affect
the NTD and RBD regions of the S1 subunit. We detected approximately
4725 different mutations along the whole coding region. Some of the most
frequently observed mutations are listed in Table 1, along with the
amino acid changes and their reported accession IDs. D614G is the most
frequent mutation of the S protein, with a frequency of 266513; it has
already been described as more infectious than other mutant strains
[29]. The D614G strain increases infectivity because it makes virus
entry into host cells more efficient than in the original strain, and it
also reduces shedding of the S1 domain [30]. According to recent
studies, the D614G mutant increases viral replication in the epithelial
lining of the human lungs and other airways, mainly the upper
respiratory tract, by enhancing the stability, infectivity, and load of
the virus [31,32]. A222V has been found at high frequency across Europe
and became prevalent in many countries; however, its rise has not been
linked to increased transmissibility [33].
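The mutation labels used throughout this section follow the usual one-letter convention: wild-type residue, position, mutant residue (D614G replaces Asp614 with Gly). The Python sketch below parses such labels and assigns each position to a region using the ranges stated in this section (S1 13-685, S2 686-1273, RBD 319-541). Treating residues 1-12 as the signal peptide is an assumption implied by S1 starting at residue 13, and the helpers parse_mutation and region_of are hypothetical names written for illustration.

```python
import re

# Region boundaries (inclusive) from this section: S1 13-685, S2 686-1273,
# RBD 319-541 inside S1. Residues 1-12 as signal peptide is an assumption.
REGIONS = [("signal peptide", 1, 12), ("S1", 13, 685), ("S2", 686, 1273)]
RBD_START, RBD_END = 319, 541

# One-letter wild type, position, one-letter mutant, e.g. "D614G".
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label such as 'D614G' into ('D', 614, 'G')."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a single-substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def region_of(position: int) -> str:
    """Return the S protein region that contains a residue position."""
    for name, start, end in REGIONS:
        if start <= position <= end:
            if name == "S1" and RBD_START <= position <= RBD_END:
                return "S1 (RBD)"
            return name
    raise ValueError(f"position {position} lies outside residues 1-1273")


FREQUENT = ["L18F", "A222V", "N439K", "S477N", "N501Y", "D614G", "P681H"]
for label in FREQUENT:
    wild_type, position, mutant = parse_mutation(label)
    print(f"{label}: {wild_type}{position} -> {mutant}, {region_of(position)}")
```

Applied to the seven frequent mutations, every position falls within S1, with N439K, S477N, and N501Y inside the RBD, which matches the observation that these mutations mainly affect the S1 subunit.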
The S1 subunit is divided into the N-terminal domain (NTD) and the
receptor binding domain (RBD; residues 319-541), which contains the
receptor binding motif (RBM; residues 437-508). The RBD harbors certain
mutations at a higher frequency than other domains. The RBM mutation
S477N has been detected in 16914 genomes. Other common variants in the
RBM are N439K (5725), N501Y (4362), and Y453F (968) (Table 3). Among all
the RBD residues that bind ACE2, N501 showed the most variation: N501Y
(4362), N501T (48), N501S (8), N501R (1), N501I (1), and N501G (1).
N501Y (Table 2) has been reported to tighten the interaction of the
CoV-2 RBD with ACE2 [34]. Using the DynaMut server [16], we found that
N501Y has a stabilizing effect on the S protein (ΔΔG: 0.535 kcal/mol)
(Figure 2). The mutation L5F, located in the signal peptide, has been
detected in 3813 genomes. The previously reported high-antigenicity RBD
mutations A348V (7), V367F (155), and A419S (11) [35] were also observed
among the mutant RBD sequences.
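The genome counts above can be summarized with a short tally. This Python sketch uses only the counts and residue ranges given in the text: it computes each substitution's share of all N501 variants and checks which of the listed positions fall inside the RBM (437-508) versus the rest of the RBD (319-541). The helper rbd_location is a hypothetical name for illustration.

```python
# Genome counts of substitutions at N501, as reported in the text.
N501_COUNTS = {"N501Y": 4362, "N501T": 48, "N501S": 8,
               "N501R": 1, "N501I": 1, "N501G": 1}

# Residue ranges from the text (inclusive): RBD 319-541, RBM 437-508.
RBD = range(319, 542)
RBM = range(437, 509)


def rbd_location(position: int) -> str:
    """Classify a residue position relative to the RBD and its RBM."""
    if position in RBM:
        return "RBM"
    if position in RBD:
        return "RBD outside RBM"
    return "outside RBD"


total = sum(N501_COUNTS.values())  # all genomes with an N501 substitution
for label, count in sorted(N501_COUNTS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {count} genomes, {100 * count / total:.2f}% of N501 variants")

for label in ["N439K", "Y453F", "S477N", "N501Y", "A348V", "V367F", "A419S", "L5F"]:
    position = int(label[1:-1])  # strip the wild-type and mutant letters
    print(f"{label}: {rbd_location(position)}")
```

With these counts, N501Y accounts for about 98.7% of the 4421 N501 variants, and the antigenic mutations A348V, V367F, and A419S lie in the RBD but outside the RBM.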
Figure 1. SARS-CoV-2 genome organization and mutations in the
structural proteins, with the frequency of each mutation shown.
In CoV-2, the lysine at position 417 (K417) interacts strongly with the
aspartic acid at position 30 (D30) of the ACE2 receptor, which
strengthens the binding between the virus and the host receptor [36].
However, we observed three substitutions at this receptor-binding
residue, in which K417 is replaced by asparagine, arginine, and proline
in 218, 2, and 1 genomes, respectively. Because asparagine lacks the
positively charged side chain that forms the salt bridge with D30, the
K417N mutation may change the receptor binding pattern of the RBD and
affect its infectivity (Table 2).
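The argument about K417 rests on charge complementarity: a salt bridge needs a positively charged side chain facing a negatively charged one. The Python sketch below encodes the formal side-chain charges at physiological pH for the residues involved (standard textbook values) and checks which observed K417 substitutions could still pair with D30. Charge compatibility is only a necessary condition, since distance and geometry also matter, so this is a screening heuristic rather than a structural prediction; can_form_salt_bridge is a hypothetical helper.

```python
# Formal side-chain charge at physiological pH (about 7.4) for the residues
# discussed: Lys and Arg are positive, Asp negative, Asn and Pro neutral.
SIDE_CHAIN_CHARGE = {"K": +1, "R": +1, "D": -1, "N": 0, "P": 0}

D30_CHARGE = SIDE_CHAIN_CHARGE["D"]  # ACE2 partner residue D30


def can_form_salt_bridge(residue: str, partner_charge: int) -> bool:
    """Necessary condition only: the two side chains carry opposite charges."""
    return SIDE_CHAIN_CHARGE[residue] * partner_charge < 0


# Substitutions observed at K417 and their genome counts, from the text.
K417_SUBSTITUTIONS = {"N": 218, "R": 2, "P": 1}

print("K417 (wild type):", can_form_salt_bridge("K", D30_CHARGE))
for mutant, count in K417_SUBSTITUTIONS.items():
    possible = can_form_salt_bridge(mutant, D30_CHARGE)
    print(f"K417{mutant} ({count} genomes): charge allows salt bridge = {possible}")
```

Only K417R keeps a positive side chain, so K417N and K417P would both be expected to lose the salt bridge with D30.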
Numerous mutations have been detected in the ACE2-binding residues of
the S protein, among which N501Y is the most frequent (4362). However,
the effect of most of these mutations on binding affinity is unknown.
Q493K and N501T in the RBD might alter the binding affinity between S
and ACE2 [34]. E484K, also in the RBD, has been reported to evade
antibodies. We detected E484R in only two UK genomes (supplementary file
S1); this variant needs further characterization. In a more recent
study, the individual substitution Tyr449Asn abolished S2M11
antibody-mediated neutralization, whereas the Leu455Phe variant
decreased neutralization potency [37]. The G446V variant in the RBD has
been detected in 46 genomes. Combinations of antibodies targeting
different neutralizing epitopes might therefore be a useful strategy.
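The reasoning behind antibody combinations can be expressed as simple set logic: a single mutation escapes a cocktail only if it escapes every antibody in it. In the Python sketch below, only the S2M11 escape by Tyr449Asn (Y449N) comes from the study cited above [37]; the second antibody, "mAb_B", and its escape set are hypothetical placeholders for an antibody against a different epitope, used purely to illustrate the idea.

```python
# Escape sets: substitutions that abolish neutralization by each antibody.
# S2M11 -> Y449N is taken from the text; "mAb_B" and its escape set are
# hypothetical, standing in for an antibody against a different epitope.
ESCAPE_SETS: dict[str, set[str]] = {
    "S2M11": {"Y449N"},
    "mAb_B": {"E484K"},  # hypothetical example only
}


def escapes_cocktail(mutation: str, cocktail: list[str]) -> bool:
    """True only if the mutation escapes every antibody in the cocktail."""
    return all(mutation in ESCAPE_SETS[antibody] for antibody in cocktail)


for cocktail in (["S2M11"], ["S2M11", "mAb_B"]):
    for mutation in ("Y449N", "E484K"):
        status = "escape" if escapes_cocktail(mutation, cocktail) else "no escape listed"
        print(cocktail, mutation, status)
```

Under these assumptions, Y449N escapes S2M11 alone but not the two-antibody cocktail, because mAb_B is not listed as escaped by Y449N.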
The cytoplasmic tail, just downstream of the transmembrane domain,
harbors the P1263L mutation in 911 genome sequences (Figure 3). Variants
have also been detected in heptad repeat 1 (HR1; residues 912-984) of
the S2 subunit. Among them, S982A is the most frequent (3670), followed
by D936Y (1239) and S939F (555). Three variants in the heptad repeat 2
(HR2; residues 1163-1213) region of the S protein, D1163Y, G1167V, and
V1176F, were also detected at high frequency (1903, 1753, and 933,
respectively). Alanine is a strong helix former, whereas valine is a
poor one. Therefore, replacing a helical alanine with a weaker helix
former such as valine may change the local secondary structure; A879S
(456), A892V (45), and A930V (128), observed at notable frequency, may
thus favor beta-sheet instead of helix formation [38].
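The helix argument can be made semi-quantitative with an experimental helix propensity scale. The Python sketch below assumes values from the Pace and Scholtz (1998) scale, which gives each residue's penalty relative to alanine in kcal/mol; only the three residues needed here are included (Ala 0.00, Ser 0.50, Val 0.61), and these numbers should be checked against the original scale before reuse. A positive change means the mutant residue is a weaker helix former than the wild-type residue.

```python
# Helix propensity penalties in kcal/mol relative to alanine (Ala = 0).
# Values assumed from the Pace & Scholtz (1998) scale; verify before reuse.
HELIX_PENALTY = {"A": 0.00, "S": 0.50, "V": 0.61}


def helix_penalty_change(wild_type: str, mutant: str) -> float:
    """Positive result: the mutant residue is less favourable in a helix."""
    return HELIX_PENALTY[mutant] - HELIX_PENALTY[wild_type]


for label in ("A879S", "A892V", "A930V"):
    wild_type, mutant = label[0], label[-1]
    change = helix_penalty_change(wild_type, mutant)
    print(f"{label}: helix penalty change {change:+.2f} kcal/mol")
```

All three substitutions carry a positive penalty change, consistent with the suggestion that they weaken helix formation at these positions.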
Some studies have reported four unique inserts in the CoV-2 S1 subunit.
Insert 1 lies in the N-terminal domain, while inserts 2 and 3 are in the
C-terminal domain (CTD). Insert 4 lies at the junction of the two
domains of the S1 subunit. Inserts 1, 2, and 3 are similar in
configuration to HIV-1 gp120, while insert 4 is similar to the Gag
protein [39]. Polymorphisms in the RBD other than 501T, involving
certain amino acids of the CoV-2 S protein, also result in good binding
to the ACE2 receptor. The residues Gln493, Asn501, Leu455, Phe486, and
Ser494 have been identified as boosting ACE2 binding [25].