3.3.4 Nucleocapsid Protein (N)
The N protein binds the viral RNA to form a ribonucleoprotein, which
facilitates the interaction of the virus with cellular processes and its
entry into host cells [56]. The N protein of CoV-2 (419 aa) is involved
in RNA packaging and virus particle release [57]. It can be detected at
the initial stages of infection. The nucleic acid sequence of the CoV-2
N protein is 90% conserved relative to that of SARS-CoV [58]. It forms
replication-transcription complexes [59], which are important for viral
genome synthesis.
As the N protein is part of the virus replication machinery, any
mutation in it may affect virus pathogenicity. Amino acids 193 to 235
mutate more frequently than the NTD and the rest of the serine-rich (SR)
linker region. Some of the reported mutations with higher global
frequencies are S194L, P199L, R203K, G204R, A220V, M234I, A376T, and
A398V (Figure 6).
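The substitution labels above follow the usual reference residue, position, variant residue pattern, so the 193 to 235 window can be checked mechanically. The following is a minimal sketch, not part of the original analysis; the function names `parse_substitution` and `in_window` are hypothetical and introduced only for illustration, and the window bounds are taken directly from the text.

```python
import re

# Substitution labels exactly as listed in the text (Figure 6).
MUTATIONS = ["S194L", "P199L", "R203K", "G204R",
             "A220V", "M234I", "A376T", "A398V"]

# Frequently mutating window named in the text (inclusive bounds).
WINDOW_START, WINDOW_END = 193, 235

# One-letter reference residue, 1-based position, one-letter variant residue.
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'R203K' into ('R', 203, 'K')."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def in_window(label, start=WINDOW_START, end=WINDOW_END):
    """Return True if the substituted position lies inside [start, end]."""
    _, pos, _ = parse_substitution(label)
    return start <= pos <= end


inside = [m for m in MUTATIONS if in_window(m)]
outside = [m for m in MUTATIONS if not in_window(m)]
print("inside 193-235:", inside)
print("outside 193-235:", outside)
```

Of the eight listed substitutions, six fall inside the window, while A376T and A398V lie outside it, toward the C-terminal end of the protein.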
The N protein has two domains, the N-terminal domain (NTD) and the
C-terminal domain (CTD), connected by a serine-rich (SR) linker region
(Figure 4) [60]. RNA binds to the N protein through the NTD, more
precisely at the N45-181 region of the NTD, which exists as a monomer
[61]. The most frequent variants detected in the NTD are P67S (1860),
D103Y (2233), and H145Y (780). However, the effect of these mutations on
viral RNA binding affinity is still unknown. The amino acids required
for binding of SARS-CoV RNA are R94 and Y122. The binding efficiency of
CoV-2 is 6 to 8 times higher than that of the previous viruses because
it has a dimeric CTD that forms two disordered regions around the NTD,
whereas the previous viruses have a single CTD. As a result, the
combination of linker region, NTD, and CTD is important for the improved
binding capacity of the N protein to the RNA genome [62].
The N protein forms dimers, which play an important role in the
interaction of the SR moieties of the linker region with the central
region [63]. The CTD contains residues that self-associate to form
homodimers. The basic nature of the N terminus suggests that it is a
binding site for viral RNA [64]. The interaction of the N and Nsp3
proteins is essential for viral genome processing: the C-terminal
domain of the N protein anchors to the Nsp3 protein, and the residues
forming this interaction could be a potential drug target [11]
[65]. Strong electrostatic repulsion within the domain components
provides a larger binding surface for the RNA genome and also prevents
oligomerization and intradomain interactions. When RNA binds to these
domains, it neutralizes the charges; as a result, the protein molecules
are attracted toward the genome and oligomerize to form the
nucleocapsid [62].
About 1034 mutations have been identified globally, of which 367 are in
primer binding sites and 684 are amino acid substitutions across 317
unique positions, including 82, 21, and 83 in the NTD, CTD, and SR
linker region, respectively, and 11 in-frame deletions in the linker
region, with the others in the NTD region [66]. We detected 1632
different mutations in N, distributed across all domains. The SR linker
region harbors the highest frequency of mutations (Figure 6). R203K was
detected in 82570 genomes (S4), followed by G204R (81858). Mutations are
less frequent in the CTD than in the NTD and SR region (S4).
The CTD of the N protein also carries a number of mutations; those with
higher frequencies are listed in Table 4.
The CoV-2 N protein has two important classes of sites, the RNA binding
domain and the phosphorylation sites; both play an important role in
RNA binding and in replication, transcription, and packaging, as well
as in the cell cycle. Mutations in these regions are therefore of great
importance. Several phosphorylation sites in the N protein have
undergone variation, including S186Y (549), S197L (2165), S202N (756),
R203K (82570), and G204R (81858) [67] (Table 6). The RNA binding domain
(RBD) (aa 40-180) shows high-frequency variations including P67S (1860),
D103Y (2233), and H145Y (780) [68]. These mutations can alter RNA
binding patterns and may affect viral replication and transcription.
The N protein is also important for viral proliferation and function,
which calls for the development of effective therapeutics against N to
prevent virus proliferation.
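To compare the mutations quoted in this paragraph, the genome counts can be ranked within each class of site. The following is a minimal sketch using only the counts stated above; the dictionary layout, the group names `phosphorylation` and `rna_binding`, and the helper `rank` are hypothetical conveniences rather than part of the original analysis.

```python
# Genome counts exactly as quoted in the text. The grouping keys are
# illustrative labels, not identifiers from the original study.
COUNTS = {
    "phosphorylation": {
        "S186Y": 549,
        "S197L": 2165,
        "S202N": 756,
        "R203K": 82570,
        "G204R": 81858,
    },
    "rna_binding": {
        "P67S": 1860,
        "D103Y": 2233,
        "H145Y": 780,
    },
}


def rank(group):
    """Return (mutation, count) pairs for one group, most frequent first."""
    return sorted(COUNTS[group].items(), key=lambda item: item[1], reverse=True)


for group in COUNTS:
    print(group)
    for mutation, count in rank(group):
        print(f"  {mutation}: {count}")
```

In this ranking R203K and G204R exceed every other listed substitution by more than an order of magnitude. The counts are per mutation, and a single genome can carry more than one of these substitutions, so they should not be summed to estimate a number of genomes.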
The N protein has received considerable attention in diagnosis and is
being proposed as an alternative to spike for vaccine design and as a
drug target. However, the emergence of such a large number of variants
may challenge diagnostic and vaccine design efforts. We therefore
propose continuous genome screening for better management of the
ongoing evolution of the CoV-2 structural proteins.