3.3.4 Nucleocapsid Protein (N)
The N protein binds the viral RNA to form a ribonucleoprotein complex, which facilitates the virus's interaction with cellular processes and its entry into host cells [56]. The CoV-2 N protein (419 aa) is involved in RNA packaging and virus particle release [57], and it can be detected in the early stages of infection. The nucleic acid sequence of the CoV-2 N gene is 90% conserved relative to that of SARS-CoV [58]. N also participates in forming replication-transcription complexes [59], which are important for viral genome synthesis.
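Sequence conservation figures such as the 90% above are commonly expressed as percent identity over an alignment. The sketch below illustrates that calculation under stated assumptions: the two sequences are already aligned to equal length (the alignment step itself is not shown), and gap columns are excluded from the denominator, which is one convention among several. The function name and the toy sequences are hypothetical and are not actual N gene data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped (an assumed
    convention; some tools instead count gaps as mismatches).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not compared
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy fragments, not real SARS-CoV or CoV-2 sequence.
    print(percent_identity("ATGTCTGATAAT", "ATGTCAGACAAT"))
```

Because the result depends on how gaps are treated, identity values from different studies are only directly comparable when they use the same alignment and gap convention.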
Because the N protein is part of the virus replication machinery, mutations in N may affect virus pathogenicity. Amino acids 193 to 235 mutate more frequently than the NTD and the rest of the serine-rich (SR) linker region. Some of the reported mutations with the highest global frequencies are S194L, P199L, R203K, G204R, A220V, M234I, A376T and A398V (Figure 6).
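Labels such as S194L follow the usual amino-acid substitution notation: the reference residue (S, serine), its position in the protein (194), and the substituted residue (L, leucine). As a minimal sketch, the helpers below (hypothetical names, not from any specific tool) parse this notation and check which of the mutations listed above fall inside the 193 to 235 window. Only single-residue substitutions with one-letter codes are handled; deletions and multi-residue changes are out of scope.

```python
import re
from typing import NamedTuple

N_LENGTH = 419  # length of the CoV-2 N protein in amino acids
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
SUBSTITUTION = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


class Substitution(NamedTuple):
    ref: str       # reference (wild-type) residue
    position: int  # 1-based position in the N protein
    alt: str       # substituted residue


def parse_substitution(label: str) -> Substitution:
    """Parse a label like 'R203K' into its reference, position and new residue."""
    match = SUBSTITUTION.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a single amino-acid substitution: {label!r}")
    ref, pos, alt = match.groups()
    if not 1 <= int(pos) <= N_LENGTH:
        raise ValueError(f"position outside the N protein: {label!r}")
    return Substitution(ref, int(pos), alt)


def in_window(sub: Substitution, start: int = 193, end: int = 235) -> bool:
    """True if the substitution lies in the inclusive window [start, end]."""
    return start <= sub.position <= end


reported = ["S194L", "P199L", "R203K", "G204R", "A220V", "M234I", "A376T", "A398V"]
hotspot = [label for label in reported if in_window(parse_substitution(label))]
print(hotspot)
```

Here A376T and A398V are filtered out because they lie outside the window; the other six fall within it.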
The N protein has two domains, the N-terminal domain (NTD) and the C-terminal domain (CTD), connected by a serine-rich (SR) linker region (Figure 4) [60]. RNA binds to the N protein through the NTD, specifically through residues 45-181, a region that exists as a monomer [61]. The most frequent NTD variants, P67S (1860), D103Y (2233), and H145Y (780), have all been detected in this domain; however, the effect of these mutations on viral RNA binding affinity is still unknown. In SARS-CoV, the residues R94 and Y122 are required for RNA binding. The RNA-binding efficiency of CoV-2 N is 6 to 8 times higher than that of earlier coronaviruses because of its dimeric CTD, which forms two disordered regions around the NTD, whereas the earlier viruses have a single CTD. As a result, the linker region, NTD, and CTD together account for the improved binding capacity of the N protein for the RNA genome [62].
N protein forms dimers, which play an important role in the interaction of the SR moieties of the linker region with the central region [63]. The CTD contains the residues that self-associate to form homodimers, while the basic N terminus serves as the binding site for viral RNA [64]. The interaction between the N and Nsp3 proteins is essential for virus genome processing: the CTD of N anchors to Nsp3, and the residues forming this interaction are a potential drug target [11] [65]. Strong electrostatic repulsion among the domain components provides a larger binding surface for the RNA genome and also prevents oligomerization and intra-domain interactions. When RNA binds to these domains, it neutralizes the charges; the protein molecules are then attracted toward the genome and oligomerize to form the nucleocapsid [62]. About 1034 mutations have been identified globally, of which 367 lie in primer-binding sites and 684 are amino acid substitutions across 317 unique positions, with 82, 21, and 83 in the NTD, CTD, and SR linker region, respectively; there are also 11 in-frame deletions in the linker region, with the others in the NTD [66]. We detected 1632 distinct mutations in N, distributed across all domains. The SR linker region harbors the highest frequency of mutations (Figure 6). R203K was detected in 82570 genomes (S4), followed by G204R (81858). Mutations are less frequent in the CTD than in the NTD and SR region (S4).
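Comparing mutation frequency across domains amounts to assigning each mutated position to a domain and summing its counts. The sketch below illustrates that bookkeeping using only the per-variant genome counts quoted in this section; it does not reproduce the analysis of the 1632 mutations. The NTD range (45-181) follows the text, but the SR linker (182-247) and CTD (248-364) boundaries are assumed approximate values chosen for illustration, and all helper names are hypothetical.

```python
from collections import Counter

# Domain boundaries (1-based, inclusive). NTD 45-181 follows the text;
# the SR linker and CTD ranges are ASSUMED approximations for illustration.
DOMAINS = {
    "NTD": (45, 181),
    "SR linker": (182, 247),
    "CTD": (248, 364),
}

# Genome counts per variant, as quoted in this section.
observed = {
    "P67S": 1860, "D103Y": 2233, "H145Y": 780,
    "S186Y": 549, "S197L": 2165, "S202N": 756,
    "R203K": 82570, "G204R": 81858,
}


def assign_domain(position: int) -> str:
    """Return the domain containing a residue position, or 'other'."""
    for name, (start, end) in DOMAINS.items():
        if start <= position <= end:
            return name
    return "other"  # e.g. the N-terminal arm or C-terminal tail


def genomes_per_domain(counts: dict[str, int]) -> Counter:
    """Sum per-variant genome counts by domain (single substitutions only)."""
    totals = Counter({name: 0 for name in [*DOMAINS, "other"]})
    for label, genomes in counts.items():
        position = int(label[1:-1])  # "R203K" -> 203
        totals[assign_domain(position)] += genomes
    return totals


print(genomes_per_domain(observed))
```

Note that this sums counts per variant, not distinct genomes: a genome carrying both R203K and G204R contributes to the SR linker total twice, so the totals indicate relative mutational load rather than numbers of sequences.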
The CTD of the N protein has also accumulated a number of mutations; several of the higher-frequency ones are listed in Table 4. The CoV-2 N protein contains two important functional regions, the RNA-binding domain and the phosphorylation sites. Both play important roles in RNA binding, in replication, transcription, and packaging, and in the cell cycle, so mutations in these regions are of particular interest. Variations at phosphorylation sites of the N protein include S186Y (549), S197L (2165), S202N (756), R203K (82570), and G204R (81858) [67] (Table 6). The RNA-binding domain (RBD) (aa 40-180) shows high-frequency variations including P67S (1860), D103Y (2233), and H145Y (780) [68]. These mutations could alter RNA-binding patterns and may affect virus replication and transcription. Because the N protein is also important for viral proliferation and function, effective therapeutics targeting N are needed to prevent virus proliferation.
The N protein has received considerable attention in diagnostics and has been proposed as an alternative to spike for vaccine design and as a drug target. However, the emergence of such a large number of variants may challenge diagnostic and vaccine design efforts. We therefore propose continuous genome screening to better track the ongoing evolution of CoV-2 structural proteins.