Spike protein mutation sites
The spike (S) glycoprotein, which mediates entry into host cells and therefore determines host specificity, is the most intensively investigated coronavirus protein. The S protein is composed of a putative N-terminal signal peptide, the S1 subunit, which contains the receptor-binding domain (RBD), and the S2 subunit. Because many mutations are sporadic, we show only representative mutations that occurred frequently in early submitted genomes. Using the cryo-EM structure of the SARS-CoV-2 S protein (PDB ID: 6VSB), we analyzed all of these mutated sites in the context of the 3D structure. Twelve mutations were mapped onto the structure (Figure 4); six more (L5F, N74K, Y144del, G181V, S247R, G476S) could not be shown because of the resolution and sequence length of the structure. In addition to one mutation in the signal peptide (L5F) and three in the S2 fragment (F797C, A930V, D936Y), fourteen mutations appeared in the S1 fragment. Specifically, four mutations (A348T, R408I, D428E, G476S) were found in the RBD (upper left corner) and ten (Y28N, H49Y, L54F, N74K, Y144del, F157L, G181V, S221W, S247R, D614G) were found in other parts of S1.
Figure 4
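The tallies given above (eighteen mutations in total, twelve of them mapped onto the structure) can be checked with a short script. The following is only an illustrative sketch in Python, not part of the original analysis: the region assignments are copied from the text rather than derived from residue boundaries, and the names MUTATIONS_BY_REGION, NOT_MAPPED, parse_mutation and summarize are hypothetical helpers introduced here.

```python
import re
from collections import Counter

# Region assignments copied from the text; no residue-range boundaries
# are assumed. "Y144del." is written here without the trailing period.
MUTATIONS_BY_REGION = {
    "signal peptide": ["L5F"],
    "S1 (RBD)": ["A348T", "R408I", "D428E", "G476S"],
    "S1 (outside RBD)": [
        "Y28N", "H49Y", "L54F", "N74K", "Y144del",
        "F157L", "G181V", "S221W", "S247R", "D614G",
    ],
    "S2": ["F797C", "A930V", "D936Y"],
}

# Mutations the text reports as not shown in the structure.
NOT_MAPPED = {"L5F", "N74K", "Y144del", "G181V", "S247R", "G476S"}

# One-letter wild-type residue, position, then a one-letter substitution or "del".
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z]|del)$")


def parse_mutation(label):
    """Split a label such as 'D614G' into (wild type, position, change)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    wild_type, position, change = match.groups()
    return wild_type, int(position), change


def summarize():
    """Count mutations per region and list those mapped onto the structure."""
    counts = Counter()
    mapped = []
    for region, labels in MUTATIONS_BY_REGION.items():
        for label in labels:
            parse_mutation(label)  # validates the label format
            counts[region] += 1
            if label not in NOT_MAPPED:
                mapped.append(label)
    # Order the mapped mutations by residue position.
    mapped.sort(key=lambda label: parse_mutation(label)[1])
    return counts, mapped


counts, mapped = summarize()
for region, n in counts.items():
    print(f"{region}: {n}")
print(f"total: {sum(counts.values())}, mapped onto structure: {len(mapped)}")
```

Keeping the region labels in a plain dictionary makes it easy to see that the per-region counts (1, 4, 10 and 3) add up to the eighteen mutations discussed, and that removing the six unmapped sites leaves the twelve shown in Figure 4.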