Mutation landscape of the SARS-CoV-2 proteome
To map the mutation landscape of the SARS-CoV-2 proteome, all proteins were translated from the complete genomes and aligned protein by protein. After alignment, the mutated sites of each protein were identified with a Python script. E, M, ORF6, ORF7a, ORF7b and ORF10 appeared highly conserved, whereas the other proteins were more divergent. Besides amino acid substitutions, many deletions and insertions were found in ORF1ab and the spike protein.
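The site-by-site comparison described above can be illustrated with a minimal sketch. It is not the script used in this study: the function name `call_mutations`, the label format for insertions and deletions, and the choice to skip ambiguous residues (`X`) are assumptions made for illustration. The sketch takes a reference and a query sequence from the same alignment (gaps written as `-`) and reports changes numbered by reference position.

```python
def call_mutations(ref_aln: str, qry_aln: str) -> list[str]:
    """Compare an aligned query protein with the aligned reference.

    Returns labels such as 'D614G' (substitution), 'D3del' (deletion)
    and '3insA' (insertion after reference position 3).
    The label format is an assumption for this sketch.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    mutations = []
    ref_pos = 0  # 1-based position in the ungapped reference
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            ref_pos += 1          # gap columns do not advance the reference
        if r == q or q == "X":    # identical, or ambiguous residue (skipped by assumption)
            continue
        if r == "-":
            mutations.append(f"{ref_pos}ins{q}")
        elif q == "-":
            mutations.append(f"{r}{ref_pos}del")
        else:
            mutations.append(f"{r}{ref_pos}{q}")
    return mutations


# Toy sequences, not real SARS-CoV-2 data
print(call_mutations("MKD-LG", "MKGALG"))
print(call_mutations("MKDLG", "MK-LG"))
```

Numbering by the ungapped reference keeps labels comparable across genomes even when an alignment contains insertions.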
All mutations of the SARS-CoV-2 proteome are listed in Supplementary S5, and the most frequent ones are shown in Figure 3. Seven frequent point mutations were found in the large replicase polyprotein ORF1ab (T265I, L1599F, F3071Y, L3606F, P4715L, P5828L and Y5865C). One frequent mutation occurred in the S1 domain of the spike (S) protein (D614G) and three in the nucleocapsid (N) protein (S194L, R203K, G204R). In contrast, the other two structural proteins, envelope (E) and membrane (M), were less prone to tolerate mutations. Among the accessory proteins, three frequent mutations each appeared in ORF3a (Q57H, G196V, G251V) and ORF8 (S24L, V62L, L84S), while ORF6, ORF7a, ORF7b and ORF10 were more conserved. Of note, the R203K mutation of the N protein was caused by three nucleotide mutations, which suggests strong positive selection; its significance should be investigated.
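As a rough illustration of how frequent mutations can be separated from rare ones, the sketch below tallies the fraction of genomes that carry each mutation label. The function name `frequent_mutations`, the input layout (one list of labels per genome) and the `min_fraction` cutoff are assumptions; the text does not state the threshold used to call a mutation "frequent".

```python
from collections import Counter


def frequent_mutations(per_genome: list[list[str]], min_fraction: float = 0.05) -> dict[str, float]:
    """Return {mutation label: fraction of genomes carrying it}.

    Only labels at or above min_fraction are kept (hypothetical cutoff).
    """
    n = len(per_genome)
    if n == 0:
        return {}
    counts = Counter()
    for labels in per_genome:
        counts.update(set(labels))  # count each label at most once per genome
    return {m: c / n for m, c in counts.most_common() if c / n >= min_fraction}


# Toy input: four genomes with made-up mutation calls
genomes = [["D614G", "P4715L"], ["D614G"], [], ["L84S"]]
print(frequent_mutations(genomes, min_fraction=0.5))
```

Counting per genome rather than per occurrence prevents a duplicated call within one sequence from inflating a mutation's frequency.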
Figure 3