Mutation landscape of SARS-CoV-2 proteome
To provide a mutation landscape of the SARS-CoV-2 proteome, all proteins
were translated from the complete genomes and aligned protein by protein.
After alignment, the mutated sites of each protein were identified with a
Python script. E, M, ORF6, ORF7a, ORF7b and ORF10 were highly conserved,
whereas the other proteins were more divergent. Besides amino acid
substitutions, many deletions and insertions were found in ORF1ab and
the spike protein.
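
The original script is not reproduced here. As an illustration only, the site-by-site comparison described above could be sketched as follows, assuming each protein has already been aligned to a reference so that both strings have equal length and use "-" for gaps. The function name `call_mutations` and the labels for insertions and deletions are hypothetical choices, not the authors' conventions.

```python
def call_mutations(ref_aln, query_aln):
    """List differences between two aligned protein sequences.

    Both inputs are assumed to come from the same alignment: equal length,
    with '-' marking gaps. Positions are 1-based and refer to the ungapped
    reference sequence.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")

    mutations = []
    ref_pos = 0  # position in the ungapped reference
    for r, q in zip(ref_aln, query_aln):
        if r != "-":
            ref_pos += 1
        if r == q:
            continue  # identical residue, or a gap in both sequences
        if r == "-":
            # residue present only in the query: insertion after ref_pos
            mutations.append(f"ins{ref_pos}{q}")
        elif q == "-":
            # residue missing from the query: deletion at ref_pos
            mutations.append(f"{r}{ref_pos}del")
        else:
            # substitution, written like D614G
            mutations.append(f"{r}{ref_pos}{q}")
    return mutations


print(call_mutations("MDLG", "MGLG"))
```

A real pipeline would probably also skip ambiguous residues such as "X" and merge adjacent gap columns into a single indel; both are left out to keep the sketch short.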
All mutations of the SARS-CoV-2 proteome are listed in Supplementary S5,
and the most frequent ones are shown in Figure 3. Seven frequent
point mutations were found in the large replicase polyprotein ORF1ab
(T265I, L1599F, F3071Y, L3606F, P4715L, P5828L and Y5865C). One frequent
mutation occurred in the S1 domain of the spike (S) protein (D614G) and
three in the nucleocapsid (N) protein (S194L, R203K and G204R). In
contrast, the other two structural proteins, envelope (E) and membrane
(M), were less tolerant of mutations. For the
accessory proteins, three frequent mutations each appeared in ORF3a
(Q57H, G196V, G251V) and ORF8 (S24L, V62L, L84S), whereas ORF6, ORF7a,
ORF7b and ORF10 were more conserved. Of note, the R203K mutation of the
N protein was caused by three nucleotide mutations, which suggests
strong positive selection; its significance should be investigated.
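
The text does not state how "frequent" was defined. One plausible way to rank mutations, shown here purely as a sketch, is to count the fraction of genomes that carry each one. The function name `frequent_mutations`, the `min_fraction` cutoff and the example genomes are all hypothetical.

```python
from collections import Counter


def frequent_mutations(per_genome, min_fraction=0.05):
    """Return {mutation: fraction of genomes carrying it} above a cutoff.

    per_genome holds one list of mutation labels per genome, for example
    ["S:D614G", "N:R203K"]. min_fraction is an assumed threshold; the
    source does not give one.
    """
    counts = Counter()
    for muts in per_genome:
        counts.update(set(muts))  # count each mutation once per genome
    n = len(per_genome)
    return {m: c / n for m, c in counts.items() if c / n >= min_fraction}


# Toy input, not real data
genomes = [
    ["S:D614G", "ORF1ab:P4715L"],
    ["S:D614G"],
    [],
    ["S:D614G", "ORF3a:Q57H"],
]
print(frequent_mutations(genomes, min_fraction=0.5))
```

Prefixing each label with its protein name keeps identical position labels in different proteins from being merged.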
Figure 3