Double nomenclature of sapovirus based on complete VP1 and NS6-7 nucleotide sequences
Shaolei Ren1, 2, Liang Xue2, *, Junshan Gao2, Weicheng Cai2, Peng Lin1
1 College of Food Science and Engineering, Bohai University, Jinzhou, Liaoning, 121013, China
2 Guangdong Provincial Key Laboratory of Microbial Safety and Health, State Key Laboratory of Applied Microbiology Southern China, Institute of Microbiology, Guangdong Academy of Sciences, National Health Commission Science and Technology Innovation Platform for Nutrition and Safety of Microbial Food, Guangzhou, Guangdong, 510070, China
* Corresponding author: No. 100, Xianlie Zhong Road, Guangzhou 510070, P. R. China. Tel: +86-20-87680942; Fax: +86-20-87680942. E-mail: xueliang@gdim.cn
ORCID
Liang Xue, 0000-0002-9131-8377
Abstract : Sapovirus (SaV) is a significant foodborne pathogen that causes acute gastroenteritis worldwide. Currently, SaV genotyping is primarily based on the VP1 gene. With the emergence of recombinant strains, a single nomenclature based on the VP1 region can no longer meet the needs of SaV research. Therefore, SaV nucleotide sequences containing complete VP1 and NS6-7 genes were collected from GenBank, and genetic distance calculations and phylogenetic analyses were carried out to investigate a double nomenclature based on the SaV VP1 and NS6-7 genes. Based on the genetic diversity of the complete VP1 region, the sequences were divided into 12 genogroups and 30 genotypes, including tentative genotypes and genogroups. Notably, a novel genogroup, GNA1, was identified. In addition, a class of bat-related sequences was found; the genetic distances among these sequences approached the inter-genogroup distance, and they were classified in this study as the bat genogroup. Thirty reference sequences are proposed based on the VP1 genotypes. Based on the genetic diversity of nucleotide sequences in the complete NS6-7 region, twelve P (polymerase)-groups and 29 P-types (including tentative P-types and P-groups) were identified phylogenetically, and corresponding P-type reference sequences are also proposed. Using the dual nomenclature of the VP1 and NS6-7 genes, nine recombinant sequences comprising six recombinant genotypes (GI.1[P4], GI.2[P1], GII.4[P1], GII.4[PNA1], GII.6[P2], and GV.NA1[P3]) were identified. Dual nomenclature based on VP1 and NS6-7 genes can effectively characterize SaV recombination.
Keywords : Sapovirus; Genotyping; Double nomenclature; Genetic distance; Recombination
1 INTRODUCTION
A global public health concern, SaV infection is becoming more widely acknowledged as a major contributor to outbreaks and sporadic acute gastroenteritis (AGE) [1, 2]. Diarrhea and vomiting are the hallmarks of AGE, which is one of the world’s most common causes of infant and child illness and mortality. Worldwide, 700 million cases of gastroenteritis are reported each year, resulting in 800,000 to 2 million fatalities[3-7].
SaV is transmitted by the faecal-oral route, through interpersonal contact or ingestion of contaminated food or water[8]. Outbreaks of SaV-associated gastroenteritis usually occur in semi-enclosed settings, such as restaurants, schools, nursing homes, and kindergartens[9, 10]. Vomiting and diarrhea are common symptoms of infection. In healthy individuals, they are usually mild and self-limiting and resolve within about a week, although serious complications can still arise[11]. Gastroenteritis outbreaks caused by SaV have been increasing worldwide in recent years[12].
SaV is a non-enveloped, positive-sense, single-stranded RNA virus with a 3′-terminal poly(A) tail and a genome of roughly 7.1–7.7 kb. The genome contains two open reading frames (ORFs). ORF1 encodes the non-structural proteins (NS1 to NS5 and the NS6-7 fusion protein, which acts as the RNA-dependent RNA polymerase, RdRp) and the major capsid protein VP1. ORF2 encodes the minor structural protein VP2. Some strains are predicted to contain an ORF3, whose function remains unknown. VP1 is considered to contain all the determinants of viral attachment and antigenicity and is the key protein for determining the genetic variation and genotype of SaV. Based on the complete VP1 sequence, SaV is divided into 19 genogroups (GI-GXIX), of which GI, GII, GIV, and GV can infect humans[13-15].
The genetic diversity of SaV is increasing, and this diversity is closely related to the error-prone nature of the RdRp. At present, SaV genotyping relies mainly on a single nomenclature based on the VP1 gene[16], but with the emergence of recombinant strains, naming based on the VP1 region alone is increasingly unable to meet the requirements of SaV classification. Therefore, in this study, SaV sequences reported worldwide were collected, and nucleotide sequences with complete VP1 and NS6-7 genes were screened and analyzed separately to systematically describe the genotyping and genetic diversity of SaV and to explore a double nomenclature of SaV based on the VP1 and NS6-7 genes.
2 METHODS
2.1 Collection and processing of SaV sequence information
Nucleotide sequences with complete VP1 and NS6-7 regions were retrieved from GenBank using the keyword “sapovirus”. To avoid redundant computation, the size of the analytical data set was reduced: sequence similarity was evaluated using CD-HIT v4.8.1, and a threshold of 0.95 was chosen to eliminate highly similar sequences. In a previous study, the Norovirus Typing Tool Version 2.0 (https://www.rivm.nl/mpf/typingtool/norovirus/)[17] was used to obtain genotyping results based on the VP1 gene. Genotypes were also determined with the Human Calicivirus Typing Tool (https://calicivirustypingtool.cdc.gov/)[18]. When the results of the two tools differed, National Center for Biotechnology Information (NCBI) online BLAST was used to determine the genogroup. Sequences were aligned using Multiple Alignment using Fast Fourier Transform (MAFFT v7.520)[19], and the complete VP1 and NS6-7 nucleotide sequences were extracted after importing the alignments into Geneious v9.0.2.
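The redundancy-filtering step can be illustrated with a minimal Python sketch. It is not the CD-HIT algorithm, which uses short-word filtering and its own alignment procedure; it only mimics the greedy idea of keeping a sequence as a representative when it falls below the identity threshold against every representative kept so far. The function names are hypothetical, and the sketch assumes that the input sequences are already aligned to equal length.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical sites between two aligned sequences.

    Columns in which either sequence has a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def filter_redundant(seqs: dict[str, str], threshold: float = 0.95) -> list[str]:
    """Greedy filter (illustrative only, not CD-HIT itself).

    Sequences are visited from longest to shortest (ungapped length); a
    sequence is kept only if its identity to every kept representative is
    below `threshold`.
    """
    order = sorted(seqs, key=lambda k: len(seqs[k].replace("-", "")), reverse=True)
    kept: list[str] = []
    for name in order:
        if all(identity(seqs[name], seqs[rep]) < threshold for rep in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    base = "ACGT" * 10
    toy = {"a": base, "b": "T" + base[1:], "c": "TTTT" + base[4:]}
    print(filter_redundant(toy))
```

In the toy data, sequence b differs from a at one of 40 sites and is therefore treated as redundant, whereas c differs at three sites and is retained.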
2.2 Genetic distance calculation and phylogenetic analysis
Genetic distances for the VP1 and NS6-7 nucleotide sequences were computed separately in MEGA 11 using the p-distance model with 1000 bootstrap replicates. Origin 2021 was then used to plot the genetic distance distributions. The best-fit nucleotide substitution model was selected by the Akaike information criterion (AIC) as implemented in IQ-TREE v2.2.6, and phylogenetic trees were inferred by maximum-likelihood (ML) reconstruction in IQ-TREE. The trees were visualized with the Interactive Tree Of Life (iTOL, http://itol.embl.de/).
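For readers unfamiliar with the p-distance, the following Python sketch shows how it can be computed for every pair of sequences in an alignment. It is not the MEGA implementation; the function names are hypothetical, and the treatment of gaps and ambiguous bases (pairwise deletion) is an assumption, since the gap-handling option used in MEGA is not stated here.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns containing a gap or an ambiguous base in either
    sequence are skipped (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = diffs = 0
    for x, y in zip(a.upper().replace("U", "T"), b.upper().replace("U", "T")):
        if x in valid and y in valid:
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def pairwise_p_distances(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of sequences in the alignment."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }


if __name__ == "__main__":
    toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "AC-TACGT"}
    for pair, dist in pairwise_p_distances(toy).items():
        print(pair, round(dist, 4))
```

The resulting list of pairwise values is what a distance-distribution histogram would be built from.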
2.3 Recombination analysis
Sequences whose genotypes conflicted between the phylogenetic trees of the VP1 and NS6-7 nucleotide sequences were considered putative recombinants. The whole genome sequence of each suspected recombinant served as the query sequence, and the whole genome sequences of the corresponding VP1 and NS6-7 reference strains were used as comparison sequences. The sequences were aligned with MAFFT v7.520. Recombination events were characterized using SimPlot v3.5.1 with the program’s default parameters: the Kimura (two-parameter) model was applied, with a window size of 200 bp and a step size of 20 bp[20].
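The sliding-window idea behind a similarity plot can be sketched in Python as follows. This is a simplified illustration rather than a reimplementation of SimPlot: it reports raw percent identity per window instead of a Kimura two-parameter corrected value, and the function names are hypothetical. The window of 200 bp and step of 20 bp follow the settings described above.

```python
def window_similarity(query: str, ref: str, window: int = 200, step: int = 20):
    """Percent identity between query and reference in sliding windows.

    Returns a list of (window_midpoint, percent_identity); the value is None
    when a window contains no gap-free column. Sequences must be aligned.
    """
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [
            (q, r)
            for q, r in zip(query[start:start + window].upper(),
                            ref[start:start + window].upper())
            if q != "-" and r != "-"
        ]
        if pairs:
            pct = 100.0 * sum(q == r for q, r in pairs) / len(pairs)
        else:
            pct = None
        profile.append((start + window // 2, pct))
    return profile


def crossover_points(sim_a, sim_b):
    """Window midpoints at which the more similar reference switches.

    Windows with a tie or a missing value are skipped.
    """
    points = []
    leader = None
    for (pos, a), (_, b) in zip(sim_a, sim_b):
        if a is None or b is None or a == b:
            continue
        current = "A" if a > b else "B"
        if leader is not None and current != leader:
            points.append(pos)
        leader = current
    return points


if __name__ == "__main__":
    ref_a, ref_b = "A" * 400, "C" * 400
    chimera = "A" * 200 + "C" * 200  # artificial recombinant of the two references
    sim_a = window_similarity(chimera, ref_a, window=100, step=50)
    sim_b = window_similarity(chimera, ref_b, window=100, step=50)
    print(crossover_points(sim_a, sim_b))
```

Because the crossover is reported at a window midpoint, the estimate is only as precise as the window and step sizes allow.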
3 RESULTS
3.1 Overview of the collected SaV sequence data
A total of 679 complete VP1 and 455 complete NS6-7 nucleotide sequences had been reported as of August 15, 2023. After sequences sharing more than 95% similarity were removed, 213 VP1 and 178 NS6-7 nucleotide sequences remained and were included in the analysis.
The VP1 typing information for these sequences was obtained in the prior study using the Norovirus Typing Tool. Eighty-four of the sequences could be assigned to specific genotypes, whereas for the remaining sequences only genogroup-level information was obtained. In addition, when the same sequences were submitted to the Human Calicivirus Typing Tool, 87 sequences were assigned genotypes. For the five sequences (all GII sequences) on which the two genotyping tools disagreed, online BLAST was used to determine the genogroup (Table 1).
3.2 Pairwise genetic distance analysis of nucleotide sequence of VP1 gene
Figure 1 shows the distribution of genetic distances. The frequency histogram of pairwise distances for the complete capsid nucleotide sequences shows three distinct, non-overlapping, symmetric peaks, which are taken to represent strains, genotypes, and genogroups, respectively.
All pairwise genetic distances fell between 0 and 0.5663. According to the genotyping results and the pairwise genetic distance analysis, distances within genotypes ranged from 0 to 0.1991, distances between genotypes within the same genogroup ranged from 0.1992 to 0.4100, and distances between genogroups ranged from 0.4104 to 0.5663.
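The three VP1 distance ranges reported here can be turned into a simple decision rule, sketched below in Python. Using the upper bounds of the observed ranges (0.1991 and 0.4100) as cut-offs is an assumption made for illustration; the function and constant names are hypothetical and are not part of any typing tool.

```python
# Upper bounds of the observed VP1 p-distance ranges, used here as
# illustrative cut-offs (an assumption, not a validated threshold).
VP1_GENOTYPE_MAX = 0.1991   # largest distance observed within a genotype
VP1_GENOGROUP_MAX = 0.4100  # largest distance observed within a genogroup


def classify_vp1_distance(d: float) -> str:
    """Interpret a pairwise VP1 p-distance relative to the cut-offs above."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    if d <= VP1_GENOTYPE_MAX:
        return "same genotype"
    if d <= VP1_GENOGROUP_MAX:
        return "same genogroup, different genotype"
    return "different genogroup"


if __name__ == "__main__":
    for d in (0.05, 0.30, 0.45):
        print(d, "->", classify_vp1_distance(d))
```

A rule of this kind is only a first screen; assignments near a cut-off would still need to be checked against the phylogenetic tree.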
3.3 Phylogenetic analysis of nucleotide sequence in VP1 gene
Phylogenetic analysis of the aforementioned VP1 nucleotide sequences was carried out to illustrate the SaV VP1 gene-based genotyping (Figure 2).
The analysis showed that the output of the genotyping tools was not accurate for some sequences. For instance, a genotyping tool assigned JN420370 to GI, which is clearly incorrect: when the phylogenetic tree and genetic distances were considered together, its distances to several GV sequences fell within the intra-genogroup range, indicating that the sequence belongs to GV. Notably, a new genogroup, designated GNA1 in this study, was identified near GII. According to the sequence descriptions, its hosts are exclusively dogs, and its genetic distances to neighbouring genogroups are at the inter-genogroup level. For the five sequences on which the two genotyping tools disagreed, genetic distance was used to determine the genotypes, and the results were in line with those of the Human Calicivirus Typing Tool. Interestingly, 14 sequences described as bat-related clustered together on the phylogenetic tree. Their distances to all known genogroups, and also the distances among these sequences themselves, were at the inter-genogroup level. These sequences are defined here as “bat” sequences. Five sequences belonging to unknown genogroups were also present. Table 2 summarizes the genotype and genogroup information for all sequences that were corrected or newly characterized.
3.4 VP1 reference sequences
Based on the results in Section 3.3, reference sequences were chosen preferentially from complete genomes or the longest available sequences that are influential and were discovered early. Pairwise genetic distances were calculated for these sequences, and an ML phylogenetic tree was constructed (Figure 3). Table 3 lists the 30 proposed reference sequences, which cover 12 genogroups.
3.5 Genetic distance analysis of NS6-7
Figure 4 shows the distribution of genetic distances for NS6-7; all pairwise distances fell between 0 and 0.5744. In contrast to the VP1 gene, the distance ranges that separate strains, genotypes, and genogroups in the NS6-7 gene differ among genogroups. Specifically, for GI the ranges for strains, genotypes, and genogroups were 0-0.1321, 0.1326-0.3153, and 0.3247-0.5545, respectively; for GII they were 0-0.1002, 0.1003-0.3079, and 0.3133-0.5474; and for GV they were 0-0.2248, 0.2444-0.3293, and 0.3472-0.5574.
3.6 Phylogenetic analysis of nucleotide sequence of NS6-7 gene
Phylogenetic analysis of 178 complete NS6-7 nucleotide sequences from all SaV genogroups confirmed ten (GI.P, GII.P, GIII.P, GV.P, GVI.P, GVII.P, GVII.P, GX.P, GXI.P, GXII.P) P-groups and two tentative (GNA1.P and GNA2.P) P-groups (Figure 5). Among them, the NS6-7 sequences of GI can be divided into seven P-types, those of GII into seven P-types (GII.P1-GII.P3, GII.P5-GII.P7) and two tentative (GII.PNA1 and GII.PNA2) P-types, and those of GV into three P-types and one tentative (GV.PNA1) P-type. For the defined P-types, a total of 29 reference sequences were proposed. Table 4 gives the genetic distances between all reference sequences and within each reference sequence’s genogroup. All reference sequences underwent phylogenetic analysis using the maximum-likelihood method (Figure 6).
3.7 Recombination analysis
After the double nomenclature indicated the presence of putative recombinant sequences, a recombination analysis was carried out using SimPlot (Figure 7). The relevant reference sequences described above were used as comparison sequences to confirm the recombination relationships and to locate the predicted recombination breakpoints. Nine sequences exhibited recombination signals, and six recombinant genotypes were identified overall. Among them, nucleotide position 5081 was predicted to be the recombination site for GI.1[P4], GII.4[P1], and GII.4[PNA1], position 5121 for GI.2[P1] and GII.6[P2], and position 5261 for GV.NA1[P3].
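The dual names used above follow the pattern genogroup.genotype[P-type], which can be parsed and screened programmatically. The Python sketch below is illustrative only: the function names are hypothetical, and flagging a name as a candidate recombinant whenever the VP1 genotype label differs from the P-type label is a simplifying assumption. In this study, candidates were identified from conflicting positions in the VP1 and NS6-7 trees and then examined with SimPlot.

```python
import re
from typing import NamedTuple


class DualType(NamedTuple):
    genogroup: str  # e.g. "GII"
    genotype: str   # VP1 genotype label, e.g. "4" or "NA1"
    p_type: str     # NS6-7 P-type label, e.g. "1" or "NA1"


_DUAL_NAME = re.compile(r"^(G[A-Z0-9]+)\.([A-Z0-9]+)\[P([A-Z0-9]+)\]$")


def parse_dual_name(name: str) -> DualType:
    """Split a dual name such as 'GII.4[PNA1]' into its three parts."""
    match = _DUAL_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a dual name: {name!r}")
    return DualType(*match.groups())


def format_dual_name(genogroup: str, genotype: str, p_type: str) -> str:
    """Build a dual name from a VP1 genotype and an NS6-7 P-type."""
    return f"{genogroup}.{genotype}[P{p_type}]"


def is_candidate_recombinant(name: str) -> bool:
    """Assumption: differing VP1 and P-type labels mark a candidate only."""
    parsed = parse_dual_name(name)
    return parsed.genotype != parsed.p_type


if __name__ == "__main__":
    for n in ("GI.1[P4]", "GII.4[PNA1]", "GV.NA1[P3]", "GI.1[P1]"):
        print(n, parse_dual_name(n), is_candidate_recombinant(n))
```

Any name flagged in this way would still require confirmation by breakpoint analysis.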
4 DISCUSSION
The known genetic diversity of SaV is steadily expanding, and further research on genotyping is needed. The limits of the VP1-based naming method for the evolving SaV are becoming increasingly apparent. Consequently, taking the VP1/RdRp double nomenclature of norovirus as a reference, this work examined a VP1/NS6-7 double nomenclature for SaV. In an earlier investigation of SaV epidemiology, numerous sequences with unknown genotypes were found during sequence collection. In addition to the currently recognized genotypes, these sequences might contain novel genogroups and genotypes. Phylogenetic analysis of the VP1 nucleotide sequences and genetic distance calculations were therefore performed. The findings indicate that the classifications given by the two genotyping tools differ and that some sequences were left unclassified or were misclassified. This study corrected the misclassifications and assigned the unclassified sequences.
Notably, few investigations have addressed the genotyping of non-human sequences, and non-human SaV genotyping appears to have attracted limited research interest. This study revealed that SaV genetic diversity, particularly in non-human sequences, may have been greatly underestimated. For instance, GIII is currently known to have only one genotype, GIII.1, but both the pairwise genetic distances and the phylogenetic tree suggest that it may contain more genotypes. For the time being, these sequences are still defined as GIII.1. Furthermore, more than ten sequences were separated from one another by genogroup-level distances. BLAST and the sequence descriptions showed that most of these sequences are associated with bats and that most of the putative genogroups contain only a single sequence, which is why this study refers to them as bat sequences. How to delineate new genogroups that may emerge deserves further discussion.
The Norovirus Typing Tool Version 2.0 and the Human Calicivirus Typing Tool have been widely used. The Norovirus Typing Tool has been moved to a new platform (https://www.genomedetective.com/app/typingtool/virus/)[17] and is no longer accessible through the previous URL. However, the new platform has limitations when genotyping SaV: in contrast to the previous version, it can only determine whether a sequence is SaV and cannot provide genotyping results. The Human Calicivirus Typing Tool provides quick typing results, but, as its name suggests, it can only type human SaV. The capability of the genotyping tools is limited primarily by the small number of reference strains. Therefore, this study proposed reference nucleotide sequences for VP1 and NS6-7.
The proposed SaV VP1/NS6-7 double nomenclature is important for fundamental SaV research, particularly for recombination research. Nine recombinant sequences were identified in this analysis, most of them human sequences. Recombination is one of the main mechanisms and driving forces of viral evolution and contributes significantly to viral genetic diversity and epidemics. Whether the recombinant strains will cause SaV outbreaks warrants investigation.
In conclusion, SaV can be effectively characterized by the double nomenclature based on the VP1 and NS6-7 genes. Genotyping methods based on other regions, such as a genome-wide nomenclature covering all protein-coding regions, should also be investigated. Continued refinement of the typing system is a foundation and key component of virus research.
FUNDING STATEMENT
This work was supported by the National Key Research and Development Program of China (2022YFF1103100), and the National Natural Science Foundation of China (32272436).
CONFLICT OF INTERESTS
The authors declare no competing interests.
AUTHOR CONTRIBUTIONS
Shaolei Ren, Liang Xue, Junshan Gao, Weicheng Cai, and Peng Lin designed and implemented the study. Shaolei Ren, Liang Xue, Junshan Gao, and Weicheng Cai collected and analyzed the data. Shaolei Ren and Liang Xue drafted the manuscript. All authors discussed the results of the study and agreed to the published version of the manuscript.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.
REFERENCES
[1] MAGWALIVHA M, KABUE J P, TRAORE A N, et al. Prevalence of human sapovirus in low and middle income countries [J]. Adv Virol, 2018, 2018: 5986549.
[2] ZHUO R, DING X F, FREEDMAN S B, et al. Molecular Epidemiology of Human Sapovirus among Children with Acute Gastroenteritis in Western Canada [J]. J Clin Microbiol, 2021, 59(10): e0098621.
[3] DIEZ VALCARCE M, KAMBHAMPATI A K, CALDERWOOD L E, et al. Global distribution of sporadic sapovirus infections: A systematic review and meta-analysis [J]. Plos One, 2021, 16(8): e0255436.
[4] REYMAO T K, HERNANDEZ J D, COSTA S T, et al. Sapoviruses in children with acute gastroenteritis from Manaus, Amazon region, Brazil, 2010-2011 [J]. Rev Inst Med Trop Sao Paulo, 2016, 58: 81.
[5] LIU L, OZA S, HOGAN D, et al. Global, regional, and national causes of under-5 mortality in 2000-15: an updated systematic analysis with implications for the Sustainable Development Goals [J]. Lancet, 2016, 388(10063): 3027-35.
[6] WALKER C L, ARYEE M J, BOSCHI-PINTO C, et al. Estimating diarrhea mortality among young children in low and middle income countries [J]. Plos One, 2012, 7(1): e29151.
[7] MANOUANA G P, NGUEMA-MOURE P A, MBONG NGWESE M, et al. Genetic Diversity of Enteric Viruses in Children under Five Years Old in Gabon [J]. Viruses, 2021, 13(4): 545.
[8] HERGENS M P, NEDERBY OHD J, ALM E, et al. Investigation of a food-borne outbreak of gastroenteritis in a school canteen revealed a variant of sapovirus genogroup V not detected by standard PCR, Sollentuna, Sweden, 2016 [J]. Eurosurveillance, 2017, 22(22): 30543.
[9] PANG X L, LEE B E, TYRRELL G J, et al. Epidemiology and genotype analysis of sapovirus associated with gastroenteritis outbreaks in Alberta, Canada: 2004-2007 [J]. J Infect Dis, 2009, 199(4): 547-51.
[10] JOHANSSON P J, BERGENTOFT K, LARSSON P A, et al. A nosocomial sapovirus-associated outbreak of gastroenteritis in adults [J]. Scand J Infect Dis, 2005, 37(3): 200-4.
[11] LANDA E, JAVAID S, WON J S, et al. Septic shock secondary to severe gastroenteritis resulting from sapovirus infection [J]. Cureus, 2022, 14(4): e24010.
[12] RAZIZADEH M H, KHATAMI A, ZAREI M. Global molecular prevalence and genotype distribution of Sapovirus in children with gastrointestinal complications: A systematic review and meta-analysis [J]. Rev Med Virol, 2022, 32(3): e2302.
[13] OKA T, WANG Q, KATAYAMA K, et al. Comprehensive review of human sapoviruses [J]. Clin Microbiol Rev, 2015, 28(1): 32-53.
[14] OKA T, LU Z, PHAN T, et al. Genetic characterization and classification of human and animal sapoviruses [J]. Plos One, 2016, 11(5): e0156373.
[15] KUMTHIP K, KHAMRIN P, USHIJIMA H, et al. Genetic recombination and diversity of sapovirus in pediatric patients with acute gastroenteritis in Thailand, 2010-2018 [J]. PeerJ, 2020, 8: e8520.
[16] OKA T, MORI K, IRITANI N, et al. Human sapovirus classification based on complete capsid nucleotide sequences [J]. Arch Virol, 2012, 157(2): 349-52.
[17] KRONEMAN A, VENNEMA H, DEFORCHE K, et al. An automated genotyping tool for enteroviruses and noroviruses [J]. J Clin Virol, 2011, 51(2): 121-5.
[18] TATUSOV R L, CHHABRA P, DIEZ-VALCARCE M, et al. Human Calicivirus Typing tool: A web-based tool for genotyping human norovirus and sapovirus sequences [J]. J Clin Virol, 2021, 134: 104718.
[19] KATOH K, ASIMENOS G, TOH H. Multiple alignment of DNA sequences with MAFFT [J]. Methods Mol Biol, 2009, 537: 39-64.
[20] XUE L, WU Q, DONG R, et al. Comparative phylogenetic analyses of recombinant noroviruses based on different protein-encoding regions show the recombination-associated evolution pattern [J]. Sci Rep, 2017, 7(1): 4976.
Table 1 Sequences with discordant genotyping results between the two genotyping tools
Table 2 Corrected VP1 genotype and genogroup information of the sequences
Table 3 VP1 reference sequences. The genetic distance between the reference sequences within and outside the genogroups was calculated and expressed as intervals
Table 4 NS6-7 reference sequences. The genetic distance between the reference sequences within and outside the genogroups was calculated and expressed as intervals
Figure 1 Pairwise distance distribution histograms of VP1 nucleotide sequences
Figure 2 Phylogenetic trees of complete VP1 sequences of sapovirus. The entry number and assigned genogroup and genotype are indicated. The Manchester strain (Hu/Manchester/93/UK; GenBank accession no. X86560) was selected as the root of the phylogenetic tree. The same genogroup sequences were marked with the same color background, and the corrected sequence was marked with a special format, with the wrong genotype inside the vertical line and the correct genotype outside the vertical line
Figure 3 Phylogenetic trees of complete VP1 reference sequences. The entry number of the sequences and the genotype represented are labeled. The Manchester strain (Hu/Manchester/93/UK; GenBank accession no. X86560) was selected as the root of the phylogenetic tree. The scale represents the nucleotide substitutions per site
Figure 4 Pairwise distance distribution histograms of NS6-7 nucleotide sequences
Figure 5 Phylogenetic trees of complete NS6-7 sequences of sapovirus. The entry number and assigned genogroup and genotype are indicated. The Manchester strain (Hu/Manchester/93/UK; GenBank accession no. X86560) was selected as the root of the phylogenetic tree. The same genogroups sequences were marked with the same color background
Figure 6 Phylogenetic trees of complete NS6-7 reference sequences. The entry number of the sequences and the genotype represented are labeled. The Manchester strain (Hu/Manchester/93/UK; GenBank accession no. X86560) was selected as the root of the phylogenetic tree. The scale represents the nucleotide substitutions per site
Figure 7 Recombination analysis (a) GI.1[P4]-MN794208, (b) GI.2[P1]-GQ261222, (c) GII.4[P1]-AB522397, (d) GII.4[P1]-KX274477, (e) GII.4[P1]-MG012446, (f) GII.4[P1]-MZ488271, (g) GII.4[PNA1]-MN794218, (h) GII.6[P2]-MH933793, (i) GV.NA1[P3]-LC215885. The y-axis shows the nucleotide similarity of the recombinant sequence to each reference strain, and the x-axis shows the nucleotide position. The predicted recombination sites are the positions at which the recombinant sequence shows equal similarity to the two SaV parental strains of distinct genotypes