Double
nomenclature of sapovirus based on complete VP1 and NS6-7 nucleotide
sequences
Shaolei Ren1, 2, Liang Xue2, *,
Junshan Gao2, Weicheng Cai2, Peng
Lin1
1 College of Food Science and Engineering, Bohai
University, Jinzhou, Liaoning, 121013, China
2 Guangdong Provincial Key Laboratory of Microbial
Safety and Health, State Key Laboratory of Applied Microbiology Southern
China, Institute of Microbiology, Guangdong Academy of Sciences,
National Health Commission Science and Technology Innovation Platform
for Nutrition and Safety of Microbial Food, Guangzhou, Guangdong,
510070, China
* Corresponding author: No. 100, Xianlie Zhong Road, Guangzhou 510070,
P. R. China. Tel: +86-20-87680942; Fax: +86-20-87680942. E-mail:
xueliang@gdim.cn
ORCID
Liang Xue, 0000-0002-9131-8377
Abstract : Sapovirus (SaV) is a significant foodborne pathogen that
causes acute gastroenteritis worldwide. Currently, SaV genotyping is
based primarily on the VP1 gene. With the emergence of recombinant
strains, a single nomenclature based on the VP1 region alone no longer
meets the needs of SaV research. Therefore, SaV nucleotide sequences
containing complete VP1 and NS6-7 genes were collected from GenBank, and
genetic distance calculations and phylogenetic analyses were carried out
to investigate a double nomenclature based on the SaV VP1 and NS6-7
genes. Based on the genetic diversity of the complete VP1 region, the
sequences were divided into 12 genogroups and 30 genotypes, including
tentative genotypes and genogroups. Notably, a novel genogroup, GNA1,
was identified. In addition, a set of bat-related sequences was found
whose mutual genetic distances approached the inter-genogroup level;
these were classified as the bat genogroup in this study. Thirty
reference sequences are proposed based on the VP1 genotypes. Based on
the genetic diversity of nucleotide sequences in the complete NS6-7
region, twelve P (polymerase)-groups and 29 P-types (including tentative
P-types and P-groups) were identified, and corresponding P-type
reference sequences are also proposed. Applying the dual nomenclature of
the VP1 and NS6-7 genes revealed nine recombinant sequences comprising
six recombinant genotypes (GI.1[P4], GI.2[P1], GII.4[P1],
GII.4[PNA1], GII.6[P2], and GV.NA1[P3]). Dual nomenclature based on
the VP1 and NS6-7 genes can effectively characterize SaV recombination.
Keywords : Sapovirus; Genotyping; Double nomenclature; Genetic
distance; Recombination
1 INTRODUCTION
SaV infection is a global public health concern and is increasingly
recognized as a major cause of both outbreaks and sporadic cases of
acute gastroenteritis (AGE) [1, 2]. Diarrhea and vomiting are the
hallmarks of AGE, which is one of the world’s most common causes of
illness and death in infants and children. Worldwide, 700 million cases
of gastroenteritis are reported each year, resulting in 800,000 to 2
million deaths [3-7].
SaV is transmitted by the faecal-oral route, through person-to-person
contact or ingestion of contaminated food or drink [8]. Outbreaks of
SaV-associated gastroenteritis usually occur in semi-enclosed settings,
such as restaurants, schools, nursing homes, and kindergartens [9, 10].
Vomiting and diarrhea are the common symptoms of infection. In healthy
individuals, these are usually mild and self-limiting and resolve within
about a week, although serious complications can still occur [11].
Gastroenteritis outbreaks caused by SaV have been increasing worldwide
in recent years [12].
SaV is a non-enveloped, positive-sense, single-stranded RNA virus with a
3′-terminal poly(A) tail and a genome of roughly 7.1–7.7 kb. The genome
contains two open reading frames (ORFs). ORF1 encodes five
non-structural proteins (NS1 to NS5), the NS6-7 fusion protein, which
acts as the RNA-dependent RNA polymerase (RdRp), and the major capsid
protein VP1. ORF2 encodes the minor structural protein VP2. Some strains
are predicted to contain an ORF3, although its function remains unclear.
VP1 is considered to contain all the determinants of viral attachment
and antigenicity and is the key protein for determining the genetic
variation and genotype of SaV. Based on the complete VP1 sequence, SaV
is divided into 19 genogroups (GI-GXIX), of which GI, GII, GIV, and GV
can infect humans [13-15].
The genetic diversity of SaV is increasing, and this diversity is
closely related to the low fidelity of the RdRp. At present, SaV
genotyping relies mainly on a single nomenclature based on the VP1 gene
[16], but with the emergence of recombinant strains, naming based on
the VP1 region alone is increasingly unable to meet the requirements of
SaV classification. Therefore, in this study, SaV sequences reported
worldwide were collected, and nucleotide sequences with complete VP1 and
NS6-7 genes were screened and analyzed separately to systematically
describe the genotyping and genetic diversity of SaV and to explore a
double nomenclature of SaV based on the VP1 and NS6-7 genes.
2 METHODS
2.1 Collection and processing of SaV sequence information
Nucleotide sequences with complete VP1 and NS6-7 regions were retrieved
from GenBank using the keyword “sapovirus”. To avoid redundant
computation, the size of the analysis data set was reduced: sequence
similarity was evaluated using CD-HIT v4.8.1, and a threshold of 0.95
was chosen to remove highly similar sequences. In previous work, the
Norovirus Typing Tool Version 2.0
(https://www.rivm.nl/mpf/typingtool/norovirus/) [17] was used to obtain
genotyping results based on the VP1 gene. Genotypes were also determined
in parallel with the Human Calicivirus Typing Tool
(https://calicivirustypingtool.cdc.gov/) [18]. When the results of the
two tools differed, National Center for Biotechnology Information (NCBI)
online BLAST was used to determine the genogroup. Sequence alignments
were performed using Multiple Alignment using Fast Fourier Transform
(MAFFT v7.520) [19], and the complete VP1 and NS6-7 nucleotide
sequences were extracted after importing the alignments into Geneious
v9.0.2.
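The redundancy filter described above can be illustrated with a minimal Python sketch. It is not CD-HIT's algorithm: it assumes the sequences are already aligned to equal length, uses a simple greedy pass, and the function names (`identity`, `filter_redundant`) are hypothetical. It only shows what removing sequences at a 0.95 identity threshold means in practice.

```python
from __future__ import annotations


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned sequences (gaps skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore alignment gaps
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def filter_redundant(seqs: dict[str, str], threshold: float = 0.95) -> list[str]:
    """Greedy filter (an assumption, not CD-HIT itself): keep a sequence only if
    its identity to every sequence already kept is below the threshold."""
    kept: list[str] = []
    # Visit sequences with the fewest gaps first so the most complete
    # sequence represents each group of near-identical sequences.
    for name in sorted(seqs, key=lambda k: seqs[k].count("-")):
        if all(identity(seqs[name], seqs[k]) < threshold for k in kept):
            kept.append(name)
    return kept
```

For example, two sequences that differ at 1 of 40 aligned sites (identity 0.975) would be collapsed to one representative, whereas a sequence with 0.25 identity to that representative would be retained.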
2.2 Genetic distance calculation and phylogenetic analysis
Genetic distances for the VP1 and NS6-7 nucleotide sequences were
computed separately in MEGA 11 using the p-distance model with 1000
bootstrap replicates. Origin 2021 was then used to plot the genetic
distance distributions. The best-fit nucleotide substitution model was
selected according to the Akaike information criterion (AIC) as
implemented in IQ-TREE v2.2.6, and phylogenetic trees were inferred by
maximum-likelihood (ML) reconstruction, also in IQ-TREE. The trees were
visualized with Interactive Tree Of Life (iTOL, http://itol.embl.de/).
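As a minimal sketch of the p-distance used above: the p-distance between two aligned sequences is the proportion of compared sites at which they differ. The Python below is an illustration, not MEGA's implementation. It assumes pairwise deletion of gaps and ambiguous bases, and the function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned nucleotide sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped
    (pairwise deletion; this handling is an assumption of the sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        diffs += x != y
    if sites == 0:
        raise ValueError("no comparable sites")
    return diffs / sites


def pairwise_distances(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of sequences in an alignment."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }
```

The values returned by `pairwise_distances` are the quantities that would be binned into a frequency histogram to obtain a distance distribution.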
2.3 Recombination analysis
Sequences whose genotypes conflicted between the VP1 and NS6-7
phylogenetic trees were regarded as putative recombinants. The whole
genome sequence of each suspected recombinant served as the query
sequence, and the whole genome sequences of the corresponding VP1 and
NS6-7 references were used as comparison sequences. The sequences were
aligned with MAFFT v7.520. Recombination events were characterized using
SimPlot v3.5.1 with the program’s default parameters. The Kimura
(two-parameter) model was applied, with a window size of 200 bp and a
step size of 20 bp [20].
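The sliding-window idea behind this analysis can be sketched in Python as follows. This is an illustration under stated assumptions, not SimPlot's code: the query and reference are assumed to be aligned to equal length, similarity is taken as 1 minus the Kimura two-parameter distance, window midpoints are reported in 0-based alignment coordinates, and the function names are hypothetical. The K2P distance is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions.

```python
from __future__ import annotations

import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float | None:
    """Kimura two-parameter distance; None if it cannot be computed."""
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        return None
    p, q = transitions / sites, transversions / sites
    term1, term2 = 1 - 2 * p - q, 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return None  # window too divergent for the K2P correction
    return -0.5 * math.log(term1 * math.sqrt(term2))


def window_similarity(query: str, reference: str,
                      window: int = 200, step: int = 20) -> list[tuple[int, float]]:
    """(window midpoint, similarity) pairs along the alignment."""
    points = []
    for start in range(0, len(query) - window + 1, step):
        d = k2p_distance(query[start:start + window],
                         reference[start:start + window])
        if d is not None:
            points.append((start + window // 2, 1 - d))
    return points
```

Running `window_similarity` once per reference sequence gives one curve per candidate parent; a crossover between two such curves is the kind of signal a similarity plot is inspected for.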
3 RESULTS
3.1 Information on the collected SaV sequence data
As of August 15, 2023, 679 complete VP1 and 455 complete NS6-7
nucleotide sequences had been reported. After sequences showing more
than 95% similarity were removed, 213 VP1 and 178 NS6-7 nucleotide
sequences, respectively, were included in the analysis.
The VP1 typing information for these sequences had been obtained in the
previous study using the Norovirus Typing Tool. Eighty-four of the
sequences could be assigned to specific genotypes, while the remaining
sequences yielded at most genogroup-level information. In addition, when
the same sequences were submitted to the Human Calicivirus Typing Tool,
87 sequences were assigned genotypes. Online BLAST was used to determine
the genogroup of the five sequences (all GII sequences) for which the
two genotyping tools produced different results (Table 1).
3.2 Pairwise genetic distance analysis of nucleotide sequence of VP1
gene
Figure 1 shows the distribution of pairwise genetic distances. The
frequency histogram of pairwise distance values for the complete capsid
nucleotide sequences shows three distinct, non-overlapping, symmetric
peaks, which are interpreted as comparisons between strains of the same
genotype, between genotypes of the same genogroup, and between
genogroups, respectively. All pairwise distances fell between 0 and
0.5663. According to the genotyping results and the pairwise distance
analysis, distances within genotypes ranged from 0 to 0.1991, distances
between genotypes within the same genogroup ranged from 0.1992 to
0.4100, and distances between genogroups ranged from 0.4104 to 0.5663.
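The three distance ranges reported above can be read as a simple decision rule for a pair of complete VP1 sequences. The Python sketch below is illustrative only: the cut-off constants are the observed range limits from this data set, the function name is hypothetical, and a value falling in a gap between observed ranges is returned as unresolved rather than forced into a class.

```python
# Observed VP1 p-distance range limits reported in Section 3.2.
WITHIN_GENOTYPE_MAX = 0.1991
WITHIN_GENOGROUP_MIN = 0.1992
WITHIN_GENOGROUP_MAX = 0.4100
BETWEEN_GENOGROUP_MIN = 0.4104


def classify_pair(distance: float) -> str:
    """Relationship between two VP1 sequences implied by their p-distance."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance <= WITHIN_GENOTYPE_MAX:
        return "same genotype"
    if WITHIN_GENOGROUP_MIN <= distance <= WITHIN_GENOGROUP_MAX:
        return "same genogroup, different genotype"
    if distance >= BETWEEN_GENOGROUP_MIN:
        return "different genogroup"
    # Falls in a gap between the observed ranges.
    return "unresolved"
```

A rule of this kind is only a first screen; in the analysis, distance was interpreted together with the phylogenetic tree.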
3.3 Phylogenetic analysis of nucleotide sequence in VP1 gene
Phylogenetic analysis of the VP1 nucleotide sequences described above
was carried out to illustrate SaV genotyping based on the VP1 gene
(Figure 2). The output of the genotyping tools was found to be
inaccurate for some sequences. For instance, the genotyping tool
assigned JN420370 to GI, which is clearly incorrect: combining the
phylogenetic tree with genetic distance showed that its distances to
several GV sequences fall within the intra-genogroup range, indicating
that the sequence belongs to GV. Notably, a new genogroup, designated
GNA1 in this study, was discovered near GII. According to the sequence
descriptions, the hosts of its members are exclusively dogs, and its
genetic distances to neighboring genogroups are at the inter-genogroup
level. For the five sequences on which the two genotyping tools
disagreed, genotypes were determined from genetic distance, and the
results agreed with those of the Human Calicivirus Typing Tool.
Interestingly, 14 sequences clustered together on the phylogenetic tree.
Their descriptions identify them as bat-related sequences; they are
separated from all known genogroups by inter-genogroup distances, and
genogroup-level distances also exist among these sequences themselves.
These sequences are defined here as “bat” sequences. Five sequences
belonging to unknown genogroups are also present. Table 2 summarizes the
corrected and newly assigned genotype and genogroup information for all
sequences.
3.4 VP1 reference sequences
Based on the results in Section 3.3, the reference sequence for each
genotype was chosen as the longest available sequence or complete
genome, with preference for influential sequences that were discovered
early. Pairwise genetic distances were calculated for these sequences,
and ML phylogenetic trees were constructed (Figure 3). Table 3 lists the
30 proposed reference sequences, which cover 12 genogroups.
3.5 Genetic distance analysis of NS6-7
Figure 4 depicts the distribution of NS6-7 genetic distances; all
pairwise distances fell within the range of 0 to 0.5744. In contrast to
the VP1 gene, the distance ranges separating strains, genotypes, and
genogroups for the NS6-7 gene differed between genogroups. Specifically,
the distance ranges at the strain, genotype, and genogroup levels were
0-0.1321, 0.1326-0.3153, and 0.3247-0.5545 for GI; 0-0.1002,
0.1003-0.3079, and 0.3133-0.5474 for GII; and 0-0.2248, 0.2444-0.3293,
and 0.3472-0.5574 for GV, respectively.
3.6 Phylogenetic analysis of nucleotide sequence of NS6-7 gene
Phylogenetic analysis of the 178 complete NS6-7 nucleotide sequences
from all sapovirus genogroups confirmed ten P-groups (GI.P, GII.P,
GIII.P, GV.P, GVI.P, GVII.P, GVII.P, GX.P, GXI.P, GXII.P) and two
tentative P-groups (GNA1.P and GNA2.P) (Figure 5). Among them, the NS6-7
sequences of GI can be divided into seven P-types, those of GII into
seven P-types (GII.P1-GII.P3, GII.P5-GII.P7) and two tentative P-types
(GII.PNA1 and GII.PNA2), and those of GV into three P-types and one
tentative P-type (GV.PNA1). For the defined P-types, a total of 29
reference sequences were proposed. Table 4 lists the genetic distances
among the reference sequences, both within and between genogroups. All
reference sequences underwent phylogenetic analysis using the
maximum-likelihood method (Figure 6).
3.7 Recombination analysis
After the double nomenclature indicated the presence of putative
recombinant sequences, recombination analysis was carried out using
SimPlot (Figure 7). The corresponding reference sequences proposed above
were used as comparison sequences to confirm the recombination
relationship and to locate the putative recombination breakpoint. Nine
sequences exhibited recombination signals, representing six recombinant
genotypes in total. The predicted recombination site was nucleotide
position 5081 for GI.1[P4], GII.4[P1], and GII.4[PNA1]; 5121 for
GI.2[P1] and GII.6[P2]; and 5261 for GV.NA1[P3].
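To make the dual-name format used above concrete, the following Python sketch parses names such as GII.4[P1] and flags those whose VP1 type label and P-type label differ. It rests on two assumptions that are not stated explicitly in the text: that the bracketed P-type belongs to the P-group of the same genogroup as the VP1 genotype, and that matching labels (for example GI.1[P1]) denote a non-recombinant strain. The regular expression and function names are hypothetical.

```python
import re

# Assumed name layout: <genogroup>.<VP1 type>[P<P-type>], e.g. GII.4[PNA1]
DUAL_NAME = re.compile(r"^(G[A-Z0-9]+)\.([A-Z0-9]+)\[P([A-Z0-9]+)\]$")


def parse_dual_name(name: str) -> dict:
    """Split a dual name into genogroup, VP1 type label, and P-type label."""
    match = DUAL_NAME.match(name)
    if match is None:
        raise ValueError(f"not a dual name: {name!r}")
    genogroup, vp1_type, p_type = match.groups()
    return {"genogroup": genogroup, "vp1_type": vp1_type, "p_type": p_type}


def is_putative_recombinant(name: str) -> bool:
    """True when the VP1 type and P-type labels disagree (assumed convention)."""
    parsed = parse_dual_name(name)
    return parsed["vp1_type"] != parsed["p_type"]
```

Under these assumptions, each of the six recombinant genotypes listed above is flagged, while a name such as GI.1[P1] is not. A mismatch in labels is only a screening signal and does not replace the similarity-plot analysis.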
4 DISCUSSION
The genetic diversity of SaV is steadily expanding, and further research
on genotyping is needed. The limitations of VP1-based naming for the
evolving SaV are becoming increasingly apparent. Consequently, taking
the VP1/RdRp double nomenclature of norovirus as a reference, this work
examined a VP1/NS6-7 double nomenclature for SaV. In an earlier
investigation of SaV epidemiology, numerous sequences with unknown
genotypes were found during sequence collection. In addition to the
currently recognized genotypes, these sequences might include novel
genogroups and genotypes. Phylogenetic analysis of the VP1 nucleotide
sequences and genetic distance calculations were therefore performed.
The findings indicate that the two genotyping tools differ in the
classifications they report and that both leave some sequences
unclassified or misclassified. This study corrected the
misclassifications and assigned the unclassified sequences.
Notably, few studies have addressed the genotyping of non-human SaV
sequences, and the topic appears to have attracted limited research
interest. This study suggests that SaV genetic diversity, particularly
among non-human sequences, may have been greatly underestimated. For
instance, GIII is currently considered to contain only one genotype,
GIII.1, but both the pairwise genetic distances and the phylogenetic
tree suggest that it may contain more. For the time being, we retain the
designation GIII.1. Furthermore, more than ten sequences are separated
from one another by genogroup-level distances. BLAST searches and the
sequence descriptions show that most of these sequences are associated
with bats and that most of the putative groups contain only a single
sequence, which is why this study refers to them collectively as bat
sequences. How any newly emerging genogroups should be delineated merits
further discussion.
The Norovirus Typing Tool Version 2.0 and the Human Calicivirus Typing
Tool have been widely used. The Norovirus Typing Tool has been moved to
a new platform (https://www.genomedetective.com/app/typingtool/virus/)
[17] and is no longer accessible through the previous URL. However,
the new platform has limitations when genotyping SaV: in contrast to the
previous version, it can only determine whether a sequence is SaV and
cannot provide comparable genotyping results. The Human Calicivirus
Typing Tool provides quick typing results, but, as the name suggests, it
can only type human SaV. The capability of a genotyping tool is limited
mainly by the small number of reference strains. Therefore, this study
proposed reference nucleotide sequences for VP1 and NS6-7.
The proposed SaV VP1/NS6-7 double nomenclature is important for
fundamental SaV research, particularly for research on recombination.
Nine recombinant sequences were identified in this analysis, most of
which were human sequences. Recombination is one of the main mechanisms
and driving forces of viral evolution, and it contributes significantly
to viral genetic diversity and epidemics. Whether recombinant strains
will give rise to SaV outbreaks warrants investigation.
In conclusion, SaV can be effectively classified by the double
nomenclature based on the VP1 and NS6-7 genes. Genotyping methods based
on other regions, such as a genome-wide nomenclature covering all
protein-coding regions, should also be explored. Continued refinement of
the typing system is a foundation and key component of virus research.
FUNDING STATEMENT
This work was supported by the National Key Research and Development
Program of China (2022YFF1103100), and the National Natural Science
Foundation of China (32272436).
CONFLICT OF INTERESTS
The authors declare no competing interests.
AUTHOR CONTRIBUTIONS
Shaolei Ren, Liang Xue, Junshan Gao, Weicheng Cai, and Peng Lin designed
and implemented the study. Shaolei Ren, Liang Xue, Junshan Gao, and
Weicheng Cai collected and analyzed the data. Shaolei Ren and Liang Xue
drafted the manuscript. All authors discussed the results of the study
and agreed to the published version of the manuscript.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the
corresponding author upon reasonable request.
REFERENCES
[1] MAGWALIVHA M, KABUE J P, TRAORE A N, et al. Prevalence of human
sapovirus in low and middle income countries [J]. Adv Virol, 2018,
2018: 5986549.
[2] ZHUO R, DING X F, FREEDMAN S B, et al. Molecular Epidemiology of
Human Sapovirus among Children with Acute Gastroenteritis in Western
Canada [J]. J Clin Microbiol, 2021, 59(10): e0098621.
[3] DIEZ VALCARCE M, KAMBHAMPATI A K, CALDERWOOD L E, et al. Global
distribution of sporadic sapovirus infections: A systematic review and
meta-analysis [J]. Plos One, 2021, 16(8): e0255436.
[4] REYMAO T K, HERNANDEZ J D, COSTA S T, et al. Sapoviruses in
children with acute gastroenteritis from Manaus, Amazon region, Brazil,
2010-2011 [J]. Rev Inst Med Trop Sao Paulo, 2016, 58: 81.
[5] LIU L, OZA S, HOGAN D, et al. Global, regional, and national
causes of under-5 mortality in 2000-15: an updated systematic analysis
with implications for the Sustainable Development Goals [J]. Lancet,
2016, 388(10063): 3027-35.
[6] WALKER C L, ARYEE M J, BOSCHI-PINTO C, et al. Estimating
diarrhea mortality among young children in low and middle income
countries [J]. Plos One, 2012, 7(1): e29151.
[7] MANOUANA G P, NGUEMA-MOURE P A, MBONG NGWESE M, et al. Genetic
Diversity of Enteric Viruses in Children under Five Years Old in Gabon
[J]. Viruses, 2021, 13(4): 545.
[8] HERGENS M P, NEDERBY OHD J, ALM E, et al. Investigation of a
food-borne outbreak of gastroenteritis in a school canteen revealed a
variant of sapovirus genogroup V not detected by standard PCR,
Sollentuna, Sweden, 2016 [J]. Eurosurveillance, 2017, 22(22): 30543.
[9] PANG X L, LEE B E, TYRRELL G J, et al. Epidemiology and genotype
analysis of sapovirus associated with gastroenteritis outbreaks in
Alberta, Canada: 2004-2007 [J]. J Infect Dis, 2009, 199(4): 547-51.
[10] JOHANSSON P J, BERGENTOFT K, LARSSON P A, et al. A nosocomial
sapovirus-associated outbreak of gastroenteritis in adults [J].
Scand J Infect Dis, 2005, 37(3): 200-4.
[11] LANDA E, JAVAID S, WON J S, et al. Septic shock secondary to
severe gastroenteritis resulting from sapovirus infection [J].
Cureus, 2022, 14(4): e24010.
[12] RAZIZADEH M H, KHATAMI A, ZAREI M. Global molecular prevalence
and genotype distribution of Sapovirus in children with gastrointestinal
complications: A systematic review and meta-analysis [J]. Rev Med
Virol, 2022, 32(3): e2302.
[13] OKA T, WANG Q, KATAYAMA K, et al. Comprehensive review of human
sapoviruses [J]. Clin Microbiol Rev, 2015, 28(1): 32-53.
[14] OKA T, LU Z, PHAN T, et al. Genetic characterization and
classification of human and animal sapoviruses [J]. Plos One, 2016,
11(5): e0156373.
[15] KUMTHIP K, KHAMRIN P, USHIJIMA H, et al. Genetic recombination
and diversity of sapovirus in pediatric patients with acute
gastroenteritis in Thailand, 2010-2018 [J]. PeerJ, 2020, 8: e8520.
[16] OKA T, MORI K, IRITANI N, et al. Human sapovirus classification
based on complete capsid nucleotide sequences [J]. Arch Virol, 2012,
157(2): 349-52.
[17] KRONEMAN A, VENNEMA H, DEFORCHE K, et al. An automated
genotyping tool for enteroviruses and noroviruses [J]. J Clin Virol,
2011, 51(2): 121-5.
[18] TATUSOV R L, CHHABRA P, DIEZ-VALCARCE M, et al. Human
Calicivirus Typing tool: A web-based tool for genotyping human norovirus
and sapovirus sequences [J]. J Clin Virol, 2021, 134: 104718.
[19] KATOH K, ASIMENOS G, TOH H. Multiple alignment of DNA sequences
with MAFFT [J]. Methods Mol Biol, 2009, 537: 39-64.
[20] XUE L, WU Q, DONG R, et al. Comparative phylogenetic analyses
of recombinant noroviruses based on different protein-encoding regions
show the recombination-associated evolution pattern [J]. Sci Rep,
2017, 7(1): 4976.
Table 1 Sequences with differing genotyping results between the two
genotyping tools
Table 2 Corrected VP1 genotype and genogroup information of the
sequences
Table 3 VP1 reference sequences. The genetic distance between the
reference sequences within and outside the genogroups was calculated and
expressed as intervals
Table 4 NS6-7 reference sequences. The genetic distance between the
reference sequences within and outside the genogroups was calculated and
expressed as intervals
Figure 1 Pairwise distance distribution histograms of VP1 nucleotide
sequences
Figure 2 Phylogenetic trees of complete VP1 sequences of sapovirus. The
entry number and assigned genogroup and genotype are indicated. The
Manchester strain (Hu/Manchester/93/UK; GenBank accession no. X86560)
was selected as the root of the phylogenetic tree. Sequences of the same
genogroup are marked with the same background color. Corrected sequences
are marked in a special format, with the incorrect genotype inside the
vertical lines and the correct genotype outside them
Figure 3 Phylogenetic trees of complete VP1 reference sequences. The
entry number of the sequences and the genotype represented are labeled.
The Manchester strain (Hu/Manchester/93/UK; GenBank accession no.
X86560) was selected as the root of the phylogenetic tree. The scale
represents the nucleotide substitutions per site
Figure 4 Pairwise distance distribution histograms of
NS6-7 nucleotide sequences
Figure 5 Phylogenetic trees of complete NS6-7 sequences of sapovirus.
The entry number and assigned genogroup and genotype are indicated. The
Manchester strain (Hu/Manchester/93/UK; GenBank accession no. X86560)
was selected as the root of the phylogenetic tree. Sequences of the same
genogroup are marked with the same background color
Figure 6 Phylogenetic trees of complete NS6-7 reference sequences. The
entry number of the sequences and the genotype represented are labeled.
The Manchester strain (Hu/Manchester/93/UK; GenBank accession no.
X86560) was selected as the root of the phylogenetic tree. The scale
represents the nucleotide substitutions per site
Figure 7 Recombination analysis (a) GI.1[P4]-MN794208, (b)
GI.2[P1]-GQ261222, (c) GII.4[P1]-AB522397, (d)
GII.4[P1]-KX274477, (e) GII.4[P1]-MG012446, (f)
GII.4[P1]-MZ488271, (g) GII.4[PNA1]-MN794218, (h)
GII.6[P2]-MH933793, (i) GV.NA1[P3]-LC215885. The y-axis shows the
nucleotide similarity between the recombinant sequence and each
reference strain, and the x-axis shows the nucleotide position. The
predicted recombination site is the position at which the recombinant
sequence shows equal similarity to the two SaV parental strains of
distinct genotypes