Keywords
SARS-CoV-2, reinfection, closely related strains, shotgun sequencing, Ecuador
To the Editor
To date, reported SARS-CoV-2 reinfection cases have mainly involved
strains belonging to different clades.1–4 However, we
think such cases are rare because only a few
lineages dominate in certain areas.3,5 Moreover,
differentiating closely related strains could be difficult when shotgun
technologies are used, and the accuracy depends on the parameters set
for the analysis. Therefore, a standardized pipeline is necessary. The
aim of this commentary is to report two cases of reinfection in 2020 in
Quito, Ecuador, identified by Illumina sequencing and analyzed with an
open-source bioinformatic pipeline.
The first case (Patient A) was a 36-year-old man residing in
Quito, Ecuador. He traveled to Guayaquil on March 12 and had contact with
a COVID-19-positive patient. He tested SARS-CoV-2 RT-qPCR positive on March
22 (Sample 1A). On April 22, his RT-qPCR test was negative, but he had a new
positive sample on April 30 (Sample 1B). The second case (Patient B) was
a 28-year-old man residing in Quito, Ecuador. He was confirmed
to be SARS-CoV-2 RT-qPCR positive on July 20 (Sample 2A). On August
3, he was IgG and IgM negative, and he was SARS-CoV-2 RT-qPCR negative on
August 3 and 4. On October 26, he reported new symptoms of
COVID-19 and was SARS-CoV-2 RT-qPCR positive (Sample 2B). On November 6
and 7, he was IgG and IgM positive but SARS-CoV-2 RT-qPCR
negative. The case details are shown in Figure 1.
To confirm patient identity in every sample, we took advantage of the
trace genomic DNA in the RNA samples (our RNA extraction kit does not use
DNases) to run a multiplex STR system for human identification, the
PowerPlex® 21 System (Promega). We identified the same human profile,
that of Patient A, in samples 1A and 1B, and the human profile of Patient B,
obtained from a blood sample, in the RNA extractions from samples 2A and
2B.
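The identity check described above amounts to a locus-by-locus comparison of allele calls. The following is a minimal sketch of that comparison, not the procedure used in this work: the function name `compare_profiles` and all allele values are hypothetical, and only loci typed in both profiles are compared, on the assumption that trace DNA in an RNA extract may not yield a call at every locus.

```python
from typing import Dict, FrozenSet

# An STR profile maps each locus name to the set of alleles called there.
Profile = Dict[str, FrozenSet[str]]


def compare_profiles(reference: Profile, sample: Profile) -> dict:
    """Compare two STR profiles over the loci typed in both of them."""
    shared_loci = sorted(set(reference) & set(sample))
    mismatches = [
        locus for locus in shared_loci if reference[locus] != sample[locus]
    ]
    return {
        "loci_compared": len(shared_loci),
        "mismatched_loci": mismatches,
        # Consistent only if at least one locus was compared and none differ.
        "consistent": bool(shared_loci) and not mismatches,
    }


# Hypothetical allele calls for illustration only (not the patients' data).
blood_reference: Profile = {
    "D3S1358": frozenset({"15", "17"}),
    "TH01": frozenset({"6", "9.3"}),
    "FGA": frozenset({"21", "24"}),
    "Amelogenin": frozenset({"X", "Y"}),
}
rna_extract: Profile = {
    "D3S1358": frozenset({"15", "17"}),
    "TH01": frozenset({"6", "9.3"}),
    "Amelogenin": frozenset({"X", "Y"}),  # FGA not recovered in this example
}
unrelated_person: Profile = {
    "D3S1358": frozenset({"14", "16"}),
    "TH01": frozenset({"6", "9.3"}),
    "Amelogenin": frozenset({"X"}),
}

same_person = compare_profiles(blood_reference, rna_extract)
different_person = compare_profiles(blood_reference, unrelated_person)
print(same_person)
print(different_person)
```

A real comparison would also need to account for allelic dropout within a typed locus, which this sketch treats as a mismatch.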
Whole genome sequencing of the SARS-CoV-2 samples was performed using
Paragon Genomics CleanPlex® SARS-CoV-2. FastQC was used to
assess the quality of the raw reads,6 before cleaning them
with Trimmomatic 0.39 (PE -phred33
ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads LEADING:20
TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:40).3,6 Cleaned
reads were mapped to the reference SARS-CoV-2 genome
(NC_045512.2),3,6 using BWA-MEM 0.7.17 with default
parameters.6 PCR duplicates were marked and removed
with Picard 2.23.8.3 Variant calling was performed
with bcftools 1.11 (-q 25 -Q 35).7 Only high-quality
variants (QUAL ≥ 20 and DP ≥ 5),3,8 were retained and
used to generate the consensus sequences. In addition, a de novo
assembly produced by MEGAHIT 1.2.9 with default
parameters,6 was compared with the reference-based
consensus using blastn as an internal
control.9 Clades were assigned using
Nextclade,3 and Pangolin.10 Phylogeny was performed with CSI Phylogeny
(min depth 5x, min relative depth 5x, min distance 5 bp, min quality
20).11 Finally, we used the consensus sequences to identify missense
mutations using GISAID,12 and Genome
Detective.13
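The variant filter stated above (QUAL ≥ 20 and DP ≥ 5) can be sketched in a few lines. This is an illustrative sketch rather than the code used in the pipeline: it assumes that depth is read from the DP key of the VCF INFO column, and the helper names and example records are hypothetical.

```python
from typing import Iterable, Iterator, Optional

MIN_QUAL = 20.0  # threshold on the VCF QUAL column
MIN_DEPTH = 5    # threshold on INFO/DP (assumed source of the depth value)


def info_depth(info: str) -> Optional[int]:
    """Return the DP value from a VCF INFO column, or None if absent."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "DP" and value.isdigit():
            return int(value)
    return None


def high_quality(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and records with QUAL >= 20 and DP >= 5."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        # Columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        if len(columns) < 8 or columns[5] == ".":
            continue
        qual = float(columns[5])
        depth = info_depth(columns[7])
        if qual >= MIN_QUAL and depth is not None and depth >= MIN_DEPTH:
            yield line


# Hypothetical records for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_045512.2\t241\t.\tC\tT\t225\t.\tDP=120",
    "NC_045512.2\t3037\t.\tC\tT\t12.5\t.\tDP=80",  # fails the QUAL threshold
    "NC_045512.2\t14408\t.\tC\tT\t180\t.\tDP=3",   # fails the DP threshold
]
kept = [rec for rec in high_quality(example_vcf) if not rec.startswith("#")]
print(kept)
```

Only the records that pass both thresholds would then be applied to the reference to build the consensus sequence.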
Consensus sequences of samples 1A (MW342706; EPI_ISL_681702) and 1B
(MW342708; EPI_ISL_681703) were assigned to the 20A and 20B clades
by Nextclade and to B.1 and B.1.1 by Pangolin, respectively, and both
were assigned as O by GISAID. In the other patient, samples 2A
(MW294007; EPI_ISL_660069) and 2B (MW294011; EPI_ISL_660070) were both
assigned as 20B, B.1.1 and O by Nextclade, Pangolin and GISAID,
respectively.
Additionally, variant calling (reference vs. consensus) showed 22
nucleotide changes in Patient A (12 only in sample 1A and 10 only in
sample 1B). In Patient B, 27 changes were identified (14 only in sample
2A and 13 only in sample 2B). Our analysis showed more nucleotide
changes than expected for reactivation or persistence, approximately 2-
to 3-fold more for Patient A and 4- to 6-fold more for Patient B (based
on a natural SARS-CoV-2 mutation rate of 2-3 nucleotide changes per
month),14,15 supporting reinfection. Amino acid
changes are detailed in the Supplementary data.
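The kind of comparison made above, observed changes against the number expected from the mutation rate, can be sketched as follows. This is only an illustration under stated assumptions: a month is taken as 30 days, the interval runs between the two positive samples of each patient, and the total per-patient count is used as the observed value. The text does not specify which interval or which count was compared, so the output should not be expected to reproduce the fold values quoted above.

```python
from datetime import date
from typing import Tuple

RATE_PER_MONTH = (2.0, 3.0)  # nucleotide changes per month, from the text
DAYS_PER_MONTH = 30.0        # assumption: a month is treated as 30 days


def expected_changes(first: date, second: date) -> Tuple[float, float]:
    """Return the (low, high) number of changes expected over the interval."""
    months = (second - first).days / DAYS_PER_MONTH
    low_rate, high_rate = RATE_PER_MONTH
    return months * low_rate, months * high_rate


def fold_excess(observed: int, expected: Tuple[float, float]) -> Tuple[float, float]:
    """Return the (low, high) ratio of observed to expected changes."""
    low_expected, high_expected = expected
    return observed / high_expected, observed / low_expected


# Sampling dates are taken from the case descriptions (all in 2020).
patient_a = expected_changes(date(2020, 3, 22), date(2020, 4, 30))
patient_b = expected_changes(date(2020, 7, 20), date(2020, 10, 26))

# Assumption: the total number of changes per patient is the observed count.
print("Patient A expected:", patient_a, "fold:", fold_excess(22, patient_a))
print("Patient B expected:", patient_b, "fold:", fold_excess(27, patient_b))
```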
In both cases, the clinical data are consistent with reinfection.
In Patient A, differentiating the viruses from the first and second
infections was straightforward because the strains belong to different
clades. Additionally, the travel of Patient A and his contact with a
COVID-19-positive person support the view that his first infection with
this clade was acquired in another location. However, in Patient B,
although the clinical data were robust, the strains were closely related.
In such cases, our approach is a valuable tool to support reinfection.
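Separating closely related strains comes down to listing the changes, relative to the reference genome, that are found in one sample but not in the other. The sketch below shows that set comparison. It is not the authors' code: the `Variant` type and `compare_samples` function are hypothetical, and the example uses substitutions commonly associated with clades 20A and 20B rather than the variant calls of either patient.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """A substitution relative to the reference genome (NC_045512.2)."""
    position: int
    ref: str
    alt: str


def compare_samples(first: Set[Variant], second: Set[Variant]) -> dict:
    """Split the variants of two samples into shared and sample-specific sets."""
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Illustrative input only: substitutions commonly associated with clade 20A...
first_sample = {
    Variant(241, "C", "T"),
    Variant(3037, "C", "T"),
    Variant(14408, "C", "T"),
    Variant(23403, "A", "G"),
}
# ...and the three adjacent substitutions commonly associated with clade 20B.
second_sample = first_sample | {
    Variant(28881, "G", "A"),
    Variant(28882, "G", "A"),
    Variant(28883, "G", "C"),
}

comparison = compare_samples(first_sample, second_sample)
for label, variants in comparison.items():
    print(label, len(variants), variants)
```

When both samples belong to the same clade, the shared set is large and the decision rests on the size of the two sample-specific sets.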
We propose this analysis approach as a useful tool to improve the
understanding of clinical and epidemiological data, evaluate the
infection potential of strains, correlate strains with the immunity
level and evaluate the effectiveness of vaccination, among other
applications in the study of the pandemic.