Keywords
SARS-CoV-2, reinfection, closely related strains, shotgun sequencing, Ecuador
To the Editor
To date, reported SARS-CoV-2 reinfection cases have mainly involved strains belonging to different clades.1–4 However, we think such cases are rare because only a few lineages dominate in any given area.3,5 Moreover, differentiating closely related strains can be difficult when shotgun sequencing technologies are used, and the accuracy depends on the parameters set for the analysis. Therefore, a standardized pipeline is necessary. The aim of this commentary is to report two cases of reinfection that occurred in 2020 in Quito, Ecuador, and were identified using Illumina sequencing and an open-source bioinformatic analysis pipeline.
The first case (Patient A) was a 36-year-old man residing in Quito, Ecuador. He traveled to Guayaquil on March 12 and had contact with a COVID-19-positive patient. He was SARS-CoV-2 RT-qPCR positive on March 22 (Sample 1A). On April 22, his RT-qPCR test was negative, but he had a new positive sample on April 30 (Sample 1B). The second case (Patient B) was a 28-year-old man residing in Quito, Ecuador. The patient was confirmed to be SARS-CoV-2 RT-qPCR positive on July 20 (Sample 2A). He was IgG and IgM negative on August 3 and SARS-CoV-2 RT-qPCR negative on August 3 and 4. On October 26, the patient reported new symptoms of COVID-19 and was SARS-CoV-2 RT-qPCR positive (Sample 2B). On November 6 and 7, the patient was IgG and IgM positive but SARS-CoV-2 RT-qPCR negative. The case details are shown in Figure 1.
To confirm the patient identity in every sample, we took advantage of the trace genomic DNA in the RNA samples (our RNA extraction kit does not use DNases) to run a multiplex STR system for human identification, the PowerPlex® 21 System (Promega). We identified the same human profile, that of Patient A, in samples 1A and 1B, and the human profile of Patient B, obtained from a blood sample, in the RNA extractions from samples 2A and 2B.
Whole-genome sequencing of the SARS-CoV-2 samples was performed using Paragon Genomics CleanPlex® SARS-CoV-2. FastQC was used to assess the quality of the raw reads,6 which were then cleaned with Trimmomatic 0.39 (PE -phred33 ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads,3 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:40).6 Cleaned reads were mapped to the reference SARS-CoV-2 genome (NC_045512.2),3,6 using BWA-MEM 0.7.17 with default parameters.6 PCR duplicates were marked and removed with Picard 2.23.8.3 Variant calling was performed with bcftools 1.11 (-q 25 -Q 35).7 Only high-quality variants (QUAL ≥ 20 and DP ≥ 5),3,8 were retained and used to generate the consensus sequences. In addition, the de novo assembly produced by MEGAHIT 1.2.9 with default parameters,6 was compared with the reference-based consensus using blastn,9 serving as an internal control to cross-validate the two methods. Clades were assigned using Nextclade,3 and Pangolin.10 Phylogenetic analysis was performed with CSI Phylogeny (min depth 5x, min relative depth 5x, min distance 5 bp, min quality 20).11 Finally, we used the consensus sequences to identify missense mutations using GISAID,12 and Genome Detective.13
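The variant quality filter described above (QUAL ≥ 20 and DP ≥ 5) can be illustrated with a short Python sketch. This is not the code used in the study: the function names (`parse_info`, `passes_filter`, `filter_vcf`) and the example records are hypothetical, and the sketch assumes that read depth is reported as `DP` in the INFO column of an uncompressed VCF file.

```python
MIN_QUAL = 20.0  # QUAL threshold stated in the text
MIN_DP = 5       # DP threshold stated in the text


def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=12;MQ=60' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item:
            info[item] = True  # flag without a value, e.g. INDEL
    return info


def passes_filter(vcf_line, min_qual=MIN_QUAL, min_dp=MIN_DP):
    """Return True if a VCF data line meets both thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # not a complete VCF record
    qual_text = fields[5]  # column 6 is QUAL
    if qual_text == ".":
        return False  # missing quality is treated as failing
    depth = parse_info(fields[7]).get("DP")  # column 8 is INFO
    if depth is None or depth is True:
        return False  # assumption: records without DP are dropped
    return float(qual_text) >= min_qual and int(depth) >= min_dp


def filter_vcf(lines):
    """Keep header lines and the data lines that pass the filter."""
    return [ln for ln in lines if ln.startswith("#") or passes_filter(ln)]


if __name__ == "__main__":
    # Made-up records for illustration only (not data from this study).
    example = [
        "##fileformat=VCFv4.2",
        "NC_045512.2\t100\t.\tC\tT\t225\t.\tDP=40",
        "NC_045512.2\t200\t.\tC\tT\t12\t.\tDP=40",
        "NC_045512.2\t300\t.\tA\tG\t200\t.\tDP=3",
    ]
    for line in filter_vcf(example):
        print(line)
```

In this sketch a record must satisfy both thresholds to be kept, so a well-supported call at low depth is discarded along with a deep but low-quality call.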
Consensus sequences of samples 1A (MW342706; EPI_ISL_681702) and 1B (MW342708; EPI_ISL_681703) were assigned to clades 20A and 20B, respectively, by Nextclade and to lineages B.1 and B.1.1, respectively, by Pangolin; both were assigned to clade O by GISAID. In Patient B, samples 2A (MW294007; EPI_ISL_660069) and 2B (MW294011; EPI_ISL_660070) were both assigned to clade 20B by Nextclade, lineage B.1.1 by Pangolin, and clade O by GISAID.
Additionally, variant calling (reference vs. consensus) showed 22 nucleotide changes in Patient A (12 found only in sample 1A and 10 only in sample 1B). In Patient B, 27 changes were identified (14 only in sample 2A and 13 only in sample 2B). Both counts exceed what would be expected for reactivation or persistence, by approximately 2- to 3-fold for Patient A and 4- to 6-fold for Patient B (based on a natural mutation rate for SARS-CoV-2 of 2-3 nucleotide changes per month),14,15 which supports reinfection. Amino acid changes are detailed in the Supplementary data.
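The comparison described above has two parts: finding the changes seen in only one of a patient's two samples, and estimating how many changes would be expected over the interval under the cited rate of 2-3 changes per month. A minimal Python sketch of both is given below. The function names, the 30-day month, and the toy variants are assumptions made for illustration. The text does not specify the exact interval or counts used for its fold estimates, so the sketch is not intended to reproduce them.

```python
def private_changes(variants_first, variants_second):
    """Split two variant sets into changes seen in only one sample.

    Each variant is a (position, ref, alt) tuple relative to the
    reference genome.
    """
    only_first = variants_first - variants_second
    only_second = variants_second - variants_first
    return only_first, only_second


def expected_changes(days_between, rate_low=2.0, rate_high=3.0,
                     days_per_month=30.0):
    """Return the (low, high) number of changes expected over an interval.

    rate_low and rate_high are changes per month (2-3 in the text);
    days_per_month=30 is an assumption of this sketch.
    """
    months = days_between / days_per_month
    return rate_low * months, rate_high * months


if __name__ == "__main__":
    # Toy variants for illustration only (not data from this study).
    first = {(100, "C", "T"), (200, "A", "G"), (300, "G", "T")}
    second = {(100, "C", "T"), (400, "C", "T")}
    only_first, only_second = private_changes(first, second)
    print(sorted(only_first), sorted(only_second))
    print(expected_changes(60))
```

A count of sample-specific changes well above the upper expected value points away from persistence of a single strain, which is the reasoning applied to both patients.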
In both cases, the clinical data are consistent with reinfection. In Patient A, differentiating the viruses of the first and second infections was straightforward because the strains belong to different clades. Additionally, the travel of Patient A and his contact with a COVID-19-positive person support the acquisition of his first infection, by a strain of that clade, in another location. In Patient B, however, although the clinical data were robust, the strains were closely related. Therefore, our approach is a valuable tool to support a reinfection case.
We propose this analysis approach as a useful tool to improve the understanding of clinical and epidemiological data, evaluate the infection potential of strains, correlate strains with the immunity level and evaluate the effectiveness of vaccination, among other applications in the study of the pandemic.