HIV and SIV
Retroviral zoonoses have received public health attention because HIV-1 and human immunodeficiency virus type 2 (HIV-2) originated from cross-species transmission of SIV from naturally infected primates, in which prevalence ranges up to 36%[56,57]. Even though the virus has circulated in human populations for more than thirty years, an estimated 37.7 million people were living with HIV as of 2020[58,59], indicating that research on HIV/AIDS (acquired immunodeficiency syndrome) remains a top priority worldwide. Jabara and colleagues employed a molecular barcoding strategy (called Primer ID in their article) to reveal dynamic changes in HIV-1 genetic variation[17]. A molecular barcode of eight nucleotides followed by a sample barcode of three nucleotides was incorporated into a reverse primer that anneals downstream of the HIV-1 protease gene (Figure 3A). Both barcodes can be tracked through PCR amplification and deep sequencing. They are used to correct artificial errors generated by PCR amplification and high-throughput sequencing, and then to create a consensus sequence for each individual viral template. After correction with molecular barcodes, 80% of the unique sequence polymorphisms in this dataset were removed[17]. Of note, the distribution of identical molecular barcodes was not Gaussian, meaning that viral genomes were not amplified equally. The authors suggested that this phenomenon arose because the number of cDNA templates amplified in each PCR cycle varies[17].
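The consensus step described above can be illustrated with a minimal sketch. It is not the pipeline used by Jabara and colleagues: the function name `consensus_by_barcode`, the `min_reads` threshold, and the assumption that all reads sharing a barcode are already aligned to the same length are hypothetical choices made for illustration only.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_reads=3):
    """Build a majority-rule consensus for each molecular barcode.

    reads: iterable of (barcode, sequence) pairs.
    min_reads: hypothetical cutoff; barcodes seen fewer times are skipped
    because a reliable majority cannot be formed from them.
    """
    groups = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)

    consensus = {}
    for barcode, seqs in groups.items():
        if len(seqs) < min_reads:
            continue
        # Assumption: reads are pre-aligned, so they must share one length.
        if len({len(s) for s in seqs}) != 1:
            continue
        # At each position, keep the most frequent base among the reads.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


example_reads = [
    ("AAAAAAAA", "ACGT"),
    ("AAAAAAAA", "ACGT"),
    ("AAAAAAAA", "ACGA"),  # last base differs, as a PCR or sequencing error might
    ("CCCCCCCC", "TTTT"),  # barcode seen once, below the cutoff
]
result = consensus_by_barcode(example_reads)
```

In this toy input, the single discordant base is outvoted by the two concordant reads carrying the same barcode, while the barcode observed only once yields no consensus at all.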
In 2017, the laboratory of Dr. Filion published a high-throughput method named Barcoded HIV Ensembles (B-HIVE) to monitor the expression of thousands of proviruses in a cell population[25]. The principle of B-HIVE is to tag the HIV genome with a molecular barcode of 20 random nucleotides prior to generating infectious viral particles (Figure 3B)[25,60]. The advantages of the molecular barcodes implemented in B-HIVE are twofold. First, molecular barcodes were used to map HIV integration sites by generating a high-throughput sequencing library in which every chimeric DNA fragment includes the local host genomic region where the provirus DNA is inserted. Second, molecular barcodes were used to quantify HIV transcription by performing RT-PCR on viral RNA with primers flanking the barcode region[25,60]. The authors verified the insert-specific provirus expression measured by B-HIVE with another method called T7-PCR: for the selected barcodes, a positive correlation was detected between RNA barcode counts measured by B-HIVE and RT-qPCR values measured by T7-PCR[25,60]. It is worth noting that the set of all possible 20-mers is complex enough (4^20 distinct barcode combinations) that the probability of two viruses carrying the same barcode is negligible. The authors also observed that the presence of the barcodes did not interfere with the efficiency of HIV (HIV-based vector) infection; side-by-side comparisons of infection rates between barcoded and non-barcoded viral particles showed that barcoded viral particles were only ~5% less infectious[25].
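The size of the 20-mer barcode space and the chance that barcodes coincide can be estimated with a short calculation. The sketch below assumes that barcodes are drawn uniformly and independently from all possible sequences, which real oligonucleotide synthesis only approximates; the function names and the example population size are illustrative and do not come from the B-HIVE papers.

```python
import math


def barcode_space(length):
    """Number of distinct barcodes of a given length over the bases A, C, G, T."""
    return 4 ** length


def collision_probability(n_viruses, length=20):
    """Approximate probability that at least two of n_viruses share a barcode.

    Uses the standard birthday-problem approximation
    1 - exp(-n(n-1) / (2N)), valid when n is much smaller than N.
    Assumption: barcodes are uniform and independent.
    """
    space = barcode_space(length)
    pairs = n_viruses * (n_viruses - 1) / 2
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-pairs / space)


space_20 = barcode_space(20)             # 1,099,511,627,776 possible 20-mers
p_10k = collision_probability(10_000)    # hypothetical pool of 10,000 proviruses
```

Under these assumptions, even a pool of ten thousand barcoded proviruses has a probability well below one in ten thousand of containing any shared barcode.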
Marsden and colleagues successfully inserted molecular barcodes into the genome of a nearly full-length, replication-competent HIV[61]. This HIV construct is derived from NL4-3, which has been modified to express an NL-hemagglutinin (HA) epitope in place of Vpr for the purpose of enriching HIV-infected cells[62]. Molecular barcodes 21 nucleotides in length were introduced immediately downstream of the HA tag (Figure 3C); this region is not required for the expression of any gene and lacks cis-acting elements and splice sites (Figure 3C). It is important to note that a thymidine was placed at every third position of the barcode (that is, at intervals of two nucleotides) to avoid random strings of consecutive GC bases, which reduced the complexity of the barcodes to 4^14 combinations (Figure 3C). The authors verified the complexity of the barcodes by Illumina HiSeq sequencing, showing that around 18,000 unique barcode sequences were homogeneously present in the original plasmid preparation. Any two barcodes differed by an average of 11 nucleotides, showing appreciable barcode diversity. Moreover, the frequency of barcodes in virions correlated with the frequency of barcodes in the original plasmid preparation, meaning that the barcode complexity is recapitulated in infectious particles[61]. Viral infectivity and replication were validated in GHOST CXCR4+CCR5+ cells and primary human peripheral blood mononuclear cells, showing that barcoded viruses are capable of infecting and replicating in primary cells without any bias in barcode representation[61].
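The effect of the fixed thymidines on barcode complexity and pairwise diversity can be checked with a small sketch. It assumes a repeating unit of two random bases followed by one thymidine (seven units, 21 nucleotides); the exact phase of the thymidines in the published construct is not specified here, so this layout and all function names are illustrative assumptions.

```python
import random

BASES = "ACGT"


def make_barcode(rng, units=7):
    """Generate one barcode as `units` repeats of two random bases plus a fixed T.

    Assumption: the fixed thymidine closes each three-base unit (NNT pattern).
    """
    return "".join(rng.choice(BASES) + rng.choice(BASES) + "T" for _ in range(units))


def hamming(a, b):
    """Count the positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def complexity(units=7):
    """Only the two random bases per unit contribute: 4^(2 * units) barcodes."""
    return 4 ** (2 * units)


def expected_distance(units=7):
    """Two uniform random bases differ with probability 3/4; fixed T never differs."""
    return 2 * units * 3 / 4


rng = random.Random(0)
barcode = make_barcode(rng)
n_combinations = complexity()      # 4^14 = 268,435,456
mean_diff = expected_distance()    # 10.5 expected differences between two barcodes
```

Under the uniform-sampling assumption, the expected 10.5 differences between two random barcodes is in line with the average of 11 reported for the plasmid library.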
Apart from HIV, Fennessey and colleagues were the first to establish a model of rhesus macaques infected with barcoded SIV to study the dynamics of viral reservoir establishment and viral rebound over time[63]. Molecular barcodes of 10 random nucleotides were inserted between the stop codon of vpx and the start codon of vpr in the SIVmac239 plasmid (Figure 3D). Since only one restriction enzyme, MluI, was chosen for cloning the barcodes, each barcode can be inserted in either orientation, which doubles the complexity of the barcodes in viral stocks. To accurately estimate the complexity of the input barcodes (named viral clonotypes in the text), the authors reverse transcribed viral RNA into cDNA, which was diluted down to 5,000 viral copies in a total of 168 aliquots to limit the influence of sequencing errors. The authors were able to clearly separate the probability distribution of barcodes arising from PCR-induced errors from that of true barcode sequences (9,336 sequences)[63]. Barcodes differed from one another by an average of seven nucleotides. Furthermore, the authors showed that the barcode insertion did not affect viral infectivity or replication.