HIV and SIV
Retroviral zoonoses have received public health attention because the
origins of human immunodeficiency virus type 1 (HIV-1) and type 2
(HIV-2) are linked to cross-species transmission of simian
immunodeficiency virus (SIV) from naturally infected primates, in which
SIV prevalence ranges up to 36%[56,57].
Even though these viruses have been adapting to human populations for
over thirty years, an estimated 37.7 million people were living with HIV
as of 2020[58,59], indicating that research on HIV/AIDS (acquired
immunodeficiency syndrome) remains a top priority worldwide. Jabara and
colleagues employed a molecular barcoding strategy (called Primer ID in
their article) to reveal dynamic changes in HIV-1 genetic
variation[17].
An eight-nucleotide molecular barcode followed by a three-nucleotide
sample barcode was incorporated into a reverse primer that anneals
downstream of the HIV-1 protease gene (Figure 3A). Both barcodes can be
tracked through PCR amplification and deep sequencing. They are used to
correct artificial errors generated by PCR amplification and
high-throughput sequencing, and then to build a consensus sequence for
each individual viral template. After correction with molecular
barcodes, 80% of the unique sequence polymorphisms were removed from
this dataset[17].
Of note, the number of reads sharing an identical molecular barcode did
not follow a Gaussian distribution, meaning that viral genomes were not
amplified equally. The authors attributed this to variation in the
number of cDNA templates amplified in each PCR cycle[17].
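The consensus step described above can be illustrated with a minimal sketch. It assumes reads have already been demultiplexed by sample barcode, trimmed, and aligned to equal length; the function name `consensus_by_barcode`, the `min_reads` cutoff, and the toy reads are hypothetical and are not taken from the Primer ID article.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_reads=3):
    """Collapse reads sharing a molecular barcode into one consensus sequence.

    reads: iterable of (barcode, sequence) pairs; sequences within one
    barcode family are assumed to be aligned and of equal length.
    min_reads: hypothetical cutoff; smaller families are skipped because a
    majority vote is not meaningful for them.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_reads:
            continue
        # Majority vote at each aligned position across the family.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Toy data: 8-nt molecular barcodes paired with short made-up reads.
reads = [
    ("ACGTACGT", "CCTCAGATC"),
    ("ACGTACGT", "CCTCAGATC"),
    ("ACGTACGT", "CCTCAGGTC"),  # one read carries a PCR or sequencing error
    ("TTGACCAA", "CCTCAAATC"),
    ("TTGACCAA", "CCTCAAATC"),
    ("TTGACCAA", "CCTCAAATC"),
    ("GGGGAAAA", "CCTCAGATC"),  # singleton family, dropped by min_reads
]
result = consensus_by_barcode(reads)
```

In this toy example the erroneous G in the first family is outvoted, so each retained barcode yields a single sequence representing one original template. The uneven family sizes also mirror the observation that templates are not amplified equally.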
In 2017 the laboratory of Dr. Filion published a high-throughput method
named Barcoded HIV Ensembles (B-HIVE) to monitor the expression of
thousands of proviruses in a cell
population[25].
The principle of B-HIVE is to tag the HIV genome with a molecular
barcode of 20 random nucleotides prior to generating infectious viral
particles (Figure
3B)[25,60].
The advantages of the molecular barcodes implemented in B-HIVE are
twofold. First, molecular barcodes were used to map HIV integration
sites by generating a high-throughput sequencing library in which every
chimeric DNA fragment includes the local host genomic region where the
proviral DNA is inserted. Second, molecular barcodes were used to
quantify HIV transcription by performing RT-PCR on viral RNA with
primers flanking the barcode region[25,60].
The authors verified the insert-specific provirus expression measured by
B-HIVE with an independent method called T7-PCR: for the selected
barcodes, a positive correlation was detected between RNA barcode counts
measured by B-HIVE and RT-qPCR values measured by
T7-PCR[25,60].
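As an illustration of this kind of validation, the sketch below computes a Pearson correlation between log-transformed barcode counts and RT-qPCR values. All numbers are made up for demonstration, and the log transform and the choice of Pearson's coefficient are assumptions rather than the statistics reported by the authors.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical measurements for five selected barcodes (not real data).
rna_barcode_counts = [12, 40, 95, 150, 310]   # sequencing read counts
rt_qpcr_values = [0.8, 2.1, 4.9, 8.2, 15.5]   # relative expression

r = pearson(
    [math.log10(c) for c in rna_barcode_counts],
    [math.log10(v) for v in rt_qpcr_values],
)
```

A coefficient close to 1 on such paired measurements would indicate that the two methods rank and scale proviral expression similarly.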
It is worth noting that the set of all possible 20-mers provides enough
complexity (4^20, about 1.1 × 10^12, distinct barcode combinations) that
the probability of two viruses having the same barcode is negligible.
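The size of the barcode space and the chance of a shared barcode can be checked with a short calculation. The sketch below uses the standard birthday approximation and assumes barcodes are drawn uniformly at random; the population size of 10,000 viruses is a hypothetical figure chosen only to match the order of magnitude of "thousands of proviruses".

```python
import math


def barcode_space(length, alphabet=4):
    """Number of distinct barcodes of a given length."""
    return alphabet ** length


def collision_probability(n_viruses, n_barcodes):
    """Birthday approximation of P(at least two viruses share a barcode)."""
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for very small x.
    exponent = n_viruses * (n_viruses - 1) / (2 * n_barcodes)
    return -math.expm1(-exponent)


space = barcode_space(20)                      # 4**20 = 1,099,511,627,776
p_any = collision_probability(10_000, space)   # hypothetical population size
```

Under these assumptions, the chance that any pair among 10,000 barcoded viruses shares a barcode is below one in ten thousand.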
The authors also observed that the presence of the barcodes did not
interfere with the infection efficiency of the HIV-based vector; in
side-by-side comparisons of infection rates, barcoded viral particles
were only ~5% less infectious than non-barcoded
particles[25].
Marsden and colleagues successfully introduced molecular barcodes into
the genome of a nearly full-length, replication-competent
HIV[61].
This HIV construct is derived from NL4-3, which has been modified to
express an NL-hemagglutinin (HA) epitope in place of Vpr for the purpose
of enriching HIV-infected
cells[62].
Molecular barcodes of 21 nucleotides were introduced immediately
downstream of the HA tag (Figure 3C); this region does not contribute to
the expression of any gene and lacks cis-acting elements and splice
sites. Importantly, a thymidine was placed at every third position of
the barcode to avoid long runs of consecutive G/C bases, which reduces
the barcode complexity to 4^14 combinations (Figure 3C). The authors
verified the complexity of the barcodes by Illumina HiSeq sequencing,
showing that around 18,000 unique barcode sequences were evenly
represented in the original plasmid preparation. Any two barcodes
differed by 11 nucleotides on average, indicating appreciable barcode
diversity. Moreover, the frequency of barcodes in virions correlated
with their frequency in the original plasmid preparation, meaning that
the barcode complexity is recapitulated in infectious
particles[61].
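The effect of the fixed thymidines on barcode complexity and pairwise distance can be sketched as follows. The code assumes the 21-nt barcode is laid out as seven "NNT" units; the exact register of the fixed thymidines, the helper names, and the simulated library of 200 barcodes are assumptions for illustration, not details taken from the original study.

```python
import itertools
import random


def make_barcode(rng, units=7):
    """Build a 21-nt barcode as seven 'NNT' units (assumed layout)."""
    return "".join(
        rng.choice("ACGT") + rng.choice("ACGT") + "T" for _ in range(units)
    )


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)  # fixed seed so the simulation is reproducible
barcodes = [make_barcode(rng) for _ in range(200)]

# Only 14 of the 21 positions are random, hence 4**14 = 268,435,456.
complexity = 4 ** 14

pairs = list(itertools.combinations(barcodes, 2))
mean_distance = sum(hamming(a, b) for a, b in pairs) / len(pairs)

# Two uniform random bases differ with probability 3/4, so the expected
# distance over 14 random positions is 14 * 3 / 4 = 10.5.
expected_distance = 14 * 3 / 4
```

Under the uniform-sampling assumption, the expected pairwise distance of 10.5 is close to the reported average of 11, which is consistent with a library that samples the 4^14 space without strong bias.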
Viral infectivity and replication were validated in GHOST CXCR4+CCR5+
cells and primary human peripheral blood mononuclear cells, showing that
barcoded viruses can infect and replicate in primary cells without
introducing any bias in barcode
representation[61].
Beyond HIV, Fennessey and colleagues were the first to establish a model
of rhesus macaques infected with barcoded SIV to study the dynamics of
viral reservoir establishment and viral rebound over
time[63].
Molecular barcodes of 10 random nucleotides were inserted between the
stop codon of vpx and the start codon of vpr in the SIVmac239 plasmid
(Figure 3D). Because a single restriction enzyme, MluI, was used to
clone the barcodes, each barcode could insert in either orientation,
doubling the complexity of the barcodes in viral stocks. To accurately
estimate the complexity of the input barcodes (termed viral clonotypes
by the authors), the authors reverse transcribed viral RNA into cDNA,
which was diluted down to 5,000 viral copies in a total of 168 aliquots
to minimize sequencing errors. The authors were able to clearly separate
the probability distribution of barcodes arising from PCR-induced errors
from that of true barcode sequences (9,336
sequences)[63].
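The idea of distinguishing true barcodes from PCR-induced errors can be illustrated with a simple abundance-based heuristic. This is not the authors' probabilistic approach; it is a substituted, simplified rule in which a rare barcode is treated as an error when it lies within one mismatch of a much more abundant barcode. The function name, both thresholds, and the toy counts are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def filter_error_barcodes(counts, max_dist=1, ratio=10):
    """Return barcodes not explained as errors of a more abundant barcode.

    A barcode is discarded when an already retained barcode lies within
    max_dist mismatches and is at least `ratio` times more abundant.
    Both thresholds are hypothetical and would need tuning on real data.
    """
    kept = []
    # Visit barcodes from most to least abundant (ties broken by sequence).
    for bc, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if any(
            hamming(bc, k) <= max_dist and counts[k] >= ratio * n
            for k in kept
        ):
            continue
        kept.append(bc)
    return kept


# Theoretical complexity: 4**10 sequences in two orientations = 2,097,152.
theoretical = 2 * 4 ** 10

counts = Counter({
    "ACGTACGTAC": 500,
    "ACGTACGTAA": 4,    # one mismatch from an abundant barcode: likely error
    "TTGCAAGGCT": 350,
    "GGATCCTTAA": 3,    # rare but distant from every abundant barcode: kept
})
true_barcodes = filter_error_barcodes(counts)
```

The 9,336 barcodes reported for the viral stock are a small fraction of the theoretical space computed here, which illustrates why the complexity of a stock has to be measured rather than inferred from barcode length alone.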
Any two barcodes differ by an average of seven nucleotides.
Furthermore, the authors also showed that the barcodes had no effect on
SIV infectivity or replication.