Figure Legends
Figure 1. Schematic representation of three principal approaches to preparing barcoded libraries. (A, C and D) Molecular barcodes can be introduced into a template by ligating sequencing adaptors (A), by hybridizing molecular inversion probes (C), or directly by PCR amplification with target-specific primers (D). Of note, both forward and reverse adaptors were synthesized and then subjected to adaptor extension and A-tailing. (B) Schematic comparison of the applications of molecular barcodes and sample barcodes. In pooled sample 1, molecular barcodes are used to correct sequencing errors: a misread nucleotide, for example guanosine (G), is corrected in the final consensus sequence. In pooled sample 2, molecular barcodes are used to identify true mutations (red triangles); a spurious mutation (blue triangle) is removed from the final consensus sequence. Panel (A) is modified from Figure 1 in Schmitt et al. (2012)[19]; panel (C) is modified from Figure 1 in Hiatt et al. (2013)[23].
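The consensus logic of Figure 1B can be sketched in Python. This is a minimal illustration only, assuming reads have already been trimmed, aligned to equal length, and tagged with their molecular barcode; the function name, the majority-vote rule, and the thresholds (minimum family size of 3, 70% base agreement) are our own assumptions rather than the pipelines of the cited studies.

```python
from collections import Counter, defaultdict

def consensus_by_barcode(reads, min_family_size=3, min_fraction=0.7):
    """Collapse reads sharing a molecular barcode into one consensus sequence.

    reads: iterable of (barcode, sequence) pairs. Sequences within a family are
    assumed (hypothetical simplification) to be aligned and of equal length.
    Positions without a sufficiently dominant base are reported as 'N'.
    """
    # Group reads into barcode families (all copies of one template molecule).
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to separate errors from true variants
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # A base seen in only a minority of copies is treated as an error.
            bases.append(base if count / len(column) >= min_fraction else "N")
        consensus[barcode] = "".join(bases)
    return consensus

reads = [
    ("AAGT", "ACGT"), ("AAGT", "ACGT"), ("AAGT", "ACGT"), ("AAGT", "AGGT"),
    ("CCTA", "ACTT"), ("CCTA", "ACTT"), ("CCTA", "ACTT"),
    ("GGGG", "ACGT"),
]
print(consensus_by_barcode(reads))  # {'AAGT': 'ACGT', 'CCTA': 'ACTT'}
```

In this toy input, the single G in family AAGT behaves like the misread nucleotide of pooled sample 1 and is voted out, whereas the T shared by every copy in family CCTA behaves like a true mutation and is retained; the singleton family GGGG is discarded because it cannot be error-corrected.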
Figure 2. Schematic representation of the molecular barcoding strategies used in population-scale testing to screen for SARS-CoV-2-infected individuals. (A) A 10-bp DNA barcode, termed the LAMP barcode, was incorporated into the FIP primer during reverse-transcription loop-mediated isothermal amplification (RT-LAMP). PCR barcodes adjacent to the Illumina P5 and P7 sequences flank both ends of the library. The annotated amplicon sequence is modified from Figure 1b in Ludwig et al. (2021)[27]. (B) Two sets of unique barcodes, termed i5 and i7 sample barcodes, were placed adjacent to the P5 and P7 adaptor sequences in the Illumina sequencing primers during PCR amplification. The illustration is modified from Figure 1b in Bloom et al. (2021)[51].
Figure 3. Schematic representation of the molecular barcoding strategies applied to HIV and SIV. (A) A pool of cDNA synthesis primers, each containing a string of eight degenerate nucleotides (termed the Primer ID) and a three-nucleotide sample barcode, was used for cDNA synthesis and PCR amplification of the HIV-1 protease (pro) gene. (B) A 20-nucleotide molecular barcode was used to tag the region downstream of the HIV 5’ long terminal repeat (LTR) in an HIV-based vector. After infection, the barcoded viral DNA integrates into the host genome. Inverse PCR performed on genomic DNA isolated from infected cells identifies proviral integration sites; RT-PCR performed on mRNA from barcoded proviruses measures viral transcription driven by the HIV 5’ LTR. (C) Genomic organization of the nearly full-length HIV genome and composition of the molecular barcode sequence. A 21-nucleotide barcode sequence was inserted into a non-expressed region upstream of the HA tag, overlapping the HIV vpr gene, in a nearly full-length, replication-competent HIV genome. The original sequence of the barcode region is shown. In the barcode sequence, every third nucleotide was replaced by a thymidine. The pink bar marks the barcode insertion site. The illustration is modified from Figure 1A in Marsden et al. (2020)[61]. (D) Genomic organization of SIV and composition of the molecular barcode sequence. Molecular barcodes consisting of 10 random nucleotides were inserted between the stop codon of the SIV vpx gene and the start codon of the SIV vpr gene in the SIVmac239 plasmid. The illustration is modified from Figure 1A in Fennessey et al. (2017)[63].
Figure 4. Schematic representation of the molecular barcoding strategies applied to influenza A virus. (A) 22-nucleotide molecular barcodes were amplified from an shRNA library[66]. The amplified barcode-containing products were inserted between the influenza A virus sequences encoding NS1 and NEP, which had been separated in the authors' previous work[65]. The illustration is modified from Figure 1A in Varble et al. (2014)[24]. (B) Three types of barcodes, namely cell barcodes, unique molecular identifiers (UMIs), and viral barcodes, were applied to viral mRNA to quantify individual mRNA transcripts in cells infected with influenza A virus. The annotated amplicon is illustrated based on Figure 1C in Russell et al. (2018)[67].
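The counting principle behind Figure 4B can be sketched in Python: reads that share a cell barcode and a UMI are assumed to derive from the same mRNA molecule, so distinct UMIs rather than raw reads are counted. This is a minimal sketch assuming exact barcode matching with no UMI error correction; the tuple layout and the function name are hypothetical and are not taken from Russell et al. (2018)[67].

```python
from collections import defaultdict

def count_transcripts(reads):
    """Count unique mRNA molecules per (cell barcode, viral barcode) pair.

    reads: iterable of (cell_barcode, umi, viral_barcode) tuples (hypothetical
    layout). PCR duplicates share all three fields and are counted only once.
    """
    umis = defaultdict(set)
    for cell, umi, viral in reads:
        umis[(cell, viral)].add(umi)  # duplicates collapse inside the set
    return {key: len(umi_set) for key, umi_set in umis.items()}

reads = [
    ("C1", "U1", "V1"), ("C1", "U1", "V1"),  # PCR duplicates of one molecule
    ("C1", "U2", "V1"),
    ("C2", "U1", "V1"),
]
print(count_transcripts(reads))  # {('C1', 'V1'): 2, ('C2', 'V1'): 1}
```

Grouping by both the cell barcode and the viral barcode keeps transcript counts separate for each cell and each tagged viral variant, while the same UMI sequence in different cells (U1 in C1 and C2 above) is still counted as two distinct molecules.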
Figure 5. Schematic representation of the genomic organization of Zika virus and the composition of the molecular barcode sequence. A molecular barcode consisting of eight degenerate codons (24 nucleotides) was embedded in the gene encoding the NS2A protein. The pink bar marks the barcode insertion site. Degenerate nucleotides are shown in pink.
Figure 6. Schematic representation of the design rationale of the experimental evolution models proposed in this review. The proposed models comprise two parts: (i) a reservoir of natural host cells or animals, used to experimentally generate a swarm of laboratory-derived variant strains, and (ii) a validation stage, in which phenotypes caused by genotypic changes are predicted and their impact on human cells is assessed at single-sequence resolution.