Estimation of barcode collision
Molecular barcode sequences can be altered during barcode synthesis, primer ligation, PCR amplification and sequencing, leading to incorrect sample identification. These errors can be nucleotide substitutions or small insertions and deletions [35]. It is thus critical to take barcode complexity into account when generating a barcoded library, in order to limit the impact of such errors on the accuracy of readouts. In general, barcode complexity determines the quality of sequencing libraries: the higher the barcode complexity of a library, the better the experimental outcomes reflect natural phenomena. It is also recommended to implement error-correcting algorithms and codes in every analytical pipeline that decodes barcode sequences. Hamming codes [36,37] and Levenshtein codes [38] are two popular methods for error correction. Furthermore, the influence of molecular barcodes on the capacity of a virus to complete its life cycle, so-called viral fitness [29–31], should be evaluated as well. Below we detail approaches used to embed molecular barcodes in SARS-CoV-2, HIV-1 and simian immunodeficiency virus (SIV), influenza virus and Zika virus.
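To illustrate the error-correcting step mentioned above, the following minimal Python sketch computes the two distances that underlie Hamming and Levenshtein codes and uses them to assign an observed read to a known barcode. It is not the construction described in [36–38]: it is simple nearest-neighbour decoding against a list of expected barcodes, and the function names (`hamming_distance`, `levenshtein_distance`, `correct_barcode`), the `whitelist` argument and the default `max_distance` of 1 are assumptions made for this example only.

```python
from typing import Optional, Sequence


def hamming_distance(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length barcodes
    (substitutions only)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions
    needed to turn sequence a into sequence b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute x by y, or match
            ))
        previous = current
    return previous[-1]


def correct_barcode(
    read: str,
    whitelist: Sequence[str],  # hypothetical list of expected barcodes
    max_distance: int = 1,     # assumed tolerance, to be tuned per library
) -> Optional[str]:
    """Return the single whitelist barcode closest to the read if it lies
    within max_distance edits; return None if the read is too distant
    or equally close to several barcodes."""
    if not whitelist:
        return None
    distances = [(levenshtein_distance(read, bc), bc) for bc in whitelist]
    best = min(d for d, _ in distances)
    candidates = [bc for d, bc in distances if d == best]
    if best <= max_distance and len(candidates) == 1:
        return candidates[0]
    return None  # ambiguous or too far: discard rather than guess
```

In this sketch a read is rescued only when exactly one expected barcode lies within the tolerance, so a barcode set whose members differ from each other by at least three edits can correct any single error unambiguously. The Hamming distance covers substitutions only, whereas the Levenshtein distance also accounts for the small insertions and deletions described above.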