Estimation of barcode collision
Molecular barcode sequences can be altered during barcode synthesis,
primer ligation, PCR amplification and sequencing, leading to
incorrect sample identification. These errors can be nucleotide
substitutions or small insertions and deletions[35].
It is thus critical to take barcode complexity into account when
generating a barcoded library, in order to limit the loss of readout
accuracy when such errors occur. In general, barcode complexity
determines the quality of sequencing libraries: the higher the barcode
complexity of a library, the better the experimental outcomes reflect
natural phenomena. It is also recommended to
implement error-correcting codes and algorithms in every analytical
pipeline that decodes barcode sequences. Hamming codes[36,37] and
Levenshtein codes[38] are two popular methods for error correction.
Furthermore, evaluation of the influence of
molecular barcodes on the capacity of a virus to complete its life cycle,
so-called viral fitness[29–31], should be considered as well. Below we
detail approaches used to embed molecular barcodes in SARS-CoV-2, HIV-1,
simian immunodeficiency virus (SIV), influenza virus and Zika virus.
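
Before turning to those systems, the error-correction step mentioned above can be illustrated with a minimal sketch. It is not the method of any cited study. It assumes a known list of expected barcodes (here called `whitelist`) and assigns an observed read to the closest expected barcode only when the match is unambiguous. The function names, the `max_dist` threshold and the example sequences are hypothetical and chosen for illustration only. Hamming distance counts substitutions between equal-length sequences, whereas Levenshtein distance also counts insertions and deletions.

```python
from typing import Callable, Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; defined only for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def correct_barcode(read: str,
                    whitelist: Sequence[str],
                    max_dist: int = 1,
                    metric: Callable[[str, str], int] = levenshtein
                    ) -> Optional[str]:
    """Return the unique closest expected barcode, or None if the read is
    too far from every barcode or equally close to two of them."""
    scored = sorted((metric(read, bc), bc) for bc in whitelist)
    if not scored:
        return None
    best_dist, best_bc = scored[0]
    if best_dist > max_dist:
        return None  # too many errors to assign safely
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # ambiguous: two barcodes are equally close
    return best_bc


# Hypothetical expected barcodes, for illustration only.
whitelist = ["AAAAAA", "CCCCCC", "GGGGGG"]
print(correct_barcode("AAAATA", whitelist))  # one substitution
print(correct_barcode("AAAAA", whitelist))   # one deletion
print(correct_barcode("ACGTAC", whitelist))  # too distant, left unassigned
```

In this sketch, reads that cannot be assigned unambiguously are discarded rather than guessed. This trades some yield for fewer misidentified samples, and the threshold `max_dist` should not exceed what the minimum pairwise distance of the barcode set can support.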