Phylogenetic analysis
For broad phylogenetic placement, we compared the MIRU-VNTR types of our isolates against those of > 8,000 isolates in global MIRU-VNTR datasets and Beijing-specific databases (Merker et al., 2015; Mestre et al., 2011).
For the phylogenetic analysis, WGS reads were aligned to the M. tuberculosis H37Rv reference genome (NC_000962.3) with Bowtie 2 (Langmead & Salzberg, 2012). Variants were called with SAMtools v0.1.18 and FreeBayes v1.1.0 (Koboldt et al., 2012; Marth, 2012). For FreeBayes, only SNPs with a minimum mapping quality of 20, a minimum coverage of 10, and an alternate allele fraction of at least 0.9 were retained.
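As an illustration only, the three SNP thresholds above can be expressed as a simple post-calling filter. This is a minimal sketch, not the pipeline used here: the `VariantCall` record and its field names are hypothetical, and it assumes the thresholds are applied to already-called variants rather than passed to the variant caller as options.

```python
from dataclasses import dataclass

# Thresholds taken from the text above.
MIN_MAPQ = 20            # minimum mapping quality
MIN_DEPTH = 10           # minimum coverage at the site
MIN_ALT_FRACTION = 0.9   # minimum fraction of reads supporting the alternate allele


@dataclass
class VariantCall:
    """Hypothetical, simplified variant record (not a real VCF parser type)."""
    pos: int        # 1-based position on the reference
    ref: str        # reference allele
    alt: str        # alternate allele
    mapq: float     # mapping quality supporting the call
    depth: int      # total read depth at the site
    alt_reads: int  # reads supporting the alternate allele


def is_snp(call: VariantCall) -> bool:
    """True for single-base substitutions; indels and MNPs are rejected."""
    return len(call.ref) == 1 and len(call.alt) == 1 and call.ref != call.alt


def passes_filters(call: VariantCall) -> bool:
    """Apply the mapping quality, coverage and alternate-fraction thresholds."""
    if not is_snp(call):
        return False
    if call.mapq < MIN_MAPQ or call.depth < MIN_DEPTH:
        return False
    # depth >= MIN_DEPTH > 0 here, so the division is safe.
    return call.alt_reads / call.depth >= MIN_ALT_FRACTION


def filter_snps(calls):
    """Return only the calls that satisfy all three thresholds."""
    return [c for c in calls if passes_filters(c)]
```

For example, a site covered by 20 reads of which 15 carry the alternate allele (fraction 0.75) would be discarded, whereas a site with 49 of 50 supporting reads would be kept.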
SNPs from our WGS data were compared against global NGS collections (> 9,000 downloaded MTB genomes) (Shitikov et al., 2017). For a more specific analysis within the Beijing lineage, we compared our WGS data against 200 strains of the L2.2.5/Asian African 3 subgroup (Supplementary Table). The phylogenetic tree was built with RAxML v8.2.11 (Stamatakis, 2014) under a GTR-CAT model with ascertainment bias correction, using all SNPs that remained after excluding those in repetitive and mobile elements, PE/PPE/PE_PGRS genes, and drug-resistance-associated genes, as well as artifact SNPs linked to indels. A subset of genomes of the L2.2.3/Asia Ancestral sublineage was used as the outgroup. A smaller subset (the clade including the Panama and Vietnamese isolates) was used for aligning and identifying the 858 SNP positions.
Bayesian analysis of molecular sequences was performed with BEAST, running three independent chains of 100 million iterations each (substitution model: GTR; molecular clock: uncorrelated lognormal relaxed clock, UCLD; population model: GMRF Skyride). The three chains converged after about two million iterations, which were treated as burn-in and removed; the remaining samples were combined into a single dataset that was then used to construct the maximum clade credibility tree.
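The burn-in removal and chain combination described above can be sketched as follows. This is a hedged illustration rather than the procedure actually run: each chain is represented as a hypothetical list of `(state, value)` pairs, the sampling interval in the usage example is invented, and samples at or before the burn-in state are assumed to be the ones discarded.

```python
# Burn-in length taken from the text above (about two million iterations).
BURN_IN_STATES = 2_000_000


def drop_burn_in(chain, burn_in=BURN_IN_STATES):
    """Keep only samples logged after the burn-in state.

    `chain` is a list of (state, value) pairs, where `state` is the MCMC
    iteration at which `value` (a tree or parameter sample) was logged.
    """
    return [(state, value) for state, value in chain if state > burn_in]


def combine_chains(chains, burn_in=BURN_IN_STATES):
    """Discard burn-in from every chain and pool the remaining samples."""
    combined = []
    for chain in chains:
        combined.extend(value for _, value in drop_burn_in(chain, burn_in))
    return combined


if __name__ == "__main__":
    # Toy chains sampled every 1,000,000 states (hypothetical interval).
    toy_chains = [
        [(s, f"chain{i}_state{s}") for s in range(0, 5_000_000, 1_000_000)]
        for i in range(3)
    ]
    pooled = combine_chains(toy_chains)
    print(len(pooled), "samples retained after burn-in")
```

The pooled samples would then be summarized into a maximum clade credibility tree with a dedicated tool; that step is not reproduced here.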