Population genetic structure and differentiation
DNA was extracted from liver samples of 22 male tree sparrows, half from BY and half from LJX, using the Qiagen DNeasy Blood and Tissue Kit. DNA concentration (minimum of 80 ng/µL) was measured using the Qubit DNA Assay Kit on a Qubit 2.0 Fluorometer (Life Technologies, CA, USA). After sample quality control, the qualified DNA samples were randomly fragmented using a Covaris instrument and the fragments were collected with magnetic beads. Adenine was added to the 3′ ends of the end-repaired DNA fragments before adaptor ligation. The ligation products were then circularized and amplified by linear isothermal rolling-circle replication using DNA NanoBall technology. The resulting DNA libraries were sequenced on the BGISEQ platform.
SOAPnuke v2.1.0 (Chen et al., 2018) was used to remove any remaining adapter sequences and trim low-quality reads. The clean reads were mapped to the reference genome using the Burrows-Wheeler Alignment tool (BWA) v0.7.17 (Li & Durbin, 2009), with high mapping rates (>99%). The proportion of properly paired reads ranged from 97.90% to 98.45% across samples, and the effective mapping depth ranged from 9.6145× to 12.2982×. After alignment, we used Picard v2.25.0 to remove duplicate reads arising during sample preparation (e.g., library construction). We used the Genome Analysis Toolkit (GATK) v4.2.0.0 (McKenna et al., 2010) for variant calling, followed by hard filtering with separate exclusion thresholds for SNPs (QD < 2.0, QUAL < 30.0, SOR > 3.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, ReadPosRankSum < -8.0) and indels (QD < 2.0, QUAL < 30.0, FS > 200.0, ReadPosRankSum < -20.0). The resulting variant calls were annotated with ANNOVAR (Wang et al., 2010) (Table S3).
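The hard-filtering thresholds above can be read as a set of exclusion rules, where a variant is flagged if any one of them holds. The following is a minimal Python sketch of that logic, not the GATK implementation: the function name `failed_filters`, the dictionary layout, and the example annotation values are hypothetical, and skipping missing annotations is an assumption of this sketch.

```python
import operator

# Exclusion thresholds copied from the text: a variant is flagged if ANY holds.
SNP_FILTERS = {
    "QD": (operator.lt, 2.0),
    "QUAL": (operator.lt, 30.0),
    "SOR": (operator.gt, 3.0),
    "FS": (operator.gt, 60.0),
    "MQ": (operator.lt, 40.0),
    "MQRankSum": (operator.lt, -12.5),
    "ReadPosRankSum": (operator.lt, -8.0),
}
INDEL_FILTERS = {
    "QD": (operator.lt, 2.0),
    "QUAL": (operator.lt, 30.0),
    "FS": (operator.gt, 200.0),
    "ReadPosRankSum": (operator.lt, -20.0),
}


def failed_filters(annotations, filters):
    """Return the names of the thresholds that this variant violates.

    Annotations absent from the record are skipped rather than treated
    as failures (an assumption of this sketch).
    """
    failed = []
    for name, (compare, cutoff) in filters.items():
        value = annotations.get(name)
        if value is not None and compare(value, cutoff):
            failed.append(name)
    return failed


# Hypothetical SNP record: low QD and high strand bias (FS).
snp = {"QD": 1.5, "QUAL": 250.0, "SOR": 1.1, "FS": 75.0, "MQ": 60.0}
print(failed_filters(snp, SNP_FILTERS))  # ['QD', 'FS']
```

A variant with an empty result list would be retained. Keeping SNP and indel thresholds in separate tables mirrors the separate filtering of the two variant classes described above.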
We used PLINK v1.90 (Purcell et al., 2007) to remove SNPs with a minor allele frequency (MAF) ≤ 0.05 or genotyped in fewer than 80% of samples. We then calculated pairwise linkage disequilibrium (LD) between the remaining SNPs and, using PLINK, removed one SNP from each pair of neighboring markers with a pairwise r² greater than 0.2; the retained SNPs were used for subsequent analyses. We constructed a maximum-likelihood tree using RAxML-NG v1.0.3 (Kozlov et al., 2019) and visualized it with the online tool iTOL v6.4.3 (Letunic & Bork, 2021). Principal component analysis was performed using PLINK, and individual ancestries were estimated using ADMIXTURE v1.3.0 (Alexander et al., 2009) with the assumed number of populations (K) ranging from 1 to 4.
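The site-level filters and the LD criterion described above can be illustrated with a small Python sketch. It is not PLINK's code: genotypes are assumed to be coded as 0/1/2 alternate-allele dosages with `None` for missing calls, r² is taken here as the squared Pearson correlation of dosages (one common estimator for unphased data), and all function names and example genotypes are hypothetical.

```python
from statistics import mean


def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype (None = missing)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 alternate-allele dosages, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def passes_site_filters(genotypes, min_maf=0.05, min_call_rate=0.8):
    """Keep a SNP only if MAF > 0.05 and it is genotyped in >= 80% of samples."""
    if call_rate(genotypes) < min_call_rate:
        return False
    return minor_allele_frequency(genotypes) > min_maf


def r_squared(snp_a, snp_b):
    """Squared Pearson correlation of dosages over samples called at both SNPs."""
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    if not pairs:
        return 0.0
    mean_a = mean(a for a, _ in pairs)
    mean_b = mean(b for _, b in pairs)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic site carries no LD information
    return cov ** 2 / (var_a * var_b)


# Hypothetical genotypes for two neighboring SNPs in ten samples.
snp_1 = [0, 1, 2, 1, 0, None, 2, 1, 0, 1]
snp_2 = [0, 1, 2, 1, 0, 1, 2, 1, 0, 1]
keep_both = r_squared(snp_1, snp_2) <= 0.2  # prune one SNP if r² > 0.2
print(passes_site_filters(snp_1), keep_both)
```

In this toy example the two SNPs are perfectly correlated over the jointly called samples, so one of them would be pruned under the r² > 0.2 criterion.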