Population genetic structure and differentiation
DNA was extracted from liver samples of 22 male tree sparrows, 11 from
BY and 11 from LJX, using the Qiagen DNeasy Blood and Tissue Kit.
DNA concentration (minimum of 80 ng/µL) was measured using the Qubit DNA
Assay Kit on a Qubit 2.0 Fluorometer (Life Technologies, CA, USA). After
sample quality control, the qualified DNA samples were randomly
fragmented using Covaris, and the fragments were collected with magnetic
beads. The fragments were end-repaired and an adenine was added to their
3′ ends before adaptor ligation. The ligation products were then
circularized and amplified by linear isothermal rolling-circle
replication using DNA NanoBall technology. The DNA libraries were
sequenced on the BGISEQ sequencing platform.
SOAPnuke v2.1.0 (Chen et al., 2018) was used to remove residual adapter
sequences and trim low-quality reads. The clean reads were mapped to the
reference genome using the Burrows-Wheeler Alignment tool (BWA) v0.7.17
(Li & Durbin, 2009). Mapping rates exceeded 99%, the proportion of
properly paired reads ranged from 97.90% to 98.45% across samples, and
the effective mapping depth ranged from 9.6145× to 12.2982×. After
alignment, we used Picard v2.25.0 to remove duplicate reads arising
during sample preparation (e.g., library construction). We used the
Genome Analysis Toolkit (GATK) v4.2.0.0 (McKenna et al., 2010) for
variant calling and applied separate hard-filtering thresholds to SNPs
(QD < 2.0, QUAL < 30.0, SOR > 3.0, FS > 60.0, MQ < 40.0,
MQRankSum < -12.5, ReadPosRankSum < -8.0) and indels (QD < 2.0,
QUAL < 30.0, FS > 200.0, ReadPosRankSum < -20.0); variants meeting any
of these criteria were filtered out. The resulting variant calls were
annotated with ANNOVAR (Wang et al., 2010) (Table S3).
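The hard-filtering logic can be illustrated with a short Python sketch that checks a variant's INFO annotations against the SNP and indel thresholds listed above and reports every criterion it fails. This is not GATK code: the function name `failed_filters` and the dictionary input are hypothetical, and treating a missing annotation as "not failing" is an assumption of the sketch, not something stated in the text.

```python
import operator
from typing import Dict, List, Tuple

# (annotation, comparison, threshold): a variant matching the comparison FAILS.
SNP_FILTERS: List[Tuple[str, str, float]] = [
    ("QD", "<", 2.0),
    ("QUAL", "<", 30.0),
    ("SOR", ">", 3.0),
    ("FS", ">", 60.0),
    ("MQ", "<", 40.0),
    ("MQRankSum", "<", -12.5),
    ("ReadPosRankSum", "<", -8.0),
]

INDEL_FILTERS: List[Tuple[str, str, float]] = [
    ("QD", "<", 2.0),
    ("QUAL", "<", 30.0),
    ("FS", ">", 200.0),
    ("ReadPosRankSum", "<", -20.0),
]

_OPS = {"<": operator.lt, ">": operator.gt}


def failed_filters(annotations: Dict[str, float], is_snp: bool) -> List[str]:
    """Return the hard filters a variant fails; an empty list means it passes.

    Hypothetical helper. Assumption (not from the text): an annotation that is
    missing from `annotations` does not cause its filter to fail.
    """
    rules = SNP_FILTERS if is_snp else INDEL_FILTERS
    failed = []
    for name, op, threshold in rules:
        value = annotations.get(name)
        if value is not None and _OPS[op](value, threshold):
            failed.append(f"{name}{op}{threshold}")
    return failed


if __name__ == "__main__":
    snp = {"QD": 1.5, "QUAL": 250.0, "SOR": 0.7, "FS": 2.1, "MQ": 59.8}
    indel = {"QD": 12.0, "QUAL": 80.0, "FS": 75.0, "ReadPosRankSum": -1.2}
    print(failed_filters(snp, is_snp=True))     # ['QD<2.0']
    print(failed_filters(indel, is_snp=False))  # []
```

Keeping separate rule lists mirrors the text's use of different thresholds per variant type: an FS of 75.0 passes the indel filter (FS > 200.0) but would fail the SNP filter (FS > 60.0).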
We used PLINK v1.90 (Purcell et al., 2007) to remove SNPs with a minor
allele frequency (MAF) ≤ 0.05 as well as SNPs genotyped in fewer than
80% of samples.
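The two site filters can be sketched in Python as follows, assuming genotypes are coded as the number of alternate alleles (0, 1 or 2) with `None` for a missing call. The function names are hypothetical and this is not PLINK's implementation; it only keeps a SNP when its MAF is above 0.05 and at least 80% of samples are genotyped.

```python
from typing import List, Optional

# Assumed coding: number of alternate alleles (0, 1 or 2); None = missing call.
Genotypes = List[Optional[int]]


def minor_allele_frequency(genotypes: Genotypes) -> float:
    """MAF computed from called genotypes only (hypothetical helper)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def call_rate(genotypes: Genotypes) -> float:
    """Fraction of samples with a non-missing genotype."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def keep_snp(genotypes: Genotypes, maf_threshold: float = 0.05,
             min_call_rate: float = 0.8) -> bool:
    """Drop SNPs with MAF <= 0.05 or genotyped in fewer than 80% of samples."""
    return (minor_allele_frequency(genotypes) > maf_threshold
            and call_rate(genotypes) >= min_call_rate)


if __name__ == "__main__":
    rare = [0] * 9 + [1]                               # MAF = 1/20 = 0.05
    common = [0, 1, 1, 2, 0, 0, 1, 0, None, None]      # 80% called, MAF = 0.3125
    sparse = [0, 1, 1, 2, 0, None, None, None, 0, 1]   # 70% called
    print([keep_snp(g) for g in (rare, common, sparse)])  # [False, True, False]
```

Computing MAF from called genotypes only means that missing data lowers the call rate without biasing the allele-frequency estimate toward either allele.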
We then used PLINK to calculate pairwise linkage disequilibrium (LD)
among the retained SNPs and, for each pair of neighboring markers with
r² > 0.2, removed one SNP; the remaining SNPs were used for subsequent
analyses. We constructed a maximum-likelihood tree using RAxML-NG v1.0.3
(Kozlov et al., 2019) and visualized it with the online tool iTOL v6.4.3
(Letunic & Bork, 2021). Principal component analysis (PCA) was performed
using PLINK, and individual ancestry proportions were estimated using
ADMIXTURE v1.3.0 (Alexander et al., 2009) with the assumed number of
ancestral populations (K) ranging from 1 to 4.
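The LD-pruning and PCA steps can be sketched together in plain Python. The sketch rests on simplifying assumptions that are not from the text: genotypes are complete 0/1/2 dosages, r² is the squared Pearson correlation of dosages, each SNP is compared only with the previously retained neighbor (rather than within a sliding window), and only the first principal component is computed, by power iteration on a standardized sample-by-sample matrix. All function names are hypothetical, and this is not how PLINK implements either step.

```python
import math
import random
from typing import List, Sequence

# Assumption: rows are SNPs in chromosomal order, columns are samples,
# values are complete alternate-allele dosages (0, 1 or 2).


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two SNPs' dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP is treated as uncorrelated
    return sxy * sxy / (sxx * syy)


def prune_neighbors(snps: List[List[int]], max_r2: float = 0.2) -> List[int]:
    """Simplified pruning: keep a SNP only if its r^2 with the previously
    retained SNP is <= max_r2. Returns indices of retained SNPs."""
    kept: List[int] = []
    for i, genotypes in enumerate(snps):
        if not kept or r_squared(snps[kept[-1]], genotypes) <= max_r2:
            kept.append(i)
    return kept


def standardize(snps: List[List[int]]) -> List[List[float]]:
    """Center each SNP by 2p and scale by sqrt(2p(1-p)); drop monomorphic SNPs."""
    out = []
    for g in snps:
        p = sum(g) / (2 * len(g))
        sd = math.sqrt(2 * p * (1 - p))
        if sd > 0:
            out.append([(x - 2 * p) / sd for x in g])
    return out


def first_pc(snps: List[List[int]], iterations: int = 500, seed: int = 0) -> List[float]:
    """Per-sample PC1 loadings (a unit vector) via power iteration."""
    z = standardize(snps)
    if not z:
        raise ValueError("no polymorphic SNPs")
    n, m = len(z[0]), len(z)
    # Sample-by-sample matrix: average over SNPs of z[i] * z[j].
    mat = [[sum(row[i] * row[j] for row in z) / m for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    # Toy data: samples 0-2 and 3-5 differ strongly in allele frequency.
    snps = [
        [0, 0, 0, 2, 2, 2],
        [0, 0, 1, 2, 2, 2],
        [0, 0, 0, 2, 2, 1],
        [1, 0, 0, 2, 1, 2],
    ]
    print(prune_neighbors(snps))  # [0]
    print(first_pc(snps))
```

In the toy data every SNP is in strong LD with the first one, so pruning keeps a single marker; on the unpruned matrix the PC1 loadings of samples 0 to 2 share one sign and those of samples 3 to 5 the opposite sign (which sign is which is arbitrary).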