2.3. SNPs calling and filtering
Raw sequence data from GBS and RADSeq were independently processed as well as merged (COMBINED dataset). Thereafter, the three generated datasets (GBS, RADSeq and COMBINED) were processed through the same pipeline.
All barcoded raw reads were mapped to the P. lilfordi genome (rPodLil1.1, available athttps://denovo.cnag.cat/podarcis(Gomez-Garrido et al., 2023) using the Burrows-Wheeler Aligner (BWA) software (v2.1) to generate individual Binary Alignment Map (BAM) files. The genome corresponds to a single female individual sampled in the Aire population (the population was included in our dataset). Variant calling per sample was performed with Genome Analysis Toolkit HaplotypeCaller from GATK v. 3.7 (Van der Auwera GA & O’Connor BD., 2020) to produce gVCF files (minimum-mapping-quality 20). gVCF files were finally genotyped for variant calling (i.e., SNPs).
Prior to data filtering, SNP and sample statistics were explored using VCFTools v. 0.1.16 (Danecek et al., 2011; De la Cruz and Raska, 2014). SNPs were then filtered using the following settings: –max-alleles 2, –remove-indels, –min-meanDP 10, –max-meanDP 50, –minQ 30, –maf 0.05, –max-missing 0.75 (individual datasets) or 0.90 (combined).
To retrieve common loci between GBS and RADSeq datasets, the max-missing parameter was set to 0.9, thus retaining loci shared by at least 90% of the specimens and effectively removing all loci unique to either GBS or RADSeq.