ddRAD genotyping and SNP filtering
DNA was extracted using a modified CTAB method (Milligan, 1992). The DNA samples were quantified using a Qubit 2.0 Fluorometer (Invitrogen, MA, USA) and diluted with TE buffer to 12.6 ng/μl. Sequencing libraries were prepared following a modified version of the ddRAD-seq protocol of Peterson et al. (2012); detailed library preparation methods are given in Appendix 1. The libraries were sequenced on a HiSeq 2000 platform (Illumina, CA, USA) with 51 bp single-end reads at BGI Japan (Kobe, Japan).
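The dilution step above follows the usual C1V1 = C2V2 relation. The following is a minimal sketch of that arithmetic, not part of the original protocol; the function name and the example stock concentration and volume are hypothetical, and only the 12.6 ng/μl target comes from the text.

```python
def te_volume_to_add(stock_conc, stock_volume, target_conc=12.6):
    """Return the volume of TE buffer (in ul) to add to a DNA sample.

    stock_conc   : measured concentration in ng/ul (e.g. from a Qubit reading)
    stock_volume : volume of the sample in ul
    target_conc  : desired concentration in ng/ul (12.6 in the text above)
    """
    if stock_conc <= 0 or stock_volume <= 0 or target_conc <= 0:
        raise ValueError("concentrations and volume must be positive")
    if stock_conc < target_conc:
        # Dilution cannot raise the concentration of a sample.
        raise ValueError("sample is already below the target concentration")
    # C1 * V1 = C2 * V2  ->  V2 = C1 * V1 / C2
    final_volume = stock_conc * stock_volume / target_conc
    return final_volume - stock_volume


if __name__ == "__main__":
    # Hypothetical sample: 10 ul at 25.2 ng/ul.
    print(round(te_volume_to_add(25.2, 10.0), 2))
```

A sample measured below the target concentration raises an error here, because adding buffer cannot bring it up to 12.6 ng/μl.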
SNPs were detected using dDocent (Puritz et al., 2014) and Stacks (Catchen et al., 2011, 2013), resulting in three datasets: the denovo, referenced, and demography datasets. The detection conditions and the number of SNPs used in each analysis are summarized in Table S1. In all datasets, five individuals with low individual-level genotyping rates were excluded from SNP detection. For the referenced and denovo datasets, SNPs were detected using dDocent, following its tutorial. For the denovo dataset, the reference genome of C. subpubescens was not used as a reference sequence and the two outgroup individuals were not included. For the referenced dataset, in contrast, both the reference genome and the two outgroup individuals were used. The raw SNPs generated by dDocent were filtered using VCFtools v0.1.14 to meet the conditions outlined in Table S1. For the demography dataset, SNPs were re-extracted from the .bam files that dDocent had created for the referenced dataset. First, gstacks from Stacks version 2.60 was used to generate catalogs of variable sites (Rochette et al., 2019). Subsequently, populations from Stacks was used to extract SNPs with the following options: -r 0.8 -p X --min-mac 1 --max-obs-het 0.5 --vcf (where X represents the number of species/ecotypes in each dataset). The pairwise two-dimensional minor allele site frequency spectrum (2D-mSFS) was calculated from the .vcf file using the R script 2D-msfs-R (https://github.com/garageit46/2D-msfs-R). Missing data were addressed through bootstrapping within the same ecotype.
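To illustrate what a pairwise 2D-mSFS with within-ecotype resampling of missing genotypes can look like, the sketch below builds one from two small genotype matrices. It is an illustration written under stated assumptions, not the 2D-msfs-R script itself: the function names, the input layout (one list of alternate-allele counts per site, with None for missing calls), the choice to fill each missing genotype by drawing from the observed genotypes of the same ecotype at the same site, and the handling of ties are all assumptions made here.

```python
import random


def fill_missing(genotypes, rng):
    """Fill None entries by resampling observed genotypes at the same site
    within the same ecotype (assumed interpretation of the bootstrapping)."""
    observed = [g for g in genotypes if g is not None]
    if not observed:
        return None  # no information at this site for this ecotype
    return [g if g is not None else rng.choice(observed) for g in genotypes]


def minor_sfs_2d(pop1, pop2, seed=0):
    """Return a (2*n1 + 1) x (2*n2 + 1) matrix of site counts.

    pop1, pop2 : lists of sites; each site is a list of per-individual
                 alternate-allele counts (0, 1, 2, or None if missing).
    Entry [i][j] counts sites where the pooled minor allele is seen
    i times in ecotype 1 and j times in ecotype 2.
    """
    rng = random.Random(seed)
    n1 = 2 * len(pop1[0])  # number of diploid gene copies in ecotype 1
    n2 = 2 * len(pop2[0])  # number of diploid gene copies in ecotype 2
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for site1, site2 in zip(pop1, pop2):
        g1 = fill_missing(site1, rng)
        g2 = fill_missing(site2, rng)
        if g1 is None or g2 is None:
            continue  # skip sites with no data in one ecotype
        c1, c2 = sum(g1), sum(g2)
        # Fold on the pooled frequency: if the alternate allele is the major
        # allele overall, count the reference allele instead. Exact ties are
        # left as the alternate allele (a simplification of this sketch).
        if c1 + c2 > (n1 + n2) / 2:
            c1, c2 = n1 - c1, n2 - c2
        sfs[c1][c2] += 1
    return sfs


if __name__ == "__main__":
    # Toy data: 2 individuals per ecotype, 3 sites.
    eco_a = [[0, 1], [2, 2], [None, 2]]
    eco_b = [[0, 0], [2, 1], [2, 2]]
    for row in minor_sfs_2d(eco_a, eco_b):
        print(row)
```

Each site contributes exactly one count to the matrix, so the entries sum to the number of sites that have data in both ecotypes.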