FST-outlier detection, transcriptome analyses, and evaluation of site frequency spectra
Genetic distances between groups of specimens were evaluated by means of pairwise FST (Weir & Cockerham, 1984), calculated in VCFtools v.0.1.16 (Danecek et al., 2011) from all SNPs. Negative FST values were set to zero, and the mean FST was calculated for each contig/scaffold of the reference genome, together with the mean and median FST across all SNPs. The distribution of contig/scaffold FST values was visualized in R, and the contigs/scaffolds with the top 1% of mean FST values were defined as outliers.
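The per-contig summary and outlier definition above can be sketched in Python. This is a minimal illustration, not the authors' script (the summaries were produced in R). It assumes that the VCFtools per-site table has the tab-separated columns CHROM, POS and WEIR_AND_COCKERHAM_FST, skips sites reported as nan, and reads "top 1%" as the ceil(1%) of contigs with the highest mean FST. The function names `parse_weir_fst` and `summarize_fst` are hypothetical.

```python
import csv
import math
from collections import defaultdict
from statistics import mean, median


def parse_weir_fst(lines):
    """Yield (contig, fst) pairs from a per-site Weir & Cockerham FST table.

    Assumed layout: tab-separated header CHROM, POS, WEIR_AND_COCKERHAM_FST.
    Sites whose FST could not be estimated (nan or -nan) are skipped.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        value = float(row["WEIR_AND_COCKERHAM_FST"])
        if not math.isnan(value):
            yield row["CHROM"], value


def summarize_fst(sites, top_fraction=0.01):
    """Set negative FST to zero, average per contig and flag the top contigs."""
    clamped = [(contig, max(fst, 0.0)) for contig, fst in sites]

    per_contig = defaultdict(list)
    for contig, fst in clamped:
        per_contig[contig].append(fst)
    contig_means = {contig: mean(vals) for contig, vals in per_contig.items()}

    # Hypothetical outlier rule: the ceil(top_fraction * n) contigs with the
    # highest mean FST, keeping at least one contig.
    n_outliers = max(1, math.ceil(top_fraction * len(contig_means)))
    ranked = sorted(contig_means, key=contig_means.get, reverse=True)

    site_values = [fst for _, fst in clamped]
    return {
        "mean_site_fst": mean(site_values),
        "median_site_fst": median(site_values),
        "contig_means": contig_means,
        "outliers": ranked[:n_outliers],
    }
```

Used on a file, this would look like `with open("groups.weir.fst", newline="") as fh: result = summarize_fst(parse_weir_fst(fh))`, where the file name is a placeholder.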
Transcriptomes of M. × piperita (Figueroa-Pérez, Reynoso-Camacho, Garcia-Ortega, & Guevara-González, 2018) and M. spicata (Jin et al., 2014) were downloaded and re-assembled. In short, raw reads were cleaned and trimmed in TrimGalore v.0.6.7 (https://zenodo.org/badge/latestdoi/62039322), removing bases with Q < 20 as well as reads shorter than 50 base pairs or containing any ambiguous base (N). The cleaned reads were then assembled into transcripts with Trinity v.2.11.00 (Grabherr et al., 2011) using default parameters. The sequences of the FST-outlier contigs/scaffolds were extracted and used as databases for BLAST searches of all re-assembled mint transcripts with default settings in BLAST v.2.9.0 (Altschul, Gish, Miller, Myers, & Lipman, 1990; Ye, McGinnis, & Madden, 2006). Transcripts whose top BLAST hit was longer than 300 bp and had an e-value below 1e-5 were extracted and annotated with their top tblastx hit (>100 amino acids and e-value < 1e-5) against the UniProt database (The UniProt Consortium, 2021).
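The hit-filtering criteria can be illustrated with a small Python function over BLAST+ tabular output. This sketch rests on assumptions not stated in the text: that results were written in the default 12-column tabular format (-outfmt 6), that the "top hit" is the hit with the highest bit score for each query, and that the length threshold applies to the alignment length column. `top_hits` is a hypothetical helper. Calling it with `min_length=100` would mirror the UniProt annotation criterion, assuming the alignment length of a translated search is reported in amino acids.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def top_hits(lines, min_length, max_evalue=1e-5):
    """Return {query: subject} for queries whose best hit passes both filters.

    The best hit per query is chosen first (highest bit score); the length
    and e-value thresholds are then applied to that hit only.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6_FIELDS, line.rstrip("\n").split("\t")))
        score = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or score > float(current["bitscore"]):
            best[hit["qseqid"]] = hit

    kept = {}
    for query, hit in best.items():
        if int(hit["length"]) > min_length and float(hit["evalue"]) < max_evalue:
            kept[query] = hit["sseqid"]
    return kept
```

Filtering after choosing the top hit, rather than before, follows the wording "transcripts with top blast-hits longer than 300 bp"; a query whose best hit fails is dropped even if a weaker hit would pass.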
Finally, folded site frequency spectra (SFS) were calculated in ANGSD for each morphologically defined group of specimens (see Results), using all SNPs included in the genomic cluster and admixture analyses.
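A folded SFS counts sites by their minor-allele count: with n sampled chromosomes, a site carrying k copies of one allele falls into bin min(k, n - k), so the bins run from 0 to floor(n/2). ANGSD estimates the spectrum from genotype likelihoods; the Python sketch below instead uses hard-called genotypes with no missing data, a deliberate simplification for illustration only. The function name `folded_sfs` and the dosage encoding (copies of the alternative allele per individual, from 0 to the ploidy) are assumptions.

```python
def folded_sfs(genotypes, ploidy=2):
    """Folded SFS from called genotypes.

    genotypes: one list per site with each individual's alternative-allele
    dosage (0..ploidy). All sites must include the same individuals.
    Returns counts of sites indexed by minor-allele count 0..n//2.
    """
    if not genotypes:
        return []
    n_chrom = ploidy * len(genotypes[0])
    sfs = [0] * (n_chrom // 2 + 1)
    for site in genotypes:
        if ploidy * len(site) != n_chrom:
            raise ValueError("all sites must have the same number of individuals")
        if any(d < 0 or d > ploidy for d in site):
            raise ValueError("dosage outside 0..ploidy")
        k = sum(site)  # copies of the alternative allele at this site
        sfs[min(k, n_chrom - k)] += 1  # fold: bin by minor-allele count
    return sfs
```

Folding avoids the need to know which allele is ancestral, which is why only the minor-allele count enters the bin index.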