Genomic distribution of genetic variants
ANNOVAR (Wang, Li, & Hakonarson, 2010) was used to annotate the genomic
distribution of variants and classify them into different categories
(nonsynonymous, synonymous, UTR, 5 kb upstream, 5 kb downstream,
intronic, and intergenic). The filter set for variants was a minor
allele frequency of > 0.01, depth of >3, and
missing rate of < 0.2.
Population diversity and structure
We used VCFtools (Danecek et al., 2011) to estimate nucleotide diversity
(window size of 40 kb) and the F ST divergence
statistic (window size of 40 kb) for each pair of populations. We used
Plink (Purcell et al., 2007)
(http://pngu.mgh.harvard.edu/~purcell/plink/)
to investigate the population structure, with ancestry clusters ranging
from two to six. Principal component analysis (PCA) was performed using
the GCTA software (Yang, Lee, Goddard, & Visscher, 2011). The filtered
SNP dataset was used to generate neighbor-joining trees by using
Treebest-1.9.2. Mantel tests withF ST/(1−F ST) matrixes and
distance matrixes were performed using the ade4 package (Dray & Dufour,
2007) to test for isolation by distance. Geographical distances among
samples were measured by following the coastline (coastal distances) and
by the shortest distance across open waters (oceanic distances).
Demographic analysis
The demographic history of all five populations was analyzed using the
PSMC model as implemented in the PSMC package (Li & Durbin, 2011). In
the absence of mutation rates from S. japonica or any closely
related species, we used a mutation transition matrix based on medaka
data (Spivakov et al., 2014). Point mutation rate and recombination rate
per base were assumed to be 2.5 × 10−8. Generation
time was calculated as g = a + (s / [1 −s ]), where s is the expected adult survival rate, which
is assumed as 80%; and a is sexual maturation age, which is one
year for S. japonica . A generation time of 5 years was used. To
determine the variance in the estimated effective population size, we
performed 100 bootstraps for each population. Population-level admixture
analysis was conducted using the TreeMix v.1.12 program (Pickrell &
Pritchard, 2012). The program inferred the ML tree for five populations.
A window size of 1000 was used to account for linkage disequilibrium
(–k) and “–global” to generate the ML tree. Migration events (–m)
were sequentially added to the tree.
Screening for signatures of selection
To uncover the genetic variants involved in local adaptation of each
population, we calculated the genome-wide distribution ofF ST values and π ratios for four control–thermal
groups. These groups included the cold-temperature population RS versus
the control groups (ZS and ST), the high-temperature population ST
versus the control groups (RS and ZS), the China warm-temperature
population ZS versus the control group (RS), and the Japan
warm-temperature populations (IB and TB) versus the control group (RS).F ST and π for sliding windows were calculated
using VCFtools (Danecek et al., 2011), with a window size of 40 kb and a
step size of 20 kb. The windows with the top 5% values for theF ST and π ratios simultaneously served as the
candidate outliers under strong selective sweeps. All outlier windows
were assigned to their corresponding SNPs and genes. Overlapping genes
within cold-, warm-, and high-temperature groups were selected for
further analysis to avoid false positives. The selected genes in GO
terms and KEGG pathways were enriched using OmicShare tools
(http://www.omicshare.com/tools).
Multiple comparisons were corrected using the FDR method (FDR-adjustedP value <0.05).
Distribution prediction of China and Japan populations under climate
change
We used maximum entropy (Phillips, Anderson, & Schapire, 2006) (Maxent)
model to predict potential distributions of two independent populations
(China and Japan) under changing climates. We retrieved species presence
records from Zhang et al. (2019) and divided them into China (22
recorders) and Japan (23 recorders) groups. On the basis of the results
of Zhang et al. (2019) and the benthic life type of S. japonica ,
we considered five environment predictors from Bio-ORACLE
(http://www.bio-oracle.org),
including ocean depth, distance to shore, mean sea benthic temperature,
salinity, and current velocity. The five predictors were not highly
correlated (i.e., pairwise Pearson’s correlation coefficient
|r | < 0.70) and thus all were used to
develop Maxent models. Future (i.e., RCP 4.5 in 2100s [average for
2090 to 2100]) marine environmental predictors, including temperature,
salinity, and current velocity, were also downloaded from Bio-ORALCE. We
assumed that ocean depth and distance to shore would remain unchanged in
the future. For each group, we developed Maxent model using all presence
data of this group and present-day environmental predictors; this model
was further used to predict potential distributions under present-day
and future climatic conditions.