Conflict of interest disclosure
Nothing to declare.
Abstract (243/250 words)
The identification of genetically distinct populations is central to the
management and conservation of wild populations. Whole-genome-sequencing
allows for high-resolution assessment of genetic structure, demographic
connectivity and the impacts of selection acting on different parts of
the genome. Here, we utilise population genomics to investigate the
genetic structure of the Australasian snapper or Tāmure
(Chrysophrys auratus ), an ecologically, economically, and
culturally important (taonga) marine fish. We analysed over four million
high-quality SNPs obtained by whole-genome sequencing from 382
individuals collected across its New Zealand range. We identified two
genetic clusters (an eastern and western cluster) with genetic
disjunctions around on either side of the North Island of New Zealand.
These genetic clusters do not match the current fisheries management
areas. Pairwise-FST and ADMIXTURE analyses showed
the presence of directional gene flow occurring at both genetic
disjunctions from the East to the West cluster. We hypothesize that
major ocean currents are limiting the dispersal of snapper at these
genetic disjunctions. The heterogeneous coastal environment is also
likely driving evolutionary change. A genome scan identified four
significantly divergent genomic regions between genetic clusters. A
diverse pattern of genetic variation in these regions implies that
different evolutionary processes drive local adaptation in these
clusters. Identification of candidate genes in these regions also
provides a tentative connection to which traits may be under
selection. Our results provide
novel insights into New Zealand’s coastal environment influences
evolutionary processes, and valuable information for effective
management of the snapper fisheries.
Keywords: Population Genomics, Whole-Genome-Sequencing, Selection, Gene
Flow, Fisheries Science
Wordcount main text: 6059 (8000 max)
Introduction
The assessment of biodiversity, including the genetic structure observed
within species, is one of the primary goals of population management
(Moritz, 1994). Specifically, the identification of reproductively
discrete units allows management to assess risks and threats, and
implement measures to protect the genetic diversity present within the
species (Palsbøll et al., 2007). Genetic analysis has also proven
invaluable for the identification of biologically significant units
(Allendorf, 2017; Allendorf et al., 2010). However, genetic structure is
often difficult to identify in marine fish due to low levels of genetic
differentiation, a feature that is facilitated by high dispersal
potential, large population sizes, and limited barriers to gene flow
(Cano et al., 2008; Ward, 2000). Such low levels of genetic
differentiation imply that population genetic analyses (< 100
loci) have often lacked the statistical power to detect fine-scale
structure that may exist across a species range (Morin et al., 2009).
Today, the ability to identify thousands of single nucleotide
polymorphisms (SNPs) provides improved statical power compared to
traditional genetic markers, such as microsatellite and mitochondrial
DNA (Luikart et al., 2003; Luikart et al., 2018). Besides increased
statistical power to identify fine-scale genetic structure, genotyping
thousands of SNPs across the genome also allows for the identification
of outlier loci (Allendorf et al., 2010). Such outliers are a strong
indication of selection acting on specific regions of the genome and
provide putative evidence for the adaption of populations to their local
environment (Hohenlohe et al., 2020). The identification and
conservation of adaptive variation is key for the long-term viability of
natural populations, as it provides resilience to environmental
perturbations such as climate change (Sgrò et al., 2011).
Various studies have already shown how population genomics can
significantly improve the ability to detect genetic structure in marine
species. Benestan et al. (2015) used 10,156 SNPs to identify previously
undetected genetic structure in American lobster (Homarus
americanus ). Assignment tests also showed that individuals could be
assigned with high confidence to their respective genetic cluster,
providing a powerful tool monitoring of the species. In Chinook salmon
(Oncorhynchus tshawytscha ), 10,944 SNPs were used to improve the
understanding of their genetic structure in Alaskan waters (Larson et
al., 2014). The identification of 733 outlier loci showed three genomics
under putative selection, which could be used to develop SNP arrays for
low coast genetic monitoring. Additional examples of population genomic
analyses be able to identity previously unknown or unresolved fine-scale
genetic structure in marine species include commercial squid
(Doryteuthis opalescens ), European scallops (Pecten
maximus ), European anchovies (Engraulis encrasicolus ), (Cheng et
al., 2021; Le Moan et al., 2016; Vendrami et al., 2019). These studies
show how genomics can provide a important role in the management of
marine species, and why we should integrate these tools into the
standard approaches to data collection for fisheries management purposes
(Bernatchez et al., 2017).
Australasian snapper or tāmure (Chrysophrys auratus ) is an
important commercial and recreational inshore fisheries in New Zealand
(Parsons et al., 2014), with an total allowable annual catch of 6,400
tonnes (Fisheries New Zealand, 2018). There are thought to be eight
distinct stocks distributed over four quota management areas (SNA; see
left panel Figure 1) (Parsons et al., 2014; Walsh et al., 2011; Walsh et
al., 2012; M. R. Walsh et al., 2006). Three are recognized in SNA1 based
on differences in year-class strength and growth rates (Northland,
Hauraki Gulf and Bay of Plenty) (Walsh et al., 2011). Two stocks in SNA2
on either side of the Mahia Peninsula (Figure 1), showing differences in
year class strength, age structure and growth rates (Walsh et al.,
2012). Snapper north of the Mahia Peninsula showed similar year-class
strength to the Bay of Plenty stock in SNA1 (Walsh et al., 2012),
suggesting demographic connectivity. Additionally, genetic differences
between snapper on either side of the Mahia Peninsula have also been
detected using microsatellite and allozyme data (Bernal-Ramírez et al.,
2003; Smith et al., 1978). Snapper in SNA7 likely consist of two stocks
(Marlborough Sounds and Tasman Bay) (Fisheries New Zealand, 2018).
Finally, SNA8 is thought to consist of a single stock. Snapper form this
area are also genetically distinct from SNA1 (Bernal-Ramírez et al.,
2003; Smith et al., 1978), suggesting the presence of a genetic
disjunction around Cape Reinga. The identification of these genetic
disjunctions at Cape Reinga and Mahia Peninsula shows the presence of an
East and West cluster (Figure 1). This genetic structure was detected
using allozyme and microsatellite data, while mitochondrial Dloop
sequences showed no apparent genetic structure (Bernal-Ramírez et al.,
2003; Smith et al., 1978). This lack of contemporary MT structure is
consistent with recent whole MT genome analyses, which showed that
patterns of MT variation reflect long-term and broad scale evolutionary
processes instead (Oosting et al., 2023). Allozyme and microsatellite
data also showed that Hawke’s Bay was genetically closets to the West
Coast (Bernal-Ramírez et al., 2003; Smith et al., 1978), despite these
sites being located on either side of the North Island (Figure 1). One
of the allozyme loci was genetically linked to the esterase locus (Smith
et al., 1978), of which allele frequencies correlated with fluctuations
in ocean temperature over time (Smith, 1979), suggesting natural
selection could be influencing the genetic structure. However, similar
genetic structure was observed in the selectively neutral microsatellite
loci suggesting historic gene flow or other demographic factors like
ocean current are restricting gene flow between areas (Bernal-Ramírez et
al., 2003). A discrepancy in genetic structure was observed in the
Tasman Bay population. Allozyme data suggested that the Tasman Bay
population is closely related to the West Coast (Smith et al., 1978),
while microsatellite data showed Tasman Bay as a distinct group
(Bernal-Ramírez et al., 2003). Similar environmental conditions were
used to explain the genetic similarity between Tasman Bay and West Coast
(Smith et al., 1978). Bernal-Ramírez et al. (2003) hypothesized that
microsatellite loci have higher sensitivity to recent changes in
demographic connectivity, such as a population size reduction due to
overfishing. This hypothesis is consistent with the loss of genetic
variation in the Tasman Bay population which is thought to be caused by
intense fishing effort (Hauser et al., 2002). An important next step is
to assess whether higher resolution genomic methods can resolve the
genetic structure, potentially to the level that of non-genetic methods.
If relevant SNPs are identified, the development of SNP arrays could
provide a more cost-effective methods for monitoring compared to current
methods (Bernatchez et al., 2017; Montanari et al., 2023).
Here, we analyse 382 whole-genome sequences from snapper samples
collected from 12 locations, spanning most of the geographic range of
this species around New Zealand, to test the genetic stock structure
that have been proposed by lower resolution genetic markers. This
approach could be used as a framework for how genomic markers can be
used for stock structure analyses and to better match the management
stocks with natural reproductive units. First, we assessed levels of
genetic diversity among 12 sample locations and investigate the presence
of adaptive (outlier) SNPs. Second, we test for population genetic
differentiation using neutral SNPs and outlier SNPs. Our results are
compared and discussed in the context the previous genetic studies of
snapper, and we argue how this approach is of value for fisheries and
conservation management.
Materials and Methods
Sampling & DNA
extraction
Fin clips were collected for 382 individuals between 2017 and 2022 from
12 sampling sites (Figure 1 & Table S1). Samples were obtained through
commercial and recreational fishing sources, and no fish were killed
specifically for this study. Samples were preserved in DESS (20% DMSO,
0.5 M EDTA pH 8, saturated NaCl) and stored at -20 °C (Oosting et al.,
2020). DNA was isolated using a high-salt extraction protocol adapted
from Aljanabi and Martinez (1997) (see Appendix 1 for protocol).
Purified DNA was stored in 10 mM Tris-HCl pH 8 and 0.1 mM EDTA, and
stored at -20 °C. The DNA quantity was determined using Qubit broad
range kit (>20 ng/µ) (Invitrogen©),
following manufactures instructions. Sample quality was assessed using
Implen© NP80 spectrophotometer (260/280 <
2.0 and/or 260/230 > 2.0) and gel-electrophoresis
(selecting for samples with high-molecular-weight DNA,
>20kbp fragments).
Whole-genome-sequencing
Library preparation and whole-genome-sequencing for the 370 samples was
performed by the Australian Genome Research Facility (AGRF). Library
preparation was done using the Nextera flex protocol with unique dual
indexing. Sequencing was performed using the Illumina NovaSeq 6000 S4,
generating 150 bp paired-end reads with an average sequencing depth of
~11 x per individual. Reads from an additional 12
individuals were included from a previous study (unpublished). Samples
were sequenced using the Illumina HiSeq 4000, generating 100bp
paired-end reads with an average sequencing depth of ~30
x per individual. Reads for these 12 individuals were generated over two
separate lanes.
Read alignment &
genotyping
Read quality was assessed using FastQC v0.11.7 (Andrews, 2010), and
compiled with MultiQC v1.7 (Ewels et al., 2016). Reads were aligned to
both the nuclear and mitochondrial snapper reference genome using
PALEOMIX v1.2.13.3 using the bam pipeline (Ashton, 2013; Catanach et
al., 2019; Schubert et al., 2014). First, Adapter removal v2.2.3 was
used for trimming of Illumina forward and reverse adapters, allowing a
33% mismatch rate (–mm 3) and a minimum read length of 25 bp. N’s
and low-quality bases were trimmed of the 5’ and 3’ ends.
Burrows-Wheeler Aligner (BWA) v0.7.15 (bwa-mem algorithm) was used to
align reads (Li & Durbin, 2009), using a minimum mapping quality of 25.
MarkDuplicates, Picard tools v2.18.20 was used to remove PCR duplicates
(http://broadinstitute.github.io/picard/), and local realignment was
done using GATK v4.0.8.1 (Van der Auwera & O’Connor, 2020).
Soft-clipped reads were removed using SAMtools v1.9 (Danecek et al.,
2021) and bam files were normalized for depth of coverage, by randomly
subsampling individuals with significantly more data (n = 48) down to
the mean depth using SAMtools view (Danecek et al., 2021). Genotypes
were called using BCFtools (Li, 2011). First, BCFtools functions mpileup
and call were used to genotype and write multiallelic sites to VCF per
linkage group. Missing annotations were added using BCFtools +fill-tags,
and VCF for each linkage group were merged using BCFtools concat.
SNP
filtering
First, we extracted all biallelic SNPs (–remove-indels
–max-alleles 2 –min-alleles 2 ), and setting all genotypes with an
allelic depth below 3 (–minDP 3) to missing using VCFtools v0.1.16
(Danecek et al., 2011). Subsequently, low quality SNPs were filtered
using VCFtools (–minQ 600 –max-missing 0.95 –maf 0.01
–min-meanDP 8 –max-meanDP 25). A custom script adapted from Pinsky
et al. (2021) was used to remove heterozygote SNPs subjected to allelic
imbalance. In short, the script performs a two-sided binomial test to
assess whether the reference and alternative alleles occur in equal
proportions at heterozygous sites. SNPs significantly deviating from
equal proportions were excluded from the data set. The retained
high-quality SNPs were used to identify outlier and neutral loci.
First, outliers were identified using OutFlank (Whitlock & Lotterhos,
2015) and pcadapt (Luu et al., 2017). OutFlank was run using a minimum
heterozygosity of 0.1 (Hmin 0.1), and LeftTrimFraction and
RightTrimFraction fraction of 0.05 according to the author’s
recommendations (Whitlock & Lotterhos, 2015). Loci with a false
discovery threshold below 1% (q < 0.01) were considered
outliers. For pcadapt, we first ran the program with 10 principal
components (K = 4) and used a screeplot to determine the number of
informative principal components using Cattell’s rule (K = 1) (Figure
S1). Outliers were identified for K=1 informative principal components,
using a minimum allele frequency of 0.05 (min.maf 0.05). Again, loci
with a false discovery threshold below 1% (q < 0.01) we
considered outliers. Overlapping outliers between OutFlank and pcadapt
were retained, and subsequently thinned using non-overlapping
sliding-window of 50,000 bp to obtain independently segregating sites.
For each sliding-window, we retained the locus with highestFST estimated by OutFlank.
A neutral SNP dataset was obtained by first removing all outlier loci
(OutFlank and pcadapt). Retained SNPs were then filtered for deviations
from Hardy-Weinberg equilibrium (–hwe 0.05), and thinned (–thin
5000) using VCFtools v0.1.16 (Danecek et al., 2011). Finally, linked
loci were removed using Plink (–indep-pairwise 50 5 0.2) (Purcell et
al., 2007).
Population structure
First, PCAs were performed on the high-quality, neutral and outlier loci
using SNPRelate (Zheng et al., 2012), and visualised using a custom
script from Clucas et al. (2019). Second, ADMIXTURE was run for K
populations between 1 and 4 (Alexander et al., 2009) using 10,000
iterations. The number of K populations that best explain the structure
in the data set was evaluated using cross validation selecting K with
the lowest error. To investigate the levels of gene flow between the
genetic clusters we used the output from ADMIXTURE, and method described
by Robinet et al. (2020),
https://github.com/tonyrobinet/introgression. The analysis
estimated the level of ancestry for each individual to one of the K
ancestral populations.
Global and pairwise levels of population differentiation were estimated
by calculating FST from all SNPs using SNPRelate
(Zheng et al., 2012). FST values for each SNP
were estimated using the snpgdsFST function, estimating weightedFST values using the Weir & Cockerham, 1984
method (Weir & Cockerham, 1984), results were visualised using the
R-package pheatmap (Kolde & Kolde, 2015). A mantel test was used to
test for isolation by distance (IBD) between sample locations using the
R-package ade4 (Thioulouse et al., 2018). A distance matrix was obtained
by calculating the least-cost distance using the R-package gdistance
(van Etten, 2017). Locations were determined by taking the mean
longitudes and latitudes of all samples for each sample location. In
locations where GPS data was not available, an arbitrary point
representing the area was chosen (i.e. Gisborne and Hawkes Bay).
Genome scanTo test for evidence of selection we performed a genome scan using a
5,000bp non-overlapping window. Estimates for nucleotide diversity (π),
relative population divergence (FST ), and
absolute population divergence (dxy ) were obtained using pixy
v1.2.7.beta1 (Korunes & Samuk). AllSites VCF files were generated for
each linkage group using BCFtools mpileup and call (-m) (Li & Durbin,
2009). Genotypes with a sequencing depth below three (DP<3),
and genotype quality below 20 (GQ<20) were set to missing
using BCFtools filter. Subsequently, BCFtools view was used to remove
indels and multiallelic sites (STRLEN(REF)!=1 | STRLEN(ALT)
>=2), and biallelic sites were filtered for quality below
600 and mean coverage between 8 and 25X (QUAL<600 |
AVG(FMT/DP)<8 | AVG(FMT/DP)>25).
Resulting AllSites VCF files were tab indexed using HTSlib v1.9 (tabix)
and run in pixy (Bonfield et al., 2021). Estimates for Tajima’s D
(DT & ΔDT) were estimated using
VCFtools v0.1.16 (–TajimaD) (Danecek et al., 2011), using all
high-quality biallelic SNPs as described in the SNP filtering paragraph.
Manhattan plots were generated in R (R Core Team, 2013). Regions showing
significant levels of relative population divergence
(FST > 0.15 ) were further
investigated. Genes located in regions of interest were extracted from
the genome annotation produced by Catanach et al. (2019).
Genetic diversityLevels of heterozygosity was estimated for each individual, sample
location, and genetic cluster using all three SNP datasets (i.e.
high-quality, neutral and outlier SNPs). An ANOVA to test for
significant differences in heterozygosity between sample locations and
genetic clusters using R (R Core Team, 2013). An hierarchical analysis
of molecular variance (AMOVA) was performed using the R-package poppr
(Kamvar et al., 2014). The hierarchical model was applied to compare the
variation between identified genetic clusters, fisheries quota
management areas and sample locations. Significance levels were tested
using randtest from R-package adegenet, using 999 permutations.
Results
Sequencing and SNP
filtering
In total, 14.6 billion reads were generated over 382 individuals, with
an average of 38.2 million reads per individual. Sequencing depth ranged
between 7.17-60.1x, with an average of 10.8x per individual. After read
alignment, bam files of 48 individuals with significantly more reads
compared to the average were randomly subsampled to remove bias between
diversity estimates (Figure S2). BCFtools genotyped 11.4 million SNPs,
and 4,185,623 SNPs were retained after removing indels and low-quality
SNPs. High-quality SNPs showed a missing rate of 0.8% and were used for
outlier selection. OutFlank identified 3,843 outlier loci using a false
discovery rate of 1% (q = 0.01). Pcadapt identified 6,008
outlier loci using a false discovery rate of 1% (q = 0.01).
Combined, outlier analyses identified 6,448 unique outliers, of which
3,403 were overlapping. The final outlier dataset consisted of 278
independently segregating loci after thinning. The neutral dataset
consisted of 69,700 independently segregating loci after removing all
6,448 outlier loci, non-independently segregating sites (thinning &
loci in Linkage Disequilibrium), and sites deviating from
Hardy-Weinberg. Missing rate among neutral SNPs was 0.9%.
Population
structure
PCAs from all data three sets were consistent and showed the presence of
two genetic clusters (Figure 2A & Figure S3). The “East” cluster
(including Northland, Hauraki, Bay of Plenty ands East Cape) and
“West” cluster (including Gisborne, Hawke’s Bay, Marlborough Sounds,
Tasman Bay, Karamea Bight, Kapiti Coast, and West Coast) show genetic
disjunctions around Tauroa Peninsula and East Cape. Only the first
principal component (PC1) of the PCA was informative for inferring
genetic structure and explained a small proportion of the total amount
of variation present in the neutral data (0.59%). ADMIXTURE also
supported the presence of two genetic clusters with gene flow (K = 2)
(Figure 2B and Figure S4). Both analyses showed movement of individuals
and presence of gene flow between genetic clusters, which was primarily
observed in West Coast, Cape Reinga, Hawke’s Bay and Gisborne. Further
analyses showed differential gene flow was occurring from the East to
the West cluster as migrant or admixed individuals were only identified
in the West cluster (Figure 2C). The higher number of private alleles
present in the East cluster compared to the West cluster also supports
that gene flow is stronger from East to West (1,846 & 1,351,
respectively among all high-quality SNPs). Global genetic
differentiation (FST ) was 0.0021, while genetic
differentiation (FST ) between the East and West
cluster was 0.0031. Pairwise FST estimates
between sample locations ranged between 0.00003 and 0.00461 (Figure 2D).
Within genetic clusters, sample locations showed low levels of genetic
differentiation (FST = 0.00003 – 0.00086).
Mixing of individuals and presence of differential gene flow resulted
moderate levels of pairwise genetic differences in Gisborne, Hawke’s Bay
and Cape Reinga. A significant correlation (r2 =
0.6917, p = 4.99-9) was observed between
genetic and geographic distance (Figure S5), showing the presence of
IBD. A significant mantel test also suggested the presence of IBD
(p = 0.001). We did observe low genetic differentiation between
the geographically separated sample locations Hawke’s Bay and West Coast
(FST = 0.005).
Genome scan
The genome scan showed heterogeneous levels of differentiation between
genetic clusters across the genome (Figure 3). Linkage groups 1, 7, 8,
and 10 had regions of high relative genetic differentiation
(FST > 0.15) (Figure 4). The region
of interest on LG1 spanned 90 Kbp and showed a small increase in
absolute genetic divergence (dxy ) and contained a
coding region for gypsy retrotransposon integrase-like protein 1 (gin1).
The of high relative divergence (FST ) on LG7
spanned 305 Kbp but did not show an increase in absolute divergence
(dxy ). This region did show difference in
Tajima’s D with the West cluster having elevated levels suggesting a
lack of rare alleles. The region contained the coding for
microtubule-associated serine/threonine-protein kinase 2 (mast2). The
region of high relative genetic divergence on LG8 spanned 30 Kbp and
showed a small increase in absolute genetic divergence compared to the
surrounding region. The region contained the coding region for cyclic
amp-dependent transcription factor atf-2 (atf2). Finally, regions of
interest on LG10 spanned 35 Kbp and showed the highest level of relative
genetic divergence between the East and West cluster. Absolute genetic
divergence (dxy ) was also high within that
region, and the West cluster showed high nucleotide diversity and lack
of rare alleles (Tajima’s D > 2). The region contained
multiple genes, but hexokinase-2 (hk2) was identified to be relevant for
our discussion.
Genetic
diversity
Analyses of molecular variance (AMOVA) showed that the majority of
variation is observed within individuals (Table 2), 99 and 84 percent
for neutral and outlier loci respectively. Only 0.28% of neutral
variation was observed between genetic clusters (p = 0.194), in
contrast to 12.27% of variation in outlier loci (p = 0.001).
Variation between quota management areas (QMAs) within the same genetic
cluster was 0.01 and 0.52% for neutral and outlier loci respectively.
Similar, variation between sample locations within the QMAs was 0.01 and
0.73% for neutral and outlier loci respectively. Levels of
heterozygosity varied between datasets (Table 1). Mean heterozygosity
was lowest among high-quality loci (0.215), with neutral and outlier
loci having a mean heterozygosity of 0.301 and 0.372 respectively.
Heterozygosity also varied between sample locations, rangin between
0.213-0.218, 0.298-0.304, and 0.359-0.393 for high-quality, neutral, and
outlier loci respectively. Significant differences in heterozygosity
were observed between 12 sample location comparisons (Figure S6A). Here,
Northland and East Cape had significant higher levels of heterozygosity
than Gisborne, Karamea Bight, Kapiti Coast, Marldborough Sounds, and
Tasman Bay (among high-quality and neutral loci). These significant
differences in heterozygosity were not observed among outlier loci
(Figure S6B). Heterozygosity was significnatly different between genetic
clusters (Figure 5), with heterozygocity among high-quality and neutral
loci being higher in the East cluster (9.8-10 and
8.0-9 respectievly), while hetrozygosity among outlier
loci was significnatly higher in the West cluster (p =
4.1-5).
Discussion
The goal of this study was to assess the genetic structure and genetic
variation of snapper across their entire distribution around New
Zealand. We identified the presence of two genetic clusters with genetic
disjunctions around Tauroa Peninsula and the East Cape, and directional
gene flow occurring from the East to the West cluster. Levels of
heterozygosity among high-quality and neutral loci were higher in the
East Cluster, while heterozygosity among outlier loci was higher in the
West Cluster. A genome scan showed genetically differentiated regions
between the two clusters. Below we will discuss how these results have
improved our understanding of the genetic structure in snapper and the
potential implications for fisheries management.
Genetic structure
Our results provide unequivocal support for the presence of two genetic
clusters (Figure 2), with genetic disjunctions around Tauroa Peninsula
and East Cape (Figure 6), that were suggested by microsatellite and
alloenzyme data (Bernal-Ramírez et al., 2003; Smith et al., 1978). These
genetic disjunctions also correspond with the boundaries of proposed
bioregions (Shears et al., 2008), showing the genetic structure of the
snapper matches that of general pattens of biodiversity. For example,
genetic disjunctions in these areas were also identified for tarakihi
(Cheilodactylus macropterus ), barracouta (Thyrisites
atun ), jack mackerel (Trachurus declivis ) (Gauldie & Johnston,
1980; Papa et al., 2020), waratah anemone (Actinia tenebrosa )
(Veale & Lavery, 2012), and New Zealand Estuarine Clam
(Austrovenus stutchburyi ) (Ross et al., 2012). Restricted gene
flow at the genetic disjunctions resulted in low genetic differentiation
in snapper between the East and West cluster (FST= 0.0031). This level of genetic divergence is similar to those reported
for snapper (C. auratus ) along the South and West Coast of
Australia (FST = 0.0041) (Bertram et al., 2022)
and other species of fish, such as Atlantic cod (Gadus morhua;
FST = 0.0037 ), blue whiting
(Micromesistius australis; FST = 0.006), and
herring (Clupea harengus; FST = 0.012) (Guo et
al., 2016; Knutsen et al., 2011; McKeown et al., 2017).
Ocean currents likely play a dominant role in the direction of movement
around these genetic disjunctions. At Cape Reinga, the Tasman Flow hits
the North Island of New Zealand and splits up into the East and West
Auckland current (Figure 6) (Bull et al., 2018; Papa et al., 2020).
Around the East Cape, the East Coast current originates from the East
Auckland current and flows around the East Cape southward (Figure 6).
Mixing of individuals from both clusters was observed at Cape Reinga and
Gisborne (Figure 2A), and resulted in intermediate pairwiseFST estimates (Figure 2D). While dispersal of
individuals across the genetic disjunctions was observed in both
directions, individuals that showed intermediate levels of ancestry to
both genetic clusters were only present south of the genetic
disjunctions (Figure 2B&C), suggesting that gene flow is primarily
occurring from the East to the West clusters. Snapper around Gisborne
also have similar life history traits to those in the Bay of Plenty
(Parsons et al., 2014; Walsh et al., 2012), suggesting that Gisborne is
demographically connected to the Bay of Plenty. Similar levels of
demographic connectivity have not been reported between sample locations
on either side of Tauroa Peninsula, and are through to constitute of two
distinct stocks (Parsons et al., 2014; C. Walsh et al., 2006). The area
between Cape Reinga and Tauroa peninsula was also identified as the most
prominent genetic break for marine species around the North Island of
New Zealand (Arranz et al., 2021).
While SNPs were unable to identify such fine-scale genetic structure,
structural variation in snapper may provide additional resolution and
statistical power to detect genetic differences between different
stocks. In other species, structural variants have provided much more
detailed population differentiation compared to SNPs (Dorant et al.,
2020). The amount of structural variation in snapper is known to
outweigh the number of SNPs threefold (Catanach et al., 2019).
Structural variation also plays an important function in adaptation to
local environments (Hoban et al., 2016), both copy number variation
(Dorant et al., 2020; Perry et al., 2007), and inversions (Berg et al.,
2016; Star et al., 2017). Future studies utilizing structural variation
will hopefully continue to improve the fine-scale structure of snapper.
Genetic variation
The genetic clusters also showed significant differences in levels of
diversity. Genetic variation was primarily observed within individuals,
showing 99% and 84% for neutral and outlier loci respectively (Table
2). Such levels of within species diversity are typical for a marine
species with high potential for dispersal are large effective population
sizes (Cano et al., 2008; Palumbi, 2003; Ward, 2000). Less than 1% of
genetic variation in neutral and outlier loci was observed between
sample locations or quota management areas (QMAs), which shows high
levels of genetic connectivity between fisheries management areas
(Fisheries New Zealand, 2018; Parsons et al., 2014). However, 12.27% of
variation among outlier loci was observed between genetic clusters,
suggesting environmental differences are driving local adaptation
(discussed below).
Heterozygosity estimates provided new insights into the diversity
contained within each of the genetic clusters. Heterozygosity was
significantly higher in the East cluster for high-quality and neutral
loci (Figure 5). Similarly, significant differences in heterozygosity
were observed among neutral loci between sample locations from
respective genetic clusters (Figure S6A). This shows that variation is
primarily observed between clusters rather than individual sample
locations which closely resemble the recognized stocks by fisheries
management (Parsons et al., 2014). Higher levels of genetic variation in
the East cluster is consistent with the (historic) larger stock sizes in
fisheries management area SNA1 which makes up the majority of the East
genetic cluster (Fisheries New Zealand, 2018). In contrast,
heterozygosity among outlier loci was significantly higher for the West
cluster (Figure 5). We hypothesize that higher diversity among outlier
loci is because the West cluster is spread over a more heterogenous
environment. Moreover, the West cluster has a larger latitudinal range,
with more prominent environmental gradients compared to the East cluster
(Figure S7). The genomic regions attributing to the higher levels of
variation in outlier loci in the West cluster also correlate with the
regions that show high levels of genetic differentiation between the two
clusters (Figure 3).
Heterozygosity is a key indicator of overall diversity and often
used to assess changes in diversity over time or between a
populations/species (Chapman et al., 2009; Hauser et al., 2002).
However, the use of different genetic markers and variation in
sequencing depth or SNP filtering parameters can make difficult to
compare results between studies. Within this study alone, sequencing
depth and the level of SNP filtering resulted in range of heterozygosity
estimates (Table 1, Figure S9). For accurate comparisons, sequencing
depths should be normalized, or high enough that fluctuations in depth
no longer significantly affect heterozygosity levels among individuals.
Neutral loci would be the most appropriate genetic marker for population
or species comparisons as they are not influenced by selection, and thus
best reflect the demographic processes (e.g. population size, migration)
(Charlesworth et al., 2003). Previous studies on snapper utilized
microsatellites to assess levels of genetic variation (Ashton, 2013;
Hauser et al., 2002; Le Port et al., 2017). Heterozygosity estimates
reported in these studies ranged between 0.521 and 0.938. In contrast,
heterozygosity estimates in this study ranged between 0.215 and 0.372
(Table 1). High levels of heterozygosity among microsatellite loci are
common because they are often selected based on the number of alleles.
This form of ascertainment bias implies estimates of genetic variation
are inflated (Brandström & Ellegren, 2008; Väli et al., 2008), and
incompatible with heterozygosity estimates based on SNP data. For
example, Hauser et al. (2002) reported a reduction in heterozygosity in
Tasman Bay snapper between 1950 and 1989, suggesting loss of variation
due to overexploitation. However, we did not detect consistent
differences in heterozygosity between Tasman Bay and other sample
locations that would suggest this population has suffered from loss of
genetic variation due to exploitation (Figure S6). This could imply that
either the Tasman Bay population has since recovered from exploitation,
or our data was unable to detect the reduction in heterozygosity. It is
possible that microsatellites could also be more sensitive to
demographic changes because of the high number of low frequency alleles
compared to SNPs. While this implies that microsatellites are better at
detecting fluctuations in heterozygosity is thus reduction in diversity,
it’s important to question whether such changes are relevant for the
overall health of a population. Recent studies using genomic data have
suggested that populations are able to maintain genetic variation
throughout short periods of heavy exploitation (Therkildsen et al.,
2010). Most notably, Pinsky et al. (2021) showed there was no loss of
genetic variation in the heavily exploited North Atlantic cod. Like
Atlantic cod, the period of exploitation for snapper was relatively
short. Such observations are encouraging as it may imply that the
species will have no immediate long-term loss of diversity due to
fishing.
Putative evidence for
adaptation
The heterogeneous environment of New Zealand’s coastal ecosystems is
likely leading to local adaptions in snapper. We found that 12.27% of
the observed variation among outlier loci was identified between genetic
clusters (Table 2), and the genome scan showed regions of high genetic
differentiation (FST ) between genetic clusters
(Figure 3). Genomic regions on linkage groups 1, 7, 8 and 10 showing
strong relative genetic differentiation (FST> 0.15) were of particular interest (Figure 4). Other
summary statistics were utilized to investigate which evolutionary
processes could be underlying to the observed genetic differentiation.
Studies have emphasized the distinction between relative
(FST ) and absolute (dxy )
measures of genetic differentiation and the implications regarding the
evolutionary processes that are driving genetic differentiation
(Cruickshank & Hahn, 2014; Delmore et al., 2018; Nachman & Payseur,
2012). Regions of interest were selected for high relative divergence
(FST ). However, it is unclear whether this is due
to 1) a reduction in gene flow between the two clusters, 2) or local
selection in one of the clusters is affecting the within-population
diversity. Both mechanisms result in an increase inFST , but only the first mechanism is associated
with the process of speciation (Cruickshank & Hahn, 2014). Absolute
genetic divergence (dxy ) which only looks at
fixed differences between clusters can be used to distinguish between
these processes. If gene flow is reduced due to selection, we would
expect the number of fixed differences between the two clusters to
increase.
Overall, absolute genetic divergence (dxy ) was
not elevated in regions of high relative divergence
(FST ) (Figure 3), suggesting that gene flow is
unrestricted in these regions. Regions of interest on LG1 and LG10 did
show high relative and absolute genetic divergence (Figure 4). The
region of high absolute genetic divergence on LG1 is restricted to a
single 5 Kbp window containing the coding regions for gypsy
retrotransposon integrase-like protein 1 (gin1). Transposons are
associated with high mutation rates (Wicker et al., 2016), and likely to
induce population specific mutations inflating absolute divergence
(dxy ). Absolute divergence in the region of
interest on LG10 is more distinct, showing increaseddxy across multiple windows. Following the logic
of Cruickshank and Hahn (2014), this region shows support for restricted
gene flow due selection which is associated with process of “speciation
with migration”. However, it is unlikely that the two genetic clusters
are currently diverging into fully diverged populations due to the
isolated nature of this distinct pattern on divergence.
We hypothesize that the regions of interest are associated with local
adaptation in one of the genetic clusters (Figure 4). In this scenario,
local selection effects with-population variation causing an increase in
relative divergence (FST ) (Cruickshank & Hahn,
2014). Evidence for local selection was strongest in regions of interest
on LG7 and LG10, suggesting the presence of selection in the West
cluster. In both regions, elevated Tajima’s D (> 2) in the
West cluster indicates the lack of rare alleles and higher levels of
heterozygosity than expected if selectively neutral. This observation is
consistent with mean level heterozygosity being higher in the West
cluster compared to the East cluster (Figure 5). It is possible that
multiple alleles are maintained in the West Cluster because it is
distributed over a large geographical area with environmental gradients
in ocean temperature, oxygen concentration, pH along a latitudinal axis
(Figure 6 & Figure S7). Ocean temperature has already been
associated with the observed genetic differences between the East and
West Clusters (Smith, 1979; Smith et al., 1978). An allozyme locus
linked to the est-4 gene was shown to have the same genetic
structure as reported in this study. The coding regions for est-4was not identified within the regions of interest in this study.
However, coding regions for genes in the regions of interest did show a
tentative connection between adaptation to ocean temperature, based on
the assumption that ocean temperature effects the metabolic rate and
growth rates of fish (Boscolo-Galazzo et al., 2018; Johansen & Jones,
2011). On LG7, mast2 is linked to the expression of epidermal
growth factor (Ahn et al., 1991). On LG8, atf2 is a transcription
factor that is activated by stress-activated protein kinases likep38 and involved in a wide-range of cellular functions including( Zarubin & Han, 2005) . In drosophila , the homolog
for atf2 (dATF-2 ) is critical for regulation of fat
metabolism, and in mice inhabitation of the p38-Atf-2 pathway was shown
to reduce white adipose tissue (Maekawa et al., 2010). In humans,atf2 is thought to may be critical for glucose metabolism and
tumorigenesis (Zhao et al., 2019). On LG10, hk2 codes for a rate
limiting enzyme during the first step of glycolysis (Li et al., 2020),
and has been shown to play an important role in cell growth (Wellen et
al., 2010). The connection between the identified genes within the
regions of genetic divergence and their association with local
adaptation is highly speculative and requires extensive research that
goes beyond the scope of the manuscript. Genotype by environment
analyses will be conducted to investigate which environmental factors
may be attributed to observed patterns of genetic divergence between the
two clusters.
Conclusion
This study utilised whole-genome-sequencing data to investigate the
genetic structure and demographic connectivity of Australasian snapper
in New Zealand to inform management decisions. Our results provided
conclusive evidence for the presence of two distinct genetic clusters
with genetic disjunctions around the Tauroa Peninsula and East Cape.
Estimates of genetic differentiation and gene flow suggest dispersal
primarily occurs from the East to the West cluster, with ocean currents
likely playing a key role in their ability to disperse across the
genetic disjunctions. Genomic regions of high relative genetic
divergence provide tentative evidence for adaptation to local
environments. Fisheries management agencies should utilize the genetic
structure to refine management areas to key fisheries species
(Bernatchez et al., 2017). Currently, quota management areas do not
match with the genetic disjunctions observed in snapper and other
fisheries species (Figure 6). Ideally, genomics would be able to
identify genetic structure that corresponds to the demographically
independent stocks based on year-class strength and growth rates
(Parsons et al., 2014; Walsh, 2003; Walsh et al., 2011). Combined with
non-genetic methods, genomics can significantly contribute to the
management of the snapper fisheries.