2.4.3 Identification of biological function of outlier genes
A gene ontology (GO) set enrichment analysis (GSEA) was performed to associate biological and common gene functions to the outlier SNPs and outlier 5 kb windows. This analysis tests if multiple genes can be linked with certain GO functions. First, functional annotation of the brown trout reference was performed on the EggNOG v5.0 web-interface (http://eggnog-mapper.embl.de/; Huerta-Cepas et al., 2018) using brown trout protein FASTA files from Ensembl (https://www.ensembl.org/Salmo_trutta/Info/Index). Second, we converted the NCBI S. trutta annotation release 100 from GFF to bed format using bedtools v2.27.1 (Quinlan & Hall, 2010), including only CDS regions, and extracted all genes that overlapped with the identified outlier SNPs and windows using bedtool’s (Quinlan & Hall, 2010) ‘intersect’ function. Third, the R package topGO (Alexa & Rahnenfuhrer, 2020) was used to test for over-representation of GO biological processes using a node size of 5 and the ‘classic’ algorithm to account for structural relationships among GO terms. Finally, GO terms with p-values ≤0.01 (i.e., enriched GO terms) were retained and then filtered in REVIGO (Supek, Bošnjak, Škunca, & Šmuc, 2011) to remove redundant GO terms. Treemaps were drawn in R (Supek et al., 2011) to visualize enriched and non-redundant GO superclusters.
2.4.4 Locating LDH-A genes in the brown trout reference assembly The sympatric populations of Lakes Bunnersjöarna were identified with a single allozyme locus, LDH-1 , that showed contrasting homozygosity in the demes. LDH-1 is one of two loci (LDH-1and LDH-2 ) in salmonids coding for LDH-A (Allendorf et al., 1984), and we identified the DNA sequences relating to this enzyme. First, all genes related to LDH (EC number 1.1.1.27, GO:0004457) were identified using the GO term protein annotations from EggNOG (section 2.4.3.). Second, to ensure no gene copy was missed, we used TBLASTN (Altschul et al., 1990) with the FASTA sequence for LDH extracted from the Uniprot database to search the genome assembly, extracting hits with E- values<0.0001 and bitscore>60. Finally, each LDH-A copy was manually inspected for exon boundaries, reading frame and mapping performance by loading the brown trout reference assembly and the bam files generated with the Pool-seq and individual whole-genome sequencing pipelines (see below for the latter) into the Integrative Genomics Viewer (IGV) v2.4.2 (Robinson et al., 2011). In both cases, the sequencing reads had only been filtered for minimum base quality 20 and for being mapped as pairs, i.e. without depth or additional quality filtering, to ensure that reads mapping to the LDH-A loci had not been removed. The amino acid compositions for all LDH genes were extracted using an in-house script (Appendix S4).
We calculated allele frequencies for the LDH-A gene copies in both demes from the Pool-seq data (gene-based sync files; section 2.4.1.) using ‘snp-frequency-diff.pl’ from Popoolation2 (Kofler, Pandey, et al., 2011). Sequence variants were called for these genes on 14 of the individual WGS data using bcftools call (Danecek, McCarthy, & Li, 2015), inspected using IGV, and visualized with Gviz (Hahne & Ivanek, 2016) and ggplot2 in R (Wickham, 2016).

2.5 Individual whole-genome sequencing data processing and variant calling

Sequenced reads from two individuals per deme were aligned against the brown trout reference assembly using BWA mem v0.7.17 (BWA; Li & Durbin, 2009), and sorted using SAMtools v1.8 (Li et al., 2009). No data was filtered before mapping sequences against the reference genome (unlike the pipeline for Pool-seq data). Alignment generated two bam files for each individual (one per lane), which were merged per individual using PICARD v2.10.3 MarkDuplicates (https://broadinstitute.github.io/picard/). PCR duplicates were also marked using PICARD v2.10.3. The quality of the obtained bam files was assessed with Qualimap v2.2.1 (García-Alcalde et al., 2012).
Individual genomic variant call format files (gVCFs) were generated with HaplotypeCaller from the Genome Analysis ToolKit (GATK) v3.8 (McKenna et al., 2010), and joint genotyping of all brown trout samples (including fish from other lakes, see 2.6) was performed with GATK GenotypeGVCFs. To filter out low quality variants, we applied a hard filter approach, using GATK’s VariantFiltration tool, separately for SNPs (QD < 2.0, MQ < 40.0, FS > 10.0, MQRankSum < -5.0, ReadPosRankSum < -5.0, SOR > 5.0) and indels (QD < 2.0, FS > 10.0, ReadPosRankSum < -5.0, SOR > 5.0). VCFtools v0.1.15 (Danecek et al., 2011) was used to retain only bi-allelic SNPs with a minor allele frequency ≥ 0.01. Further, variants from the un-assigned scaffolds were removed. We used BCFtools v1.8 (Li et al., 2009) to annotate the ID field of the joint VCF file.