Method
We screened the NCBI Influenza Virus Resource and downloaded all HA and NA nucleotide sequences of influenza H1N1 available up to December 25, 2020 from Iran and its neighboring countries (Pakistan, Afghanistan, Armenia, Turkmenistan, Azerbaijan, Turkey, and Iraq). The sampling date and location of each sequence were extracted from the metadata deposited in the NCBI Influenza Virus Resource. After identical sequences and sequences with unknown sampling time or location were excluded, the final datasets comprised 398 HA and 339 NA sequences. Multiple sequence alignment was performed with the ClustalW algorithm in MEGA v7.0, inspected manually, and trimmed for phylogenetic analysis.6 The best-fitting nucleotide substitution models, Hasegawa-Kishino-Yano (HKY) for HA and General Time Reversible (GTR) for NA, were identified with the online service of the ATGC bioinformatics platform (http://www.atgc-montpellier.fr/). An initial maximum likelihood (ML) phylogenetic analysis using these models was then conducted in MEGA v7.0 with 1000 bootstrap replicates.6
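The exclusion step above can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `SequenceRecord` fields are hypothetical stand-ins for the metadata exported from the NCBI Influenza Virus Resource, and the choice to keep the first occurrence of each identical sequence (and to filter on metadata before deduplicating) is an assumption, since the text does not state which duplicate was retained.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Iran and the neighboring countries named in the text.
STUDY_COUNTRIES = {"Iran", "Pakistan", "Afghanistan", "Armenia",
                   "Turkmenistan", "Azerbaijan", "Turkey", "Iraq"}


@dataclass(frozen=True)
class SequenceRecord:
    """Hypothetical minimal record; field names are illustrative, not NCBI's."""
    accession: str
    segment: str                    # "HA" or "NA"
    country: Optional[str]          # None when the sampling location is unknown
    collection_date: Optional[str]  # None when the sampling time is unknown
    sequence: str


def curate(records: Iterable[SequenceRecord], segment: str) -> List[SequenceRecord]:
    """Keep one record per unique sequence with known sampling time and place."""
    seen = set()
    kept = []
    for rec in records:
        # Drop other segments and records with unknown or out-of-scope location.
        if rec.segment != segment or rec.country not in STUDY_COUNTRIES:
            continue
        # Drop records with unknown sampling time.
        if rec.collection_date is None:
            continue
        key = rec.sequence.upper()
        if key in seen:  # an identical sequence was already kept
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

Because deduplication is done per segment, HA and NA sets are curated independently, which is consistent with the two datasets ending up with different sizes.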
To infer the evolutionary dynamics and the time to the most recent common ancestor (tMRCA) of the HA and NA sequences, a Bayesian Markov chain Monte Carlo (MCMC) approach was applied using the BEAST v2.5.241 package.7 These analyses used the HKY+G (HA dataset) and GTR+G (NA dataset) substitution models, an uncorrelated lognormal relaxed clock model, and a Bayesian skyline coalescent tree prior.8 Four independent MCMC chains were run for 25 million generations each (sampling every 2,500 steps) and were combined using the LogCombiner program v1.54.8 Convergence was assessed in Tracer v1.5 from the effective sample size (ESS) after a 10% burn-in; ESS values of 200 or above were considered adequate. A maximum clade credibility (MCC) tree was generated in the TreeAnnotator program after the initial 10% of trees were discarded as burn-in,7,9 and was visualized in the FigTree program v1.2.3.10
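The burn-in, chain-combining, and ESS steps above can be illustrated with a short sketch. It assumes each chain is a list of logged parameter values. The ESS estimator here is a simplified one (autocorrelations summed until the first non-positive lag), not necessarily the exact algorithm Tracer uses, and `simulate_ar1` is a hypothetical helper that generates synthetic traces for demonstration only. With the settings in the text, each chain logs 25,000,000 / 2,500 = 10,000 samples; discarding 10% leaves 9,000, so four chains yield 36,000 combined samples.

```python
import random
from typing import List, Optional, Sequence


def discard_burnin(trace: Sequence[float], fraction: float = 0.1) -> List[float]:
    """Drop the first `fraction` of logged samples as burn-in."""
    return list(trace[int(len(trace) * fraction):])


def combine_chains(chains: Sequence[Sequence[float]], burnin: float = 0.1) -> List[float]:
    """Concatenate post-burn-in samples from independent chains."""
    combined: List[float] = []
    for chain in chains:
        combined.extend(discard_burnin(chain, burnin))
    return combined


def autocorrelation(x: Sequence[float], lag: int, mean: float, var: float) -> float:
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(x: Sequence[float], max_lag: Optional[int] = None) -> float:
    """ESS = N / (1 + 2 * sum of autocorrelations), truncating the sum at the
    first non-positive autocorrelation (a simplified estimator)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0:
        raise ValueError("trace is constant; ESS is undefined")
    if max_lag is None:
        max_lag = n // 2
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, max_lag):
        rho = autocorrelation(x, lag, mean, var)
        if rho <= 0:
            break
        tau += 2 * rho
    return n / tau


def simulate_ar1(n: int, phi: float, seed: int = 0) -> List[float]:
    """Hypothetical demo trace: an AR(1) process; larger phi means stronger autocorrelation."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x
```

A strongly autocorrelated trace (for example `simulate_ar1(2000, 0.9)`) yields an ESS far below its sample count and would fail an ESS ≥ 200 criterion, whereas an independent trace of the same length passes easily.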
Spatial dynamics and potential viral migration patterns of H1N1 within Iran and between Iran and its neighboring countries were inferred with a discrete Bayesian phylogeographic model implemented in BEAST software v2.5.241.8 A symmetric substitution model with the Bayesian stochastic search variable selection (BSSVS) approach was applied, and the resulting migration routes were summarized and visualized in the SPREAD program v1.0.7.11
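Under BSSVS, each migration rate has an indicator that switches it on or off, and support for a route is commonly summarized as a Bayes factor (BF): the posterior odds that the rate is included divided by its prior odds. The sketch below assumes the setup of Lemey et al. (2009) for a symmetric model with K discrete locations, namely a Poisson prior on the number of non-zero rates with mean ln 2 and offset K - 1. The function names are hypothetical, and the text does not state which BF cutoff was applied.

```python
import math
from typing import Sequence


def prior_inclusion_probability(n_locations: int) -> float:
    """Prior probability that a given rate is switched on under BSSVS with a
    symmetric model (assumed Poisson prior: mean ln 2, offset K - 1)."""
    if n_locations < 3:
        raise ValueError("this formula requires at least 3 locations")
    n_rates = n_locations * (n_locations - 1) // 2  # symmetric: K(K-1)/2 rates
    return (math.log(2) + n_locations - 1) / n_rates


def posterior_inclusion_probability(indicator_samples: Sequence[int]) -> float:
    """Fraction of post-burn-in MCMC samples in which a rate indicator equals 1."""
    return sum(indicator_samples) / len(indicator_samples)


def bayes_factor(posterior_p: float, n_locations: int) -> float:
    """Bayes factor for one rate: posterior odds divided by prior odds."""
    q = prior_inclusion_probability(n_locations)
    if posterior_p >= 1.0:
        return math.inf
    return (posterior_p / (1 - posterior_p)) / (q / (1 - q))
```

For example, if the eight countries named in the text were the discrete states (K = 8, 28 symmetric rates), a rate whose indicator is on in 90% of samples would have a BF of roughly 24, while a rate whose posterior inclusion equals its prior has a BF of exactly 1.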
To assess the selection pressure acting on the HA and NA lineages, we estimated the ratio of non-synonymous (dN) to synonymous (dS) substitutions per site (dN/dS) for each lineage, using all sequences included in this study. Positively selected codons were identified with the single likelihood ancestor counting (SLAC) and fixed effects likelihood (FEL) methods at a significance level of 0.1; both methods are implemented in the HyPhy package and were accessed through the Datamonkey web server.12,13
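To make the dN/dS quantity concrete, the sketch below computes it for a pair of aligned coding sequences with the Nei-Gojobori counting method and a Jukes-Cantor correction. This is a swapped-in, simplified technique for illustration only: it is not SLAC or FEL as implemented in HyPhy, although SLAC applies a related counting idea along a phylogeny using reconstructed ancestral sequences. Simplifying assumptions: the standard genetic code, mutations to stop codons counted as nonsynonymous when counting sites, pathways through stop codons skipped, and gapped or ambiguous codons ignored.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Synonymous sites in a codon: fraction of single-base changes that keep the amino acid."""
    aa = CODON_TABLE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """Average (synonymous, nonsynonymous) differences over all mutational
    pathways from c1 to c2, skipping pathways through a stop codon."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = nonsyn_total = 0.0
    valid_paths = 0
    for order in itertools.permutations(positions):
        current, syn, nonsyn, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*":
                valid = False
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            syn_total += syn
            nonsyn_total += nonsyn
            valid_paths += 1
    if valid_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / valid_paths, nonsyn_total / valid_paths


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple hits; undefined for p >= 0.75."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """Return (dN, dS) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip gapped/ambiguous codons and stop codons.
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        if CODON_TABLE[c1] == "*" or CODON_TABLE[c2] == "*":
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        sd, nd = codon_differences(c1, c2)
        syn_diffs += sd
        nonsyn_diffs += nd
    if syn_sites == 0 or nonsyn_sites == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    return jukes_cantor(nonsyn_diffs / nonsyn_sites), jukes_cantor(syn_diffs / syn_sites)
```

A ratio dN/dS above 1 suggests positive selection, near 1 neutral evolution, and below 1 purifying selection; the caller should guard against dS = 0 before dividing.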