Method
We screened the NCBI Influenza Virus Resource and downloaded all HA and
NA nucleotide sequences of influenza H1N1 available up to December 25,
2020 from Iran and its neighboring countries (i.e., Pakistan,
Afghanistan, Armenia, Turkmenistan, Azerbaijan, Turkey and Iraq). We
captured sampling date and location of sequences by extracting
information deposited in the NCBI Influenza Virus Resource. In our final
genome sets, a total of 398 HA and 339 NA sequences remained after the
exclusion of identical sequences and sequences with unknown sampling
time and location. Multiple sequence alignment was performed with the
ClustalW algorithm in MEGA v7.0 and inspected manually. The aligned
dataset was then trimmed for phylogenetic analysis.6 The initial
maximum likelihood (ML) phylogenetic analysis was conducted using
Hasegawa-Kishino-Yano (HKY) and General Time Reversible (GTR)
nucleotide substitution models for HA and NA, respectively. The
best-fitting nucleotide substitution models were identified using an
online execution on the ATGC bioinformatics platform
(http://www.atgc-montpelier.fr/). The ML analysis was performed with
1000 bootstrap replicates in MEGA v7.0.6
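The exclusion step described above (dropping identical sequences and sequences with unknown sampling time or location) can be sketched as follows. This is a minimal illustration, not the procedure actually used for the study: the `Record` class, its field names, and the choice to keep the first copy of each identical sequence are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical container for one downloaded sequence and its metadata."""
    accession: str
    sequence: str
    date: Optional[str]      # sampling date, None or "" if unknown
    location: Optional[str]  # sampling location, None or "" if unknown


def filter_records(records: Iterable[Record]) -> List[Record]:
    """Drop records with missing date or location, then drop exact duplicates.

    Assumption: two sequences are "identical" when their nucleotide strings
    match exactly (case-insensitive); the first occurrence is kept.
    """
    seen = set()
    kept = []
    for rec in records:
        if not rec.date or not rec.location:
            continue  # unknown sampling time or location
        key = rec.sequence.upper()
        if key in seen:
            continue  # identical to a sequence already kept
        seen.add(key)
        kept.append(rec)
    return kept
```

Applied separately to the HA and NA downloads, a filter of this kind would yield the two final sequence sets.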
To infer the evolutionary dynamics and the time to the most recent
common ancestor (tMRCA) of the HA and NA sequences, a Bayesian Markov
chain Monte Carlo (MCMC) analysis was performed using the BEAST v2.5.241
package.7 In these
analyses, the HKY+G (for the HA dataset) and GTR+G (for the NA dataset)
substitution models, an uncorrelated lognormal relaxed clock model, and
Bayesian skyline coalescent tree priors were
used.8 Four independent
MCMC chains were run for 25 million generations (sampling every 2,500
steps) and were combined using the LogCombiner program
v1.54.8 Convergence was
assessed based on the effective sample size (ESS) after a 10%
burn-in, using Tracer software v1.5; ESS values of 200 and above were
accepted. The maximum clade credibility (MCC) tree was generated in the
TreeAnnotator program after discarding the initial 10% of trees as
burn-in.7,9 The MCC tree was visualized in the FigTree
program v1.2.3.10
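As a rough check on the run settings, 25 million generations sampled every 2,500 steps gives about 10,000 samples per chain, about 9,000 after a 10% burn-in, and about 36,000 once the four chains are combined. The sketch below illustrates the burn-in, combination, and ESS steps on a single numeric parameter trace. It is a simplified estimator (autocorrelations summed until the first non-positive lag) and is not claimed to reproduce the exact calculation in Tracer or LogCombiner; the function names are hypothetical.

```python
import numpy as np


def discard_burn_in(samples, fraction=0.10):
    """Return the trace with the first `fraction` of samples removed."""
    start = int(len(samples) * fraction)
    return samples[start:]


def combine_chains(chains, fraction=0.10):
    """Concatenate independent chains after removing burn-in from each."""
    return np.concatenate([discard_burn_in(np.asarray(c), fraction) for c in chains])


def effective_sample_size(trace):
    """Simplified ESS: n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation,
    which is one common heuristic (an assumption of this sketch).
    """
    x = np.asarray(trace, dtype=float)
    n = len(x)
    centered = x - x.mean()
    variance = np.dot(centered, centered) / n
    if variance == 0.0:
        return float(n)
    tau = 1.0
    for lag in range(1, n):
        rho = np.dot(centered[:-lag], centered[lag:]) / (n * variance)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau


def is_converged(trace, threshold=200.0):
    """Apply the ESS >= 200 acceptance rule stated in the text."""
    return effective_sample_size(trace) >= threshold
```

A strongly autocorrelated trace yields an ESS far below the raw sample count, which is why the acceptance rule is stated in terms of ESS rather than chain length.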
Inferences about the spatial dynamics and potential viral migration patterns
of H1N1 within Iran and between Iran and its neighboring countries were
made based on a discrete Bayesian phylogeographic model implemented in
BEAST software v2.5.241.8 The symmetric
substitution model with the Bayesian Stochastic Search Variable
Selection (BSSVS) approach was applied in the SPREAD program
v1.0.7.11
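Support for an individual migration rate under BSSVS is commonly summarized as a Bayes factor, the ratio of posterior to prior odds that the rate indicator is switched on. The sketch below shows that calculation for a symmetric model with K locations, which has K(K-1)/2 possible rates. The prior inclusion probability used here (a truncated Poisson prior with mean ln 2 and offset K-1) is a frequently used default and is an assumption of this example; the text does not state the prior or the Bayes factor cutoff used in the analysis. The function name is hypothetical.

```python
import math


def bssvs_bayes_factor(posterior_prob, n_locations,
                       poisson_mean=math.log(2), offset=None):
    """Bayes factor for one rate indicator in a symmetric BSSVS model.

    posterior_prob: fraction of posterior samples with the indicator on.
    n_locations:    number of discrete location states (K).
    poisson_mean, offset: assumed prior settings (mean ln 2, offset K-1).
    """
    if n_locations < 2:
        raise ValueError("need at least two locations")
    if not 0.0 <= posterior_prob <= 1.0:
        raise ValueError("posterior_prob must lie in [0, 1]")
    n_rates = n_locations * (n_locations - 1) // 2  # symmetric model
    if offset is None:
        offset = n_locations - 1
    prior_prob = (poisson_mean + offset) / n_rates
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior inclusion probability must lie in (0, 1)")
    if posterior_prob == 1.0:
        return math.inf  # indicator on in every sample
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    return posterior_odds / prior_odds


# Illustration only: K = 8 corresponds to Iran plus the seven listed
# neighbors; the location states actually used may differ.
example_bf = bssvs_bayes_factor(posterior_prob=0.9, n_locations=8)
```

With eight locations the symmetric model has 28 candidate rates, so the prior odds for any single rate are well below 1 and a high posterior inclusion probability translates into a Bayes factor well above 1.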
To determine selection pressure acting on HA and NA lineages, we
estimated the ratio of non-synonymous (dN) to synonymous (dS)
substitutions per site (ratio dN/dS) for each lineage, using all the
sequences included in this study. Positively selected codons were
detected using the single likelihood ancestor counting (SLAC) and fixed
effects likelihood (FEL) methods with a significance level of 0.1. Both
methods are available in the HyPhy package and were accessed through the
Datamonkey web
server.12,13
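The site-level decision rule can be illustrated with a short sketch: a codon is flagged as positively selected when its estimated dN exceeds dS and the test p-value is at or below 0.1. The `SiteResult` class and its fields are hypothetical and do not mirror the Datamonkey output format; whether a site had to be supported by both SLAC and FEL or by either one is not stated in the text, so both options are shown.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class SiteResult:
    """Hypothetical per-codon result from a site-level selection test."""
    codon: int      # 1-based codon position
    dn: float       # non-synonymous rate estimate
    ds: float       # synonymous rate estimate
    p_value: float  # p-value of the test at this codon


def positively_selected(sites: Iterable[SiteResult], alpha: float = 0.1) -> Set[int]:
    """Codons with dN > dS that are significant at level `alpha`."""
    return {s.codon for s in sites if s.dn > s.ds and s.p_value <= alpha}


# Made-up values for illustration only; these are not study results.
slac = [SiteResult(10, 2.0, 0.5, 0.04), SiteResult(25, 1.5, 0.4, 0.20),
        SiteResult(40, 0.2, 1.0, 0.01)]
fel = [SiteResult(10, 1.8, 0.6, 0.08), SiteResult(25, 1.9, 0.3, 0.06),
       SiteResult(40, 0.1, 1.2, 0.02)]

slac_sites = positively_selected(slac)
fel_sites = positively_selected(fel)
supported_by_both = slac_sites & fel_sites    # stricter option
supported_by_either = slac_sites | fel_sites  # more permissive option
```

Codon 40 in the made-up data is significant but has dN below dS, so it would indicate purifying rather than positive selection and is excluded.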