Genotyping and multi-environment QTL modelling
Illumina fragment paired end libraries, representing each of the four
grandparents (A: AP13, B: DAC; C: WBC; D: VS16) were sequenced and
genotyped in the context of the P.virgatum reference genome v5 as
detailed in (Bragg et al., 2020). The genetic map spans 750 recombinant
4-way progeny genotyped at 4700 markers. Details on the genetic map
construction, map polishing and fine-scale reordering can be accessed on
https://datadryad.org/stash/dataset/doi:10.5061/dryad.ghx3ffbjv (Lovell
et al., 2020). For computational efficiency in G x E analysis, the
genetic map was reduced into 738 markers, with an average intermarker
distance of to 2cM.
Narrow-sense heritability (h2 ) for each trait
at each environment and the genetic correlation between traits among
sites and across sites were estimated using the additive kinship matrix
based on marker genotypic information using the ‘sommer’ package
(Covarrubias-Pazaran, 2016) in R (2018). Briefly, we used a multivariate
mixed model (mmer) that takes the kinship matrix to estimate the
variance components for each trait under each environment, and
calculates h2 as the proportion of additive
genetic variance to the total variance.
Details of the mapping scheme and application in the outbred four-way
population are described in Malosetti et al. (2014) and Lowry et al.
(2019). In brief, ‘single trait under multiple environments’ QTL mapping
for each panicle trait in the cross-pollinated (CP) family was
implemented in Genstat (2019). The QTL approach with CP family resulted
in four possible QTL alleles designated A and Bcorresponding to marker alleles of the first pair of grandparents (AP13
x DAC) and QTL alleles C and D corresponding to marker
alleles of the second pair of grandparents (WBC x VS16). A
multienvironment mixed model was fit for each trait as shown in Eq. 1:
\(trait=u+E+\ \sum\text{QTL}+\ \sum\left(\text{QTL\ x\ E}\right)+e\)Eq. (1),
where μ is the population mean; E represents the
environment effect;\(\sum\text{QTL}=\sum{(a^{a1}+a^{a2}+a^{d})}\),
denoting the total effect from the additive effect from the first
grandparent (i.e., the difference between A and B alleles,\(a^{a1})\), the second grandparent (i.e., the difference betweenC and D alleles, \(a^{a2})\), and the dominance effect
(i.e., the intralocus interaction, \(a^{d})\);\(\sum{\left(\text{QTL\ x\ E}\right)\ }\)represents the QTL ×
environment interactions; and e represents the error term that
was modeled by an unstructured variance–covariance matrix. The
unstructured model was used to specify the data structure in the
genome-wide QTL scan of simple interval mapping (SIM) and composite
interval mapping (CIM). A backward selection procedure was used to
retain significant fixed terms (p < 0.05) after three
consecutive runs of CIM to confirm stability of QTL. The QTL with
highest LOD peaks were considered as the most significant QTL, and the
flanking markers associated with 1.5 LOD drop around the most
significant QTL were considered as confidence interval for the QTL
peaks.
The quality of the predictions from the full G x E model (Eq. 1) was
evaluated with an independent dataset. In three of the 10 field sites
(CLMB, KBSM, PKLE), 370 extra genotypes were planted and panicle
morphology data were collected for these genotypes in 2016. The
multi-environment QTL x E model was evaluated by 1) extracting the final
full model from Genstat based on the core set of progeny, 2)
reconstructing the model in R, 3) predicting the panicle traits for
these additional genotypes at these 3 field sites, and 4) comparing the
model predictions with field observations. Percentage of bias (bias%)
and the prediction accuracy (r , or correlation coefficient)
between the model predictions and field observations were used as
statistical measures for model performance.
Our Genstat modeling of QTL x E considered sites as fixed effect. To
further explore the possible environmental drivers of the QTL x E, we
explored a variety of regression models using best subset generalized
linear models (‘bestglm’ package, McLeod et al., 2020) in R. Here, the
QTL effects (i.e., the difference of A versus B or C versus D alleles)
were predicted with the yearly mean temperature (Tmean), the yearly
total rainfall (Rainfall), the day length (DL), and the yearly average
solar radiation (SRAD) as predictors as shown in Equation 2.
effect = µ + Tmean + Rainfall + DL + SRAD +e Eq. (2)
These environmental variables were selected because temperature, water
availability and light have been identified as interacting with panicle
development, as mentioned in the Introduction. Interaction terms between
variables were not included in our models because they were highly
correlated with the individual variable. We used the bestglm function
and performed all-subset linear regression based on Bayesian information
criteria (BIC) to find the best subset of variables to fit the model.
Environmental variables retained from the model were considered as
candidate environmental factors that interacted with specific QTL,
resulting in different effects at different sites.
Additionally, to investigate potential pleiotropic relationships between
panicle traits and other traits, we ran the multi-environment QTL model
for flowering (FL50), tiller count (TC), and biomass (BIO) in
switchgrass and examined if the QTL identified for panicle traits share
the same QTL with these traits. FL50, TC and BIO data were collected in
the same year as panicle data and previously reported in Lowry et al.
(2019). FL50 is the day of the year when 50% of the tillers on a plant
have panicles that have begun flowering. TC and BIO are the total tiller
count and dry biomass at the end of the growing season, respectively.
Traits with overlapping QTL confidence interval were considered as
potentially pleiotropic.