Genotyping and multi-environment QTL modelling
Illumina fragment paired end libraries, representing each of the four grandparents (A: AP13, B: DAC; C: WBC; D: VS16) were sequenced and genotyped in the context of the P.virgatum reference genome v5 as detailed in (Bragg et al., 2020). The genetic map spans 750 recombinant 4-way progeny genotyped at 4700 markers. Details on the genetic map construction, map polishing and fine-scale reordering can be accessed on https://datadryad.org/stash/dataset/doi:10.5061/dryad.ghx3ffbjv (Lovell et al., 2020). For computational efficiency in G x E analysis, the genetic map was reduced into 738 markers, with an average intermarker distance of to 2cM.
Narrow-sense heritability (h2 ) for each trait at each environment and the genetic correlation between traits among sites and across sites were estimated using the additive kinship matrix based on marker genotypic information using the ‘sommer’ package (Covarrubias-Pazaran, 2016) in R (2018). Briefly, we used a multivariate mixed model (mmer) that takes the kinship matrix to estimate the variance components for each trait under each environment, and calculates h2 as the proportion of additive genetic variance to the total variance.
Details of the mapping scheme and application in the outbred four-way population are described in Malosetti et al. (2014) and Lowry et al. (2019). In brief, ‘single trait under multiple environments’ QTL mapping for each panicle trait in the cross-pollinated (CP) family was implemented in Genstat (2019). The QTL approach with CP family resulted in four possible QTL alleles designated A and Bcorresponding to marker alleles of the first pair of grandparents (AP13 x DAC) and QTL alleles C and D corresponding to marker alleles of the second pair of grandparents (WBC x VS16). A multienvironment mixed model was fit for each trait as shown in Eq. 1:
\(trait=u+E+\ \sum\text{QTL}+\ \sum\left(\text{QTL\ x\ E}\right)+e\)Eq. (1),
where μ is the population mean; E represents the environment effect;\(\sum\text{QTL}=\sum{(a^{a1}+a^{a2}+a^{d})}\), denoting the total effect from the additive effect from the first grandparent (i.e., the difference between A and B alleles,\(a^{a1})\), the second grandparent (i.e., the difference betweenC and D alleles, \(a^{a2})\), and the dominance effect (i.e., the intralocus interaction, \(a^{d})\);\(\sum{\left(\text{QTL\ x\ E}\right)\ }\)represents the QTL × environment interactions; and e represents the error term that was modeled by an unstructured variance–covariance matrix. The unstructured model was used to specify the data structure in the genome-wide QTL scan of simple interval mapping (SIM) and composite interval mapping (CIM). A backward selection procedure was used to retain significant fixed terms (p < 0.05) after three consecutive runs of CIM to confirm stability of QTL. The QTL with highest LOD peaks were considered as the most significant QTL, and the flanking markers associated with 1.5 LOD drop around the most significant QTL were considered as confidence interval for the QTL peaks.
The quality of the predictions from the full G x E model (Eq. 1) was evaluated with an independent dataset. In three of the 10 field sites (CLMB, KBSM, PKLE), 370 extra genotypes were planted and panicle morphology data were collected for these genotypes in 2016. The multi-environment QTL x E model was evaluated by 1) extracting the final full model from Genstat based on the core set of progeny, 2) reconstructing the model in R, 3) predicting the panicle traits for these additional genotypes at these 3 field sites, and 4) comparing the model predictions with field observations. Percentage of bias (bias%) and the prediction accuracy (r , or correlation coefficient) between the model predictions and field observations were used as statistical measures for model performance.
Our Genstat modeling of QTL x E considered sites as fixed effect. To further explore the possible environmental drivers of the QTL x E, we explored a variety of regression models using best subset generalized linear models (‘bestglm’ package, McLeod et al., 2020) in R. Here, the QTL effects (i.e., the difference of A versus B or C versus D alleles) were predicted with the yearly mean temperature (Tmean), the yearly total rainfall (Rainfall), the day length (DL), and the yearly average solar radiation (SRAD) as predictors as shown in Equation 2.
effect = µ + Tmean + Rainfall + DL + SRAD +e Eq. (2)
These environmental variables were selected because temperature, water availability and light have been identified as interacting with panicle development, as mentioned in the Introduction. Interaction terms between variables were not included in our models because they were highly correlated with the individual variable. We used the bestglm function and performed all-subset linear regression based on Bayesian information criteria (BIC) to find the best subset of variables to fit the model. Environmental variables retained from the model were considered as candidate environmental factors that interacted with specific QTL, resulting in different effects at different sites.
Additionally, to investigate potential pleiotropic relationships between panicle traits and other traits, we ran the multi-environment QTL model for flowering (FL50), tiller count (TC), and biomass (BIO) in switchgrass and examined if the QTL identified for panicle traits share the same QTL with these traits. FL50, TC and BIO data were collected in the same year as panicle data and previously reported in Lowry et al. (2019). FL50 is the day of the year when 50% of the tillers on a plant have panicles that have begun flowering. TC and BIO are the total tiller count and dry biomass at the end of the growing season, respectively. Traits with overlapping QTL confidence interval were considered as potentially pleiotropic.