Phenotypic and genomic data, and plant height - seeds weight scaling
Ready-to-use phenotypic data for plant height and thousand-seed weight (hereafter seeds weight), together with a high-density genome-wide SNP dataset, were obtained for approximately 13,000 barley (Hordeum vulgare L.) accessions from the Federal ex situ Genebank for Agricultural and Horticultural Plant Species (IPK) in Germany. The panel includes both domesticated barley (cultivars and landraces) and its conspecific wild progenitor H. vulgare ssp. spontaneum (K. Koch) Thell. Plant height (measured from the soil surface to the top of the spike, including awns) and seeds weight (recorded as thousand-seed weight) were assessed during seed regeneration in plots of at least 3 m² (Gonzalez et al., 2018). SNP profiles were derived from single plants of the accessions in the IPK barley collection using genotyping-by-sequencing (GBS) (Milner et al., 2019).
We retained samples for which both phenotypic and genotypic data were available. These data were then filtered to keep samples with <10% missing genotypes and SNPs with a minor allele frequency (MAF) > 0.01. This yielded 133,588 SNPs for 12,828 samples, including wild types, landraces, and cultivars, from 85 countries and regions across all continents with agriculture. The samples also span different growth habits (winter type, which requires vernalisation for flowering, or spring type, with a relaxed vernalisation requirement) and growth forms (two-rowed or six-rowed), and contain sufficient variation in life history to capture the general scaling law.
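The filtering step can be illustrated with a minimal Python sketch. It assumes genotypes are stored as a samples-by-SNPs array of allele dosages (0, 1, 2) with NaN for missing calls, and that samples are filtered before SNPs; the function name, the array layout, and the order of the two filters are illustrative assumptions, not a description of the pipeline actually used.

```python
import numpy as np

def filter_genotypes(G, max_sample_missing=0.10, min_maf=0.01):
    """Filter a (n_samples, n_snps) dosage matrix (0/1/2, NaN = missing).

    Hypothetical helper: thresholds default to the values quoted in the text.
    """
    G = np.asarray(G, dtype=float)
    # Fraction of missing genotype calls per sample
    sample_missing = np.isnan(G).mean(axis=1)
    keep_samples = sample_missing < max_sample_missing
    G = G[keep_samples]
    # Frequency of the counted allele per SNP, ignoring missing calls
    freq = np.nanmean(G, axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    keep_snps = maf > min_maf
    return G[:, keep_snps], keep_samples, keep_snps
```

The function returns the filtered matrix together with the two boolean masks, so that the same selection can be applied to the phenotype table and to the SNP annotation.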
Plant height and seeds weight scaling was first evaluated with a bivariate linear model in PAST V3 (Hammer et al., 2001). If the correlation between plant height and seeds weight is determined by shared genomic factors, the two traits would be expected to be evolutionarily correlated independently of the phylogenetic relationships among samples. We therefore tested the evolutionary correlation of the two traits after controlling for phylogenetic relatedness among the samples. To do so, we used RAxML to construct a maximum likelihood phylogenetic tree of the 12,828 samples (Stamatakis, 2014). We then fitted a generalized least squares regression and used phylogenetic generalized ANOVA to test the correlation between the two traits after controlling for their phylogenetic relationship, using the software package Phylocom (Webb et al., 2008).
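As an illustration of the regression step, the sketch below fits a generalized least squares line of one trait on the other given an error covariance matrix C. It assumes that C holds the covariances among samples implied by the tree (for example, shared branch lengths under a Brownian motion model); this choice, the function name, and any transformation of the traits (such as taking logarithms before fitting) are assumptions for illustration and do not reproduce the internals of Phylocom or PAST.

```python
import numpy as np

def gls_fit(x, y, C):
    """Generalized least squares fit of y = a + b*x with error covariance ~ C.

    Hypothetical helper: C is an (n, n) positive-definite matrix, assumed here
    to be derived from the phylogeny. Returns [intercept, slope].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    C = np.asarray(C, dtype=float)
    X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
    Ci_X = np.linalg.solve(C, X)                # C^-1 X without explicit inverse
    Ci_y = np.linalg.solve(C, y)                # C^-1 y
    # beta = (X' C^-1 X)^-1 X' C^-1 y
    return np.linalg.solve(X.T @ Ci_X, X.T @ Ci_y)
```

With C set to the identity matrix the estimator reduces to ordinary least squares, which corresponds to a bivariate linear fit that ignores relatedness.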
Heritability, genetic correlation, and genome-wide association studies for plant height and seeds weight
We evaluated the heritability of plant height and seeds weight in barley. We employed a genome-based restricted maximum likelihood method (GREML-LDMS) to estimate the narrow-sense SNP-based heritability (h²SNP) (Yang et al., 2015). To do so, we computed linkage disequilibrium (LD) scores between SNPs with a block size of 100 kb using GCTA (Yang et al., 2011), and then used GREML (a function within GCTA) to calculate the proportion of phenotypic variance explained by the SNPs following an LD score regression, reported as h²SNP (Yang et al., 2015). We further estimated the genetic correlation between the two traits using the bivariate GREML procedure in GCTA (Yang et al., 2011).
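GREML methods partition phenotypic variance using a genomic relationship matrix (GRM) computed from the SNPs. The sketch below computes a single GRM from standardized genotypes, in the form described by Yang et al. (2011). It assumes a complete dosage matrix (no missing calls) containing only polymorphic SNPs; the function name is hypothetical, and the sketch omits both the stratification of SNPs by LD score that GREML-LDMS applies and the REML variance-component fit itself.

```python
import numpy as np

def genomic_relationship_matrix(G):
    """GRM from a (n_samples, n_snps) dosage matrix coded 0/1/2.

    Assumes no missing genotypes and no monomorphic SNPs (illustrative only).
    """
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0                        # allele frequency per SNP
    # Centre by 2p and scale by the expected standard deviation sqrt(2p(1-p))
    Z = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Average the cross-products over SNPs
    return Z @ Z.T / G.shape[1]
```

In a GREML model the genetic variance is attached to this matrix, and the SNP-based heritability is the genetic variance divided by the total phenotypic variance.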
We further identified SNPs associated with either plant height or seeds weight through GWAS. To account for population genetic structure, we calculated the first five principal eigenvectors from a principal components analysis (PCA) using GCTA (Yang et al., 2011) and included them as covariates in the GWAS model. The GWAS was conducted with the program FaST-LMM, which implements a Factored Spectrally Transformed Linear Mixed Model and calculates and uses kinship as a realised relationship matrix (Listgarten et al., 2012). We used Bonferroni correction to determine significant SNPs.
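The Bonferroni step amounts to dividing the family-wise error rate by the number of tests. The sketch below assumes a family-wise error rate of 0.05, which is not stated in the text, and takes the number of tests to be the number of p-values supplied; the function name is hypothetical.

```python
import numpy as np

def bonferroni_significant(p_values, alpha=0.05):
    """Return the Bonferroni threshold and a mask of significant tests.

    alpha = 0.05 is an assumed family-wise error rate, not taken from the text.
    """
    p_values = np.asarray(p_values, dtype=float)
    threshold = alpha / p_values.size     # one test per SNP
    return threshold, p_values < threshold
```

Under these assumptions, testing all 133,588 retained SNPs would give a threshold of 0.05 / 133,588, or about 3.74e-7.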
Finally, we evaluated linkage disequilibrium (LD) decay using the r² parameter for all pairwise SNP comparisons within a genomic window of 5 Mb, using PLINK ver 1.9 (Chang et al., 2015) and PopLDdecay (Zhang et al., 2019). We examined the distribution of distances between immediately neighbouring SNP pairs in which one SNP was significantly associated with plant height and the other with seeds weight, and evaluated these distances against the global LD decay pattern according to their physical separation on the chromosome.
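The idea behind an LD decay curve can be sketched in a few lines of Python. The example takes r² to be the squared Pearson correlation between the allele dosages of two SNPs and averages it in bins of physical distance. It assumes a complete dosage matrix for a single chromosome, polymorphic SNPs, and positions sorted in ascending order; the function name and the 100 kb bin size are illustrative choices, and the sketch is not the implementation used by PLINK or PopLDdecay.

```python
import numpy as np

def ld_decay(G, pos, max_dist=5_000_000, bin_size=100_000):
    """Mean pairwise r^2 per distance bin for SNPs on one chromosome.

    G: (n_samples, n_snps) dosages; pos: ascending base-pair positions.
    Hypothetical helper; bin_size is an assumed value.
    """
    G = np.asarray(G, dtype=float)
    pos = np.asarray(pos)
    n_bins = int(np.ceil(max_dist / bin_size))
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins, dtype=int)
    n_snps = G.shape[1]
    for i in range(n_snps):
        for j in range(i + 1, n_snps):
            d = pos[j] - pos[i]
            if d > max_dist:
                break  # positions are sorted, so later SNPs are farther away
            r = np.corrcoef(G[:, i], G[:, j])[0, 1]
            b = min(int(d // bin_size), n_bins - 1)
            sums[b] += r ** 2
            counts[b] += 1
    mean_r2 = np.full(n_bins, np.nan)     # NaN marks bins with no SNP pairs
    filled = counts > 0
    mean_r2[filled] = sums[filled] / counts[filled]
    return mean_r2, counts
```

The distance between a height-associated SNP and a neighbouring seeds-weight-associated SNP can then be compared with the distance at which the binned mean r² falls to background levels. The double loop is quadratic in the number of SNPs, so a sketch like this suits only small subsets.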