Haplotype inference and population structure mapping
Haplotypes of genes were inferred using an expectation-maximization (EM) algorithm (Bilmes, 1998; Dempster, Laird, & Rubin, 1977). We performed this inference with an in-house Perl script (available from the GitHub repository cited above), which uses short reads to extract linkage information between SNPs. Where no read pair spanned two adjacent SNPs, we split the gene into separate segments, placing the breakpoint between the two consecutive segments at the midpoint of those SNPs. Because the inference compares alternative haplotypes by maximum likelihood, it tends to produce short segments when many populations are analyzed together. We therefore inferred haplotypes from eight populations chosen to represent different varieties and regions: two A. m. eucalyptifolia (euCA and euDW), two A. m. australasica (auAK and auBS), and four A. m. marina (maBB, maLS, maTN, and maSY). In total, genes were split into 454 linked segments, and haplotypes were inferred for each segment (Supplementary Table 3). Before constructing haplotype networks, we removed segments shorter than 100 bp or containing missing data. For each of the 231 retained segments, we constructed a haplotype network using the NETWORK software (Polzin & Daneshmand, 2003).
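The segmentation and filtering rules described in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the in-house Perl script: it assumes SNP positions are sorted integer coordinates, that each read pair is represented as the set of SNP positions covered by its two mates, and that segments are half-open intervals. The function names (`snps_linked`, `split_gene`, `filter_segments`), the per-segment missing-data flags, and the toy coordinates are all hypothetical. The EM haplotype inference itself is not shown.

```python
from typing import List, Sequence, Set, Tuple

# A segment is a half-open interval [start, end) in gene coordinates (an assumption).
Segment = Tuple[int, int]


def snps_linked(a: int, b: int, read_pairs: Sequence[Set[int]]) -> bool:
    """Return True if at least one read pair covers both SNP positions a and b."""
    return any(a in covered and b in covered for covered in read_pairs)


def split_gene(gene_start: int, gene_end: int, snps: Sequence[int],
               read_pairs: Sequence[Set[int]]) -> List[Segment]:
    """Split a gene wherever two adjacent SNPs are not spanned by any read pair.

    The cut point is the (integer, rounded-down) midpoint of the two unlinked SNPs.
    """
    segments: List[Segment] = []
    seg_start = gene_start
    for a, b in zip(snps, snps[1:]):
        if not snps_linked(a, b, read_pairs):
            cut = (a + b) // 2  # midpoint between the unlinked adjacent SNPs
            segments.append((seg_start, cut))
            seg_start = cut
    segments.append((seg_start, gene_end))
    return segments


def filter_segments(segments: Sequence[Segment], has_missing: Sequence[bool],
                    min_len: int = 100) -> List[Segment]:
    """Keep segments at least min_len bp long that have no missing data."""
    return [seg for seg, missing in zip(segments, has_missing)
            if seg[1] - seg[0] >= min_len and not missing]


# Toy example with made-up coordinates (hypothetical data).
snps = [50, 120, 400, 460, 520, 950]
read_pairs = [{50, 120}, {400}, {460}, {520, 950}]
segments = split_gene(0, 1000, snps, read_pairs)
print(segments)  # [(0, 260), (260, 430), (430, 490), (490, 1000)]
kept = filter_segments(segments, has_missing=[False, True, False, False])
print(kept)      # [(0, 260), (490, 1000)]
```

In the toy run, the 60 bp segment (430, 490) fails the length threshold and (260, 430) is dropped for missing data. Rounding the midpoint down is a choice made for this sketch; the text does not specify how odd distances between SNPs are handled.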