Structural variants in Brassica genomes
Structural variation (SV) is generally defined as genomic alterations
that are 50 bp or larger in size, typically including insertions (INSs),
deletions (DELs), duplications (DUPs), inversions (INVs) and
translocations (TRAs). SVs greatly impact the genes encoded in the
genome and are responsible for diverse agronomically important
phenotypes/traits. Compared to single nucleotide polymorphisms (SNPs) and
short insertions and deletions (InDels), SVs are less commonly explored
because they are difficult to identify fully with short reads. De novo genome assemblies, especially those with high contiguity, can
facilitate in-depth genome-wide identification of all forms of
structural variation. To the best of our knowledge, no work so far has
identified SVs in Brassica genomes based on highly contiguous genome
assemblies. To close this knowledge gap and
gain a first glimpse of the SVs that differ between Brassica rapa genomes, we identified SVs using the genomes of B. rapa Z1 (Belser et al., 2018) and B. rapa var. parachinensis (this study), which have genome
assembly contig N50 values of 5.51 Mb and 7.26 Mb, respectively. As shown in Fig.
5A, these two genomes differ by only a single translocation and
show no other large chromosomal rearrangements. Using a whole-genome
alignment approach, we identified a total of 27,190 insertions, 26,002
deletions, 1,374 duplications in the parachinensis assembly, 1,368
duplications in the Z1 assembly, 46 medium-sized inversions
ranging from 5.2 Kb to 1,431.6 Kb, and 8,565 complex SVs with
imprecise breakpoints between Z1 and parachinensis (Fig. 7A). Of
the insertion events, 845 and 847 were newly arisen LTR
insertions specific to the parachinensis and Z1 assemblies,
respectively, which is consistent with their relatively recent
estimated insertion times (Fig. 7B). A large proportion of insertions
and deletions detected were found to overlap with gene regions based
on the gene annotation. Fig. 7C shows two cases of local tandem
duplication that overlap with gene fragments or full genes.
Comparative genomic analysis can also provide insights
into the mutational mechanisms underlying structural variation. Among the 46
inversions identified, we found that repeat sequences, especially
inverted repeats, prevail in the flanking regions,
highlighting the causal role of sequence features in the formation of small
inversions (Fig. 7D). Taken together, our analysis of genomic
structural variation based on these highly contiguous genome assemblies
provides a first glimpse of SVs in Brassica genomes and of
their functional significance for gene structure, and thus of their potential
effect on phenotype.
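
The overlap between SVs and annotated genes described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, which is not specified here. It assumes that SVs and genes are given as 0-based, half-open intervals on named chromosomes, and that an insertion is represented by a 1-bp span at its insertion site. The Interval type, the function name svs_overlapping_genes, and all coordinates and gene IDs in the example are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive (assumed convention)
    label: str  # SV type (e.g. "DEL") or gene ID


def svs_overlapping_genes(svs, genes):
    """Map each SV that overlaps at least one gene to the IDs of those genes."""
    # Group genes by chromosome so each SV is only compared to genes
    # on its own chromosome.
    genes_by_chrom = defaultdict(list)
    for gene in genes:
        genes_by_chrom[gene.chrom].append(gene)

    hits = {}
    for sv in svs:
        # Two half-open intervals overlap when each starts before the other ends.
        overlapped = [
            g.label
            for g in genes_by_chrom.get(sv.chrom, [])
            if g.start < sv.end and sv.start < g.end
        ]
        if overlapped:
            hits[sv] = overlapped
    return hits


# Hypothetical example data, not taken from the assemblies discussed here.
example_svs = [
    Interval("A01", 100, 400, "DEL"),
    Interval("A01", 5000, 5001, "INS"),
    Interval("A02", 100, 300, "DUP"),
]
example_genes = [
    Interval("A01", 350, 900, "geneA"),
    Interval("A02", 1000, 2000, "geneB"),
]

example_hits = svs_overlapping_genes(example_svs, example_genes)
fraction_overlapping = len(example_hits) / len(example_svs)
```

The linear scan over the genes of each chromosome keeps the sketch short. For genome-scale call sets such as the tens of thousands of insertions and deletions reported here, a sorted-interval or interval-tree lookup would be the more usual choice.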