Introduction
Brassica, a genus of the Brassicaceae family, is among the most
economically important plant genera, as it contains a wide range of
staple vegetables and oilseed crops. During its evolution, Brassica
underwent an additional whole-genome triplication (WGT) event after it
diverged from Arabidopsis, with which it shares a common ancestor
(Cheng et al., 2016; Lysak, Koch, Pecinka, & Schubert, 2005). As a
result, species in the Brassica genus display not only great
morphological and phytochemical diversity but also karyotype diversity
(Cheng et al., 2016; Wang et al., 2019). The most agriculturally
important Brassica species comprise three diploid species, Brassica
rapa (AA), Brassica nigra (BB) and Brassica oleracea (CC), and three
allopolyploid species formed by pairwise combinations of these
diploids: Brassica napus (AACC), Brassica juncea (AABB) and Brassica
carinata (BBCC). The evolutionary origins of these six species and
their relationships to one another are well described by the
‘triangle of U’ model (Wang et al., 2019; Yang et al., 2016).
Owing to rapid advances in sequencing technology, especially
next-generation sequencing (NGS), many Brassica species have been
sequenced, but most of the resulting assemblies are of limited quality.
Genomes sequenced with Illumina or Roche 454 technology, including
B. rapa var. pekinensis Chiifu (Wang et al., 2011), B. oleracea 02-12
(Liu et al., 2014), B. oleracea TO1000DH (Parkin et al., 2014),
B. nigra YZ12151 (Yang et al., 2016), B. napus (Bayer et al., 2017;
Chalhoub et al., 2014; Sun et al., 2017) and B. juncea (Wang et al.,
2019; Yang et al., 2016), have relatively low contiguity, which may
impede genomic analysis, especially of complex regions such as
pericentromeric and centromeric regions. Only recently has the
application of long-read sequencing technologies, including Oxford
Nanopore Technologies (ONT) and Pacific Biosciences (PacBio), to genome
assembly greatly improved the contiguity of assembled contigs. At least
four Brassica genomes have been reported to be sequenced with long-read
technology, with contig N50 values reaching the megabase scale:
B. oleracea HDEM and B. rapa Z1 (yellow sarson) (Belser et al., 2018),
B. oleracea var. botrytis (Sun et al., 2019) and B. napus (Song et al.,
2020). These studies demonstrated that long-read technology can produce
highly contiguous Brassica genome assemblies (i.e., N50 > 5 Mb)
(Belser et al., 2018).
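For readers less familiar with the contig N50 metric used to compare
these assemblies, the following minimal Python sketch shows how it is
conventionally computed: contigs are sorted from longest to shortest,
and the N50 is the length of the contig at which the cumulative length
first reaches at least half of the total assembly size. The function
name `n50` and the contig lengths in the example are hypothetical and
for illustration only; they are not data from any of the cited
assemblies.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembled length.
    """
    # Sort contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    cumulative = 0
    for length in lengths:
        cumulative += length
        # The first contig at which the running total reaches half
        # of the assembly size defines the N50.
        if cumulative >= half_total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches half")


# Hypothetical contig lengths in Mb (illustrative only).
print(n50([8, 7, 5, 3, 2]))  # 7
```

In this hypothetical example the total length is 25 Mb, and the running
sum first reaches half of it (12.5 Mb) at the second contig, so the N50
is 7 Mb.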
Given the great morphological and phytochemical diversity of Brassica
species, genome information from a wide range of representative
Brassica species is needed to decipher the genomic variants that may
underlie the phenotypic and karyotypic diversity among cultivars of
these species.
Chinese flowering cabbage (Brassica rapa var. parachinensis), locally
known as Caixin, Tsai Tai, Choy Sum, bok choy, or Tsai Hsin (Tan, Fan,
Kuang, Lu, & Reiter, 2019; Xiao et al., 2019), is an important leafy
and bolting-stem vegetable widely grown in Asia, particularly in China,
Japan, and Korea (Kamran et al., 2020).
This vegetable has high nutritional value and is rich in vitamins,
minerals, secondary metabolites and dietary fiber, which have
health-promoting effects in humans (Xiao et al., 2019). Unlike other
B. rapa vegetables, Chinese flowering cabbage bolts and flowers readily
without strict low-temperature vernalization. Sequencing and assembling
its genome is therefore important for uncovering the genomic basis and
molecular mechanisms underlying the distinctive morphological and
phytochemical characteristics of this cultivar.
In this study, we report a highly contiguous (N50 = 7.2 Mb),
chromosome-level genome assembly of Chinese flowering cabbage (Brassica
rapa var. parachinensis). It was assembled with an integrated approach
combining Illumina sequencing, PacBio long-read sequencing and
high-throughput chromosome conformation capture (Hi-C). The assembly
resolves a large part of the pericentromeric regions of this species.
In addition, we conducted comparative genomic and evolutionary analyses
of this genome and other representative Brassica species. The results
provide novel insights into the evolution of Brassica genome structure.