Reference genome development
To obtain the reference genome of C. subpubescens, the individual
with the highest homozygosity (based on previous studies using SSR
markers; Setsuko et al., 2023) was selected from 51 individuals
cultivated in a greenhouse at the Forestry and Forest Products Research
Institute. DNA was extracted from fresh leaves using a Genomic-tip
(Qiagen, Germany). The library was constructed using the SMRTbell Template
Prep Kit (PacBio, USA) according to the manufacturer’s
instructions. The library was then size-selected using BluePippin
(Sage Science, USA) to remove fragments shorter than 15 kb and
sequenced using four single-molecule real-time (SMRT) cells on the Sequel
system (PacBio, USA). DNA extraction, library preparation, and
sequencing were conducted by the Kazusa DNA Research Institute (Chiba,
Japan).
Before de novo assembly of the C. subpubescens genome, chimeric
reads were split using yacrd (Marijon et al.,
2020). The assembly, conducted using wtdbg2 v. 2.5 (Ruan & Li, 2020),
resulted in a genome size of approximately 450 Mb (Masuda et al.,
unpublished). The original dataset corresponded to approximately 81× the size of
the C. subpubescens genome. The assembled dataset comprised a total of
482,624,924 bases in 6,011 sequences, with sequence lengths of 1,129–5,172,107
bp (mean: 80,290 bp). The N50 sequence length was 623,636 bp. The
quality of the assembly was assessed using the web tool gVolante
(Nishimura et al., 2017). Using BUSCO (Simão et al., 2015) as implemented
in gVolante, 86.1% of the 1,440 core plant genes were detected as
complete in the assembly.
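
The length statistics reported above (number of sequences, total bases, length range, mean, and N50) can be recomputed from a list of sequence lengths. The following is a minimal Python sketch, not the procedure used in this study (the values here were obtained with the tools cited above). The function names `n50` and `length_summary` and the toy lengths are hypothetical and serve only as illustration.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50: the largest length L such that sequences of
    length >= L together contain at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("unreachable for non-empty input")


def length_summary(lengths: List[int]) -> Dict[str, float]:
    """Summarise a set of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    return {
        "count": len(lengths),
        "total_bases": sum(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "n50": n50(lengths),
    }


# Toy lengths (hypothetical, not data from this study).
toy_lengths = [2, 3, 4, 5, 6]
toy_summary = length_summary(toy_lengths)

# Consistency check using only figures stated in the text:
# total bases divided by number of sequences should match the mean.
reported_mean = 482_624_924 / 6_011
```

With the figures stated in the text, the total of 482,624,924 bases divided by 6,011 sequences gives about 80,290 bp, which agrees with the reported mean length.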