Reference genome development
To obtain the reference genome of C. subpubescens, the individual with the highest homozygosity (as determined in a previous study using SSR markers; Setsuko et al., 2023) was selected from 51 individuals cultivated in a greenhouse at the Forestry and Forest Products Research Institute. DNA was extracted from fresh leaves using a Genomic-tip (Qiagen, Germany). A library was constructed using the SMRTbell Template Prep Kit (PacBio, USA) according to the manufacturer's instructions. The DNA library was size-fractionated using BluePippin (Sage Science, USA) to remove fragments shorter than 15 kb and then sequenced on four single-molecule real-time cells on the Sequel system (PacBio, USA). DNA extraction, library preparation, and sequencing were conducted by the Kazusa DNA Research Institute (Chiba, Japan).
Before de novo genome assembly for C. subpubescens, the reads were preprocessed with yacrd (Marijon et al., 2020) to split chimeric sequences. The assembly was conducted using wtdbg2 v. 2.5 (Ruan & Li, 2020) and resulted in a genome size of approximately 450 Mb (Masuda et al., unpublished). The original dataset corresponded to approximately 81× the size of the C. subpubescens genome. The dataset comprised a total of 482,624,924 bases in 6,011 reads, with read lengths of 1,129–5,172,107 bp (mean: 80,290 bp). The N50 sequence length was 623,636 bp. The quality of the assembly was assessed using the web tool gVolante (Nishimura et al., 2017). Using BUSCO (Simão et al., 2015) as implemented in gVolante, approximately 86.1% of the 1,440 core plant genes were detected as complete in the assembly.
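As an illustration of how summary statistics such as the mean length and N50 reported above are commonly derived, the following minimal Python sketch computes them from a list of sequence lengths. It is not the procedure used in this study (the values above were obtained with the tools cited). The function name length_stats and the toy input are hypothetical, and N50 is taken here in its usual sense: the length at which the cumulative sum of sequences, sorted from longest to shortest, first reaches half of the total bases.

```python
from typing import Dict, List


def length_stats(lengths: List[int]) -> Dict[str, float]:
    """Summarize sequence lengths: count, total, min, max, mean, and N50.

    Hypothetical helper for illustration only; not part of any cited tool.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")

    total = sum(lengths)

    # N50: walk from the longest sequence to the shortest and record the
    # length at which the running sum first reaches half of the total bases.
    running = 0
    n50 = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            n50 = length
            break

    return {
        "count": len(lengths),
        "total": total,
        "min": min(lengths),
        "max": max(lengths),
        "mean": total / len(lengths),
        "n50": n50,
    }


# Toy input (made-up lengths, not data from this study).
toy_lengths = [2, 3, 4, 5, 6]
toy_stats = length_stats(toy_lengths)
print(toy_stats)

# Consistency check that uses only the totals reported in the text:
# total bases divided by the number of sequences gives the mean length.
reported_mean = 482_624_924 / 6_011
print(round(reported_mean))
```

Dividing the reported total of 482,624,924 bases by 6,011 sequences gives a mean of about 80,290 bp, which agrees with the mean stated above. The N50 itself cannot be recomputed from the totals alone, because it depends on the full distribution of lengths.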