Genome sequencing, size estimation and assembly
The genome size of O. kokonorica was estimated to be ~520 Mb by k-mer analysis of 35.27 Gb of cleaned Illumina data (Figure S1; Table S1). To assemble the genome accurately, we combined Illumina, Nanopore and Hi-C sequencing. Starting from 61.87 Gb of Nanopore long reads (~110× coverage; Table S1), we polished the raw assembly with NextPolish and removed redundant haplotigs with purge_haplotigs, yielding a final assembly of 556 Mb with a contig N50 of 9.08 Mb. The Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness score was 97.6%, indicating a highly complete assembly (Table 1; Table S2). Using ~137 Gb of Hi-C data, we then anchored 127 contigs onto 20 pseudochromosomes. In total, 99.80% (554.86 Mb) of the assembly was anchored and oriented on these 20 pseudochromosomes (Figure S2; Table S3). In addition, 98.93% of the Illumina short reads mapped properly to the final assembly (Table S4). Together, these assessments indicate that the O. kokonorica genome was assembled with high contiguity, completeness and accuracy.
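For readers less familiar with the contiguity and anchoring statistics quoted above, the following minimal Python sketch shows how a contig N50 and an anchoring rate are conventionally computed. It is an illustration only, not the pipeline used in this study: the function names `n50` and `anchored_percent` and the toy contig lengths are hypothetical, and only the 554.86 Mb and 556 Mb figures are taken from the text.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def anchored_percent(anchored_len, assembly_len):
    """Percentage of the assembly placed on pseudochromosomes
    (both lengths in the same unit, e.g. Mb)."""
    return 100 * anchored_len / assembly_len


# Hypothetical toy contig lengths, not data from this study.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: 80 + 70 reaches half of the 300 total

# Figures quoted in the text; 556 Mb is a rounded assembly length,
# so the result only approximately reproduces the reported 99.80%.
print(round(anchored_percent(554.86, 556), 1))
```

A larger N50 means that fewer, longer contigs account for half of the assembly, which is why it is reported alongside BUSCO completeness and the read mapping rate as a measure of contiguity.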