2.2 DNA preparation and sequencing
To estimate genome size and heterozygosity, 50 adult males were collected from the same colony, and high-quality genomic DNA was extracted with the Sangon Biotech Ezup Column Animal Genomic DNA Purification Kit. The extracted DNA was used to construct a paired-end 150 bp (PE150) library, which was sequenced on an Illumina sequencer. After quality control, 20.62 Gb of clean data were obtained; the total sequencing depth was about 120.75×, the GC content was about 29.07%, the proportion of Q20 bases was more than 96.96% and the proportion of Q30 bases was more than 91.91%.
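The relationship between data volume and sequencing depth can be sketched as follows. This is a minimal illustration, assuming depth is defined as total clean bases divided by genome size. The function names are hypothetical, and the genome size printed here is only the value implied by the two reported figures, not a result stated in this section.

```python
# Values reported in the text above.
clean_bases_gb = 20.62    # Illumina PE150 clean data, in Gb
reported_depth = 120.75   # reported total sequencing depth (x)


def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Average depth = total sequenced bases / genome size (same units)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def implied_genome_size(total_bases: float, depth: float) -> float:
    """Invert the depth formula: genome size = total bases / depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return total_bases / depth


# Work in Mb: 1 Gb = 1000 Mb.
implied_size_mb = implied_genome_size(clean_bases_gb * 1000, reported_depth)
print(f"Implied genome size: {implied_size_mb:.2f} Mb")  # about 170.77 Mb
```

Under this assumption, the reported depth and data volume are consistent with a genome of roughly 171 Mb.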
Another 50 adult males from the same colony were delivered to Biomarker Biotechnology Co., Ltd. for long-read library construction and sequencing with Oxford Nanopore Technologies (ONT). A total of 20.67 Gb of raw data with a read N50 of 24.7 kb was acquired on the third-generation nanopore sequencing platform. After filtering out adapters, short fragments and low-quality reads, 20.32 Gb of clean data were obtained.
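For readers unfamiliar with the read N50 statistic reported above, a minimal sketch of its usual definition follows: the length L such that reads of length L or longer contain at least half of all sequenced bases. The function name and the toy read lengths are hypothetical and are not taken from the actual ONT data.

```python
def n50(read_lengths):
    """Return the N50 of a collection of read lengths.

    Reads are sorted from longest to shortest and accumulated until the
    running total reaches half of all bases; the length of the read that
    crosses this threshold is the N50.
    """
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    half_total = sum(read_lengths) / 2
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy data (bp): total = 20000, half = 10000.
toy_lengths = [2000, 3000, 4000, 5000, 6000]
print(n50(toy_lengths))  # 6000 + 5000 = 11000 >= 10000, so N50 = 5000
```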
Furthermore, another 500 adult males from the same colony were used to construct Hi-C libraries, which were sequenced on an Illumina platform; a total of 22.92 Gb of clean data was obtained, with the proportion of Q30 bases reaching more than 91.46%. In the library quality evaluation, the proportion of reads containing enzyme digestion sites in the Hi-C library reached more than 31.09% (Supplementary Table 1). The transcriptome of 50 adult males of C. striatipennis Kieffer from the same colony was sequenced, and the transcriptome of C. riparius Meigen was downloaded from NCBI; both were used to assist genome annotation.
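The Q20 and Q30 proportions quoted in this section are the fractions of bases whose Phred quality score is at least 20 or 30, respectively. A minimal sketch of that calculation is given below, assuming Phred+33 encoded FASTQ quality strings. The function name and the toy quality string are hypothetical, and this is not the quality-control pipeline used by the sequencing providers.

```python
def q_fraction(quality_strings, threshold, offset=33):
    """Fraction of bases with Phred score >= threshold.

    Assumes each quality character encodes score = ord(char) - offset
    (offset 33 for the Phred+33 convention).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    if total == 0:
        raise ValueError("no bases found")
    return passing / total


# Hypothetical toy read: 'I' = Q40, '5' = Q20, '?' = Q30.
toy_quals = ["II5?"]
print(q_fraction(toy_quals, 20))  # 1.0  (all four bases are >= Q20)
print(q_fraction(toy_quals, 30))  # 0.75 (three of four bases are >= Q30)
```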