2.2 Reference genome sequencing and assembly
We used a combination of long-fragment sequencing, short-insert library sequencing for error correction and gap filling, and chromatin conformation capture (Hi-C) to generate chromosome-level reference genomes of semelparous mammals. High-molecular-weight (HMW) DNA extracted from the testis of the A. flavipes individual ‘AdamAnt’ was used to generate long-read (PacBio) sequencing data at Annoroad Gene Technology (Beijing, China). Paired-end (2 × 100 bp) BGISEQ-500 data were generated from cerebrum, liver, heart, and lung tissue of the same individual by BGI-Qingdao. A total of 323.85 Gb (~100×) of A. flavipes PacBio reads were error-corrected using the correction module of Canu v1.7 (Koren et al., 2017). The corrected subreads were then used for the initial draft assembly with Wtdbg2 v1.2.8 (Ruan & Li, 2020). To reduce base errors, the assembly was ‘polished’ using Pilon v1.23 (Walker et al., 2014) with 151.43 Gb (50×) of 100 bp paired-end BGISEQ-500 reads, which were mapped to the initial PacBio assembly using Minimap2 v2.10 (Li, 2018) and SAMtools v1.9 (Li et al., 2009).
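As a rough check on the sequencing depths quoted above, fold coverage can be approximated as total sequenced bases divided by genome size. The sketch below assumes the 3.2 Gb k-mer-based genome size estimate reported in this section. The function name fold_coverage is ours, for illustration only.

```python
def fold_coverage(total_bases_gb: float, genome_size_gb: float) -> float:
    """Approximate fold coverage as sequenced bases / genome size (both in Gb)."""
    return total_bases_gb / genome_size_gb


# Assumption: genome size taken from the k-mer estimate reported in this section.
GENOME_SIZE_GB = 3.2

pacbio_cov = fold_coverage(323.85, GENOME_SIZE_GB)  # PacBio long reads
bgiseq_cov = fold_coverage(151.43, GENOME_SIZE_GB)  # BGISEQ-500 short reads

print(f"PacBio: {pacbio_cov:.1f}x, BGISEQ-500: {bgiseq_cov:.1f}x")
```

Under this assumption, the values come out at about 101× and 47×, in line with the rounded depths given in the text.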
Genome size was estimated by k-mer frequency analysis (Liu et al., 2013). Briefly, 100 bp paired-end WGS reads were used as input to GCE (Genomic Character Estimator) v1.0.0 (Marcais & Kingsford, 2011) to obtain the k-mer frequency distribution, and the genome size was estimated using the equation ‘Genome size = k-mer number / k-mer depth’, where ‘k-mer number’ is the total number of k-mers and ‘k-mer depth’ is the depth at the peak of the distribution (i.e., the most frequently observed depth). Genome length was estimated on the basis of the total scaffold length of the assembly. Using the frequency distribution of 17-mers from the short paired-end reads (Figure S1), the A. flavipes genome was estimated to be 3.2 Gb.
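The equation above can be illustrated with a minimal sketch. This is not the GCE implementation, which applies further corrections. The histogram is a synthetic toy example, and the function name and the min_depth cutoff (used to skip low-depth k-mers that typically arise from sequencing errors) are assumptions introduced here for illustration.

```python
def estimate_genome_size(histogram: dict, min_depth: int = 5):
    """Estimate genome size as (total k-mer number) / (peak k-mer depth).

    histogram maps a k-mer depth to the number of distinct k-mers
    observed at that depth.
    """
    # Total k-mer number: each distinct k-mer contributes its depth.
    kmer_number = sum(depth * count for depth, count in histogram.items())
    # Peak depth: the most frequent depth, ignoring depths below min_depth
    # (assumed here to be dominated by sequencing errors).
    candidates = {d: c for d, c in histogram.items() if d >= min_depth}
    peak_depth = max(candidates, key=candidates.get)
    return kmer_number / peak_depth, peak_depth


# Synthetic toy histogram (not real data): depth -> number of distinct k-mers.
toy_histogram = {1: 500, 2: 100, 18: 200, 19: 400, 20: 1000, 21: 400, 22: 200}

genome_size, peak_depth = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak_depth}, estimated genome size = {genome_size:.0f} bp")
```

For the toy histogram, the total k-mer number is 44,700 and the peak depth is 20, giving an estimate of 2,235 bp. A real analysis would take the histogram from a k-mer counter run on the 17-mers of the short reads.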
Assembly quality was assessed using BUSCO (Benchmarking Universal Single-Copy Orthologs) v5.0.0_cv1 (Seppey, Manni, & Zdobnov, 2019), employing the gene predictor AUGUSTUS v3.2.1 (Stanke & Waack, 2003) and the 9,226-gene BUSCO mammalian lineage data set (mammalia_odb10). Although gene-centric, the BUSCO score is a good predictor of genome completeness (Seppey et al., 2019).