3.4 Nakazawaea atacamensis whole genome sequencing
Based on the genome sequencing and assembly of the N. atacamensis ATA-11A-BT isolate, we assessed the genome complexity and quality of the novel species. Illumina sequencing yielded approximately 7.2 × 10^5 filtered reads, providing a sequence coverage depth of 8.8X. Various genome assemblers were compared, and the best assembly was obtained using SPAdes/Shovill. The resulting draft genome of N. atacamensis was 12.4 Mbp long, with an estimated GC content of 36.7%. The N. atacamensis assembly consisted of 115 contigs, with 42 contigs exceeding 501 bp, accounting for 99% of the assembled sequences. The largest contig was 2,070.2 kbp long, and the N50 of the assembly was 729,094 bp, meaning that contigs of at least this length account for half of the assembled sequence. The details of the sequencing results and assembled contigs can be found in Table S3.
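The contiguity and depth statistics reported above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names `n50` and `mean_coverage` are hypothetical, the contig lengths are toy values, and the read length passed to `mean_coverage` is an assumed input, since it is not reported in this section.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def mean_coverage(n_reads, read_length_bp, assembly_length_bp):
    """Approximate mean depth as total sequenced bases / assembly length.
    read_length_bp is an assumption supplied by the caller."""
    return n_reads * read_length_bp / assembly_length_bp


if __name__ == "__main__":
    # Toy contig lengths in bp (hypothetical, not the N. atacamensis contigs).
    toy_contigs = [100, 80, 60, 40, 20]
    print("N50:", n50(toy_contigs))
    # Toy depth calculation with made-up numbers.
    print("Depth:", mean_coverage(1000, 100, 10000))
```

In practice the contig lengths would be read from the assembly FASTA file, and the same calculation applies unchanged.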
Gene prediction and component analysis of the N. atacamensis genome with GeneMark identified 5,394 predicted genes. Among these, 5,116 protein-coding genes (95%) were annotated with InterProScan (Table S4). To facilitate the reconstruction of the molecular network from the predicted proteins, we employed KofamKOALA and assigned KEGG Orthologs (KOs). We identified 2,782 genes involved in 385 pathways (Table S5). Most of the predicted genes are associated with metabolic pathways, biosynthesis of secondary metabolites, microbial metabolism in diverse environments, and biosynthesis of cofactors (Figure 3A). Given that N. atacamensis is a fermenting yeast, we focused specifically on carbon metabolism. Our analysis revealed 68 genes encoding enzymes involved in various carbon source metabolism pathways (Figure 3B, Table S5). These pathways include glycolysis/gluconeogenesis, pyruvate metabolism, the citrate cycle (TCA), and the pentose phosphate pathway, all of which are critical for sugar fermentation through central carbon metabolism. Additionally, other pathways, such as glycogen biosynthesis and degradation, nucleotide sugar biosynthesis, and UDP-N-acetyl-D-glucosamine biosynthesis, may also be present in N. atacamensis.
The genus Nakazawaea currently contains 15 available genomes. Notably, the genome size of N. atacamensis is comparable to that of other Nakazawaea species, such as N. ishiwadae GDMCC 60786 (Ma et al., 2021). To validate species discrimination, we performed an Average Nucleotide Identity (ANI) analysis across Nakazawaea genomes. Our examination revealed an average ANI value of 72.1% between N. atacamensis and the remaining genomes (Table S6). ANI serves as a robust parameter for demarcating species boundaries in yeasts using genome sequence data. Specifically, ANI values of 95% and above conventionally indicate that two genomes belong to the same species, whereas lower values indicate distinct species (Lachance et al., 2020). This finding therefore supports the classification of N. atacamensis as a novel species, consistent with the established yeast species delineation criteria (Lachance et al., 2020). The phylogenetic tree based on whole-genome data (Opulente et al., 2023) clusters N. atacamensis together with N. laoshanensis, N. siamensis and N. peltata, which represent its closest relatives, in agreement with the ITS sequencing results (Figure 3C).
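The ANI-based species call can be sketched as follows. This is an illustrative example only, assuming the 95% convention (values at or above the threshold indicate the same species, lower values indicate distinct species); the names `classify_by_ani` and `summarise_ani` are hypothetical, the genome labels and ANI values are toy data rather than the contents of Table S6, and the pairwise ANI values themselves are assumed to have been computed beforehand with a dedicated tool.

```python
from statistics import mean

# Assumed convention: ANI >= 95% suggests the same species.
SAME_SPECIES_ANI = 95.0


def classify_by_ani(ani_percent, threshold=SAME_SPECIES_ANI):
    """Classify a single pairwise ANI value (in percent)."""
    if not 0.0 <= ani_percent <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    return "same species" if ani_percent >= threshold else "distinct species"


def summarise_ani(pairwise_ani):
    """Summarise the ANI of one query genome against several references.

    The species call uses the highest ANI: if even the closest reference
    falls below the threshold, the query is distinct from all of them.
    """
    values = list(pairwise_ani.values())
    return {
        "mean": mean(values),
        "max": max(values),
        "call": classify_by_ani(max(values)),
    }


if __name__ == "__main__":
    # Toy values (hypothetical reference genomes, not the study data).
    toy_ani = {"reference_A": 70.0, "reference_B": 74.0}
    print(summarise_ani(toy_ani))
```

Basing the call on the maximum rather than the mean is a design choice of this sketch: the mean summarises overall divergence from the genus, while the closest reference is what decides whether the query could belong to an already described species.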