Margarida Silva and 6 more authors

The yeast Torulaspora delbrueckii is gaining importance in biotechnology owing to its ability to increase the sensory complexity of wine and to enhance the leavening of pre-frozen bread dough. However, little is known about its population structure, variation in gene content, or possible domestication routes. Here, we address these issues and update the delimitation of T. delbrueckii into five major clades. Among the three European clades, a basal lineage is associated with the wild arboreal niche, while the other two lineages are linked with anthropic environments: one with wine fermentations and the other with diverse sources, including dairy products and bread dough (the Mix-Anthropic clade). Using 62 genomes, we identified 5629 genes in the pangenome of T. delbrueckii and 270 genes in the cloud genome. A pangenome tree analysis showed that wine strains have a genome composition more similar to that of European wild arboreal strains than to that of the Mix-Anthropic clade, contradicting the phylogenetic analysis. An association analysis of gene content and ecology further supported the hypothesis that the Mix-Anthropic clade has the most specialized gene content and indicated that some of its exclusive genes are involved in galactose and maltose utilization. More detailed analyses traced the acquisition of a cluster of GAL genes in strains associated with dairy products, and the expansion and functional diversification of MAL genes in strains isolated from bread dough. Unlike in Saccharomyces cerevisiae, domestication in T. delbrueckii was not primed by alcoholic fermentation and appears to be a recent event.
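To make the pangenome vocabulary concrete, the sketch below partitions gene families into core, shell and cloud sets according to the fraction of genomes that carry them, and computes a Jaccard distance between two genomes' gene sets, the kind of gene-content distance a pangenome tree can be built from. This is a minimal illustration, not the pipeline used in the study: the function names, the frequency thresholds (`core_frac`, `cloud_frac`) and the toy strains and gene labels are all hypothetical.

```python
from typing import Dict, Set


def partition_pangenome(
    presence: Dict[str, Set[str]],
    core_frac: float = 0.95,   # hypothetical threshold, not taken from the study
    cloud_frac: float = 0.15,  # hypothetical threshold, not taken from the study
) -> Dict[str, Set[str]]:
    """Split gene families into core, shell and cloud sets.

    `presence` maps each genome (strain) to the set of gene families it carries.
    A gene's class depends only on the fraction of genomes containing it.
    """
    n_genomes = len(presence)
    counts: Dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1

    parts: Dict[str, Set[str]] = {"core": set(), "shell": set(), "cloud": set()}
    for gene, count in counts.items():
        frac = count / n_genomes
        if frac >= core_frac:
            parts["core"].add(gene)
        elif frac <= cloud_frac:
            parts["cloud"].add(gene)
        else:
            parts["shell"].add(gene)
    return parts


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Gene-content distance between two genomes (0.0 means identical gene sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Toy presence/absence data (invented for illustration only).
presence = {
    "wild1": {"g1", "g2", "g3"},
    "wine1": {"g1", "g2", "g3", "g4"},
    "dairy1": {"g1", "g2", "GAL_x"},
    "bread1": {"g1", "g2", "MAL_y"},
}
parts = partition_pangenome(presence, cloud_frac=0.25)
print(sorted(parts["core"]), sorted(parts["shell"]), sorted(parts["cloud"]))
print(jaccard_distance(presence["wild1"], presence["wine1"]))   # 0.25
print(jaccard_distance(presence["wild1"], presence["dairy1"]))  # 0.5
```

With only four toy genomes the cloud threshold is raised to 0.25 so that single-genome genes fall into the cloud set. A full analysis would compute the pairwise distance matrix over all genomes and pass it to a tree-building method such as neighbour joining.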

João Tadeu Fontes and 4 more authors

Biodiversity studies benefit greatly from molecular tools such as DNA metabarcoding, which provides an effective means of identification in biomonitoring and conservation programmes. The accuracy of species-level assignment, and the resulting taxonomic coverage, relies on comprehensive DNA barcode reference libraries. These libraries exist to support species identification, but accidental errors in the generation of barcodes may compromise their accuracy. Here we present an R-based application, BAGS (Barcode, Audit & Grade System), that performs automated auditing and annotation of cytochrome c oxidase subunit I (COI) sequence libraries for a given taxonomic group of animals available in the Barcode of Life Data System (BOLD). It then applies a qualitative ranking system that assigns one of five grades (A to E) to each species in the reference library, according to the attributes of the data and the congruence between species names and the sequences clustered in Barcode Index Numbers (BINs). Our ultimate goal is to allow researchers to obtain the most useful and reliable data by highlighting and segregating records according to their congruence. Several tests were performed to assess its usefulness and limitations. BAGS fills a significant gap in the current landscape of DNA barcoding research tools by quickly screening reference libraries to gauge the congruence status of the data and by facilitating the triage of ambiguous data for subsequent review. BAGS therefore has the potential to become a valuable addition to forthcoming DNA metabarcoding studies, contributing in the long term to a global improvement in the quality and reliability of public reference libraries.
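The grading step can be illustrated with a small sketch. BAGS itself is written in R, and the abstract does not spell out its grading rules. The Python code below is therefore only an approximation of the idea of grading species by name-to-BIN congruence. The rule order, the record thresholds (fewer than 3 records; more than 10 records) and the function name `grade_species` are assumptions made for illustration, and the species names and BIN identifiers in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str]  # (species name, BIN identifier)


def grade_species(
    records: List[Record],
    min_records: int = 3,    # assumed threshold for "too few records"
    high_records: int = 10,  # assumed threshold separating grade A from B
) -> Dict[str, str]:
    """Assign an A-E grade to each species (assumed rules, not the BAGS source).

    E: at least one of the species' BINs also holds another species name.
    D: fewer than `min_records` records.
    C: records split across several BINs, all exclusive to this species.
    A: a single exclusive BIN with more than `high_records` records.
    B: a single exclusive BIN with `min_records` to `high_records` records.
    """
    bins_of: Dict[str, Set[str]] = defaultdict(set)
    species_in_bin: Dict[str, Set[str]] = defaultdict(set)
    n_records: Dict[str, int] = defaultdict(int)
    for species, bin_id in records:
        bins_of[species].add(bin_id)
        species_in_bin[bin_id].add(species)
        n_records[species] += 1

    grades: Dict[str, str] = {}
    for species, bins in bins_of.items():
        shared = any(len(species_in_bin[b]) > 1 for b in bins)
        if shared:
            grades[species] = "E"
        elif n_records[species] < min_records:
            grades[species] = "D"
        elif len(bins) > 1:
            grades[species] = "C"
        elif n_records[species] > high_records:
            grades[species] = "A"
        else:
            grades[species] = "B"
    return grades


# Hypothetical records: species names and BIN identifiers are invented.
records = (
    [("Aus bus", "BOLD:AAA0001")] * 12
    + [("Aus cus", "BOLD:AAA0002")] * 5
    + [("Aus dus", "BOLD:AAA0003")] * 3
    + [("Aus dus", "BOLD:AAA0004")] * 3
    + [("Aus eus", "BOLD:AAA0005")] * 2
    + [("Aus fus", "BOLD:AAA0006")] * 4
    + [("Aus gus", "BOLD:AAA0006")] * 4
)
print(grade_species(records))
```

Checking discordance (grade E) first means that a shared BIN always dominates, however many records a species has. This reflects the abstract's emphasis on flagging incongruent names for review.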