Simin Wang

and 12 more

High-throughput sequencing (HTS) provides an efficient and cost-effective way to generate large amounts of sequence data. However, marker-based methods and the resulting datasets come with a range of challenges and disputes, including incomplete reference databases, controversial sequence similarity thresholds for delineating taxa, and the handling of compositional data in downstream analysis. Here, we use HTS data from a soil nematode biodiversity experiment to address the following questions: (1) how the choice of reference database affects HTS data analysis, (2) whether the same ecological patterns are detected with amplicon sequence variants (ASVs, 100% similarity) as with classical operational taxonomic units (OTUs, 97% similarity), and (3) how different data normalization methods affect the recovery of beta diversity patterns and the identification of differentially abundant taxa. At the time of this study, the SILVA database performed better than PR2, assigning more reads to family level and providing higher phylogenetic resolution. ASV- and OTU-based alpha and beta diversity of nematodes correlated closely, indicating that OTU-based studies represent useful reference points. For downstream data analyses, our results indicate that rarefaction-based methods are more vulnerable to missed findings, while methods based on the centered log-ratio (clr) transformation may overestimate tested effects. ANCOM-BC retains all data and accounts for uneven sampling fractions across samples, suggesting that it is currently the optimal method for analyzing compositional data. Overall, our study highlights the importance of comparing and selecting taxonomic reference databases before data analysis, and provides solid evidence for the similarity and comparability of OTU- and ASV-based nematode studies. Further, the results highlight the potential weaknesses of rarefaction-based and clr-based methods. We recommend that future studies use ASVs and that both taxonomic reference databases and normalization strategies be carefully tested and selected before the data are analyzed.
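To make the two normalization families contrasted above concrete, here is a minimal Python sketch of rarefaction and the clr transformation on a toy count table. It is an illustration only, not the pipeline used in the study: the function names, the toy counts, the pseudocount of 1, and the rarefaction depth are all assumptions chosen for the example.

```python
import numpy as np


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of a samples x taxa count matrix.

    The pseudocount (an assumption of this sketch) avoids log(0) for
    taxa that are absent from a sample.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log abundance (log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)


def rarefy(counts, depth, seed=0):
    """Subsample every sample to the same depth, without replacement."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    out = np.zeros_like(counts)
    for i, row in enumerate(counts):
        if row.sum() < depth:
            raise ValueError(f"sample {i} has fewer than {depth} reads")
        # Draw `depth` reads from the sample's pool of reads.
        out[i] = rng.multivariate_hypergeometric(row, depth)
    return out


# Hypothetical toy table: 2 samples (rows) x 4 taxa (columns).
counts = np.array([[120, 30, 0, 50],
                   [10, 5, 2, 3]])

rarefied = rarefy(counts, depth=20)   # discards reads from the deeper sample
transformed = clr(counts)             # keeps all reads, changes the scale
```

Rarefying to the shallowest sample discards 180 of the 200 reads in the first toy sample, which illustrates how information can be lost, whereas clr keeps every read but depends on how zeros are handled.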

Beant Kapoor

and 16 more

Northern red oak (Quercus rubra L.) is an ecologically and economically important forest tree native to the northeastern United States. We present a chromosome-scale, haplotype-resolved genome of Q. rubra, a representative red oak species, generated by combining PacBio sequences with chromatin conformation capture (Hi-C) scaffolding. This is the first reference genome from the red oak clade (section Lobatae). The Q. rubra assembly spans 739 megabases (Mb), with 95.27% of the genome sequence scaffolded into 12 chromosomes, and contains 33,333 protein-coding genes. Comparisons to the genomes of Q. lobata and Q. mongolica reveal high collinearity, with intrachromosomal structural variants present. Orthologous gene family analysis with other oak and rosid tree species revealed that gene families associated with defense response were expanding and contracting simultaneously across the Q. rubra genome. Quercus rubra had the most CC-NBS-LRR and TIR-NBS-LRR resistance genes of the nine species analyzed. Terpene synthase gene family comparisons further reveal tandem gene duplications in the TPS-b subfamily, similar to Q. robur. Single major QTL regions were identified for vegetative bud break and for marcescence; these regions contain candidate genes for further research, including a putative ortholog of the circadian clock constituent cryptochrome (CRY2) and a family of eight tandemly duplicated genes for serine protease inhibitors, respectively. Genome-environment associations across natural populations identified candidate abiotic stress tolerance genes and predicted performance in a common garden. This high-quality red oak genome represents an essential resource for the oak genomics community and will further advance knowledge of Quercus genomics.