The soybean cyst nematode (Heterodera glycines) is a sedentary plant parasite that causes more than a billion dollars in yield losses annually. It has spread across the soybean-producing world, emerging as the primary pathogen of soybeans. This problem is exacerbated by H. glycines populations overcoming the limited sources of natural resistance in soybean and by the lack of effective and safe alternative treatments. Although there are genetic determinants that render soybean plants resistant to certain nematode genotypes, resistant soybean cultivars are increasingly ineffective because their multi-year usage has selected for virulent H. glycines populations. Successful H. glycines infection relies on the comprehensive re-engineering of soybean root cells into a syncytium, as well as the long-term suppression of host defenses to ensure syncytial viability. At the forefront of these complex molecular interactions are effectors, the proteins secreted by H. glycines into host root tissues. Understanding the mechanisms that control genomic effector acquisition, diversification, and selection is needed for the development of essential novel control strategies. As a foundation for this understanding, we developed a nine-scaffold, 158 Mb pseudomolecule assembly of the H. glycines genome using PacBio, Chicago, and Hi-C sequencing. An annotation of 22,465 genes was predicted using a Mikado pipeline informed by published short- and long-read expression data. Here we present results from our assembly and annotation of the H. glycines genome.
Identifying local adaptation in bottlenecked species is essential for conservation management. Selection detection methods have an important role in species management plans, assessments of adaptive capacity, and the search for responses to climate change. Yet, the allele frequency changes exploited in selection detection methods are similar to those caused by the strong neutral genetic drift expected during a bottleneck. Consequently, it is often unclear what accuracy selection detection methods have across bottlenecked populations. In this study, simulations were used to explore whether signals of selection could be confidently distinguished from genetic drift across 23 bottlenecked and reintroduced populations of Alpine ibex (Capra ibex). The meticulously recorded demographic history of the Alpine ibex was used to generate comprehensive simulated SNP data. The simulated SNPs were then used to benchmark the confidence we could place in outliers identified in empirical Alpine ibex SNP data. Within the simulated dataset, the false positive rates were high for all selection detection methods but fell substantially when two or more methods were combined. True positive rates were consistently low and became negligible with increased stringency. Despite finding many outlier loci in the empirical Alpine ibex SNPs, none could be distinguished from genetic drift-driven false positives. Unfortunately, the low true positive rate also prevents the exclusion of recent local adaptation within the Alpine ibex. The baselines and stringent approach outlined here should be applied to other bottlenecked species to ensure that the risk of false positive, or negative, signals of selection is accounted for in conservation management plans.
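The effect of combining selection detection methods, as described in the abstract above, can be illustrated with a minimal sketch. It assumes each method returns a set of outlier loci and that the truly selected loci are known, as they would be in a simulation. The method names, locus identifiers, and the `min_methods` parameter are hypothetical and do not come from the study.

```python
from collections import Counter


def consensus_outliers(calls, min_methods=2):
    """Loci flagged as outliers by at least `min_methods` methods."""
    counts = Counter(locus for flagged in calls.values() for locus in set(flagged))
    return {locus for locus, n in counts.items() if n >= min_methods}


def error_rates(flagged, selected, all_loci):
    """Return (false positive rate, true positive rate) for one outlier set."""
    flagged, selected, all_loci = set(flagged), set(selected), set(all_loci)
    neutral = all_loci - selected
    fpr = len(flagged & neutral) / len(neutral) if neutral else float("nan")
    tpr = len(flagged & selected) / len(selected) if selected else float("nan")
    return fpr, tpr


# Hypothetical simulated data: 10 loci, of which loci 0 and 1 are under selection.
all_loci = set(range(10))
selected = {0, 1}
calls = {
    "method_A": {0, 2, 3, 4},
    "method_B": {0, 3, 5},
    "method_C": {1, 4, 6},
}

for name, flagged in calls.items():
    print(name, error_rates(flagged, selected, all_loci))

consensus = consensus_outliers(calls, min_methods=2)
print("consensus", sorted(consensus), error_rates(consensus, selected, all_loci))
```

In this toy case, requiring agreement between two methods removes some false positives but also loses a true positive, which is the kind of tradeoff between stringency and power that the abstract reports.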
DNA metabarcoding is an important tool for molecular ecology. However, its effectiveness hinges on the quality of reference sequence databases and classification parameters employed. Here we evaluate the performance of MiFish 12S taxonomic assignments using a case study of California Current Large Marine Ecosystem fishes to determine best practices for metabarcoding. Specifically, we use a taxonomy cross-validation by identity framework to compare classification performance between a global database comprised of all available sequences and a curated database that only includes sequences of fishes from the California Current Large Marine Ecosystem. We demonstrate that the curated, regional database provides higher assignment accuracy than the comprehensive global database. We also document a tradeoff between accuracy and misclassification across a range of taxonomic cutoff scores, highlighting the importance of parameter selection for taxonomic classification. Furthermore, we compared assignment accuracy with and without the inclusion of additionally generated reference sequences. To this end, we sequenced tissue from 605 species using the MiFish 12S primers, adding 253 species to GenBank’s existing 550 California Current Large Marine Ecosystem fish sequences. We then compared species and reads identified from seawater environmental DNA samples using global databases with and without our generated references, and the regional database. The addition of new references allowed for the identification of 16 native taxa and 17.0% of total reads from eDNA samples, including species with vast ecological and economic value. Together these results demonstrate the importance of comprehensive and curated reference databases for effective metabarcoding and the need for locus-specific validation efforts.
Current knowledge on environmental distribution and taxon richness of free-living bacteria is mainly based on cultivation-independent investigations employing 16S rRNA gene sequencing methods. Yet, 16S rRNA genes are evolutionarily rather conserved, resulting in limited taxonomic and ecological resolutions provided by this marker. We used a faster evolving protein-encoding marker to reveal ecological patterns hidden within a single OTU defined by >99% 16S rRNA sequence similarity. The studied taxon, subcluster PnecC of the genus Polynucleobacter, represents a ubiquitous group of planktonic freshwater bacteria with cosmopolitan distribution, which is very frequently detected by diversity surveys of freshwater systems. Based on genome taxonomy and a large set of genome sequences, a sequence similarity threshold for delineation of species-like taxa could be established. In total, 600 species-like taxa were detected in 99 freshwater habitats scattered across three regions representing a latitudinal range of 3400 km (42°N to 71°N) and a pH gradient of 4.2 to 8.6. Besides the unexpectedly high richness, the increased taxonomic resolution revealed structuring of Polynucleobacter communities by a couple of macroecological trends, which had previously been demonstrated only for phylogenetically much broader groups of bacteria. An unexpected pattern was the almost complete compositional separation of Polynucleobacter communities of Ca2+-rich and Ca2+-poor habitats, which strongly resembled the vicariance of plant species on silicate and limestone soils. The presented new cultivation-independent approach opened a window to an incredible, previously unseen diversity, and enables investigations aiming at a deeper understanding of how environmental conditions shape bacterial communities and drive evolution of free-living bacteria.
Scale insects are hemimetabolous, showing “incomplete” metamorphosis and no true pupal stage. Ericerus pela, commonly known as the white wax scale insect (hereafter, WWS), is a wax-producing insect found in Asia and Europe. WWS displays dramatic sexual dimorphism, with notably different metamorphic fates in males and females. Males develop into winged adults, while females are neotenic: they maintain a nymph-like appearance, are flightless, and remain stationary. Here we report the de novo assembly of the WWS genome, with a size of 638.30 Mb (scaffold N50 of 69.68 Mb), by PacBio sequencing and Hi-C. From these data, we constructed a robust phylogenetic analysis of 24,923 gene families from 16 representative insect genomes, which indicates that holometabola evolved from incomplete metamorphosis insects in the Late Carboniferous, about 50 million years earlier than previously thought. To study the distinct development of males and females, we analyzed the methylome landscape in each sex. Surprisingly, WWS displayed high levels of methylation (4.42% for males) when compared to other insects. We observed differential methylation patterns for genes involved in steroid and sesquiterpenoid production as well as related fatty acid metabolism pathways. We show here that males and females exhibit distinct titer profiles for ecdysone, the principal insect steroid hormone, and juvenile hormone (a sesquiterpenoid), suggesting that these hormones are the primary drivers of sexually dimorphic features. Our results provide a comprehensive genomic and epigenomic resource for scale insects that provides new insights into the evolution of metamorphosis and sexual dimorphism in insects.
Managing endangered species in fragmented landscapes requires estimating dispersal rates between populations over contemporary timescales. Here we develop a new method for quantifying recent dispersal using genetic pedigree data for close and distant kin. Specifically, we describe an approach that infers missing shared ancestors between pairs of kin in habitat patches across a fragmented landscape. We then apply a stepping-stone model to assign unsampled individuals in the pedigree to probable locations based on minimizing the number of movements required to produce the observed locations in sampled kin pairs. Finally, we use all pairs of reconstructed parent-offspring sets to estimate dispersal rates between habitat patches under a Bayesian model. Our approach measures connectivity over the timescale represented by the small number of generations contained within the pedigree and so is appropriate for estimating the impacts of recent habitat changes due to human activity. We used our method to estimate recent movement between newly discovered populations of threatened Eastern Massasauga Rattlesnakes (Sistrurus catenatus) using data from 2996 RAD-based genetic loci. Our pedigree analyses found no evidence for contemporary connectivity between five genetic groups, but, as validation of our approach, showed high dispersal rates between sample sites within a single genetic cluster. We conclude that these five genetic clusters of Eastern Massasauga Rattlesnakes have small numbers of resident snakes and are demographically isolated conservation units. More broadly, our methodology can be widely applied to determine contemporary connectivity rates, independent of bias from shared genetic similarity due to ancestry that impacts other approaches.
The hyper-diverse order Coleoptera comprises a staggering ~25% of known species on Earth. Despite recent breakthroughs in next generation sequencing, there remains a limited representation of beetle diversity in assembled genomes. Most notably, the ground beetle family Carabidae, comprising more than 40,000 described species, has not been studied in a comparative genomics framework using whole genome data. Here we generate a high-quality genome assembly for Nebria riversi, to examine sources of novelty in the genome evolution of beetles, as well as genetic changes associated with specialization to high elevation alpine habitats. In particular, this genome resource provides a foundation for expanding comparative molecular research into mechanisms of insect cold adaptation. Comparison to other beetles shows a strong signature of genome compaction, with N. riversi possessing a relatively small genome (~147 Mb) compared to other beetles, with associated reductions in repeat element content and intron length. Small genome size is not, however, associated with fewer protein-coding genes, and an analysis of gene family diversity shows significant expansions of genes associated with cellular membranes and membrane transport, as well as protein phosphorylation and muscle filament structure. Finally, our genomic analyses show that these high elevation beetles have endosymbiotic Spiroplasma, with several metabolic pathways (e.g. propanoate biosynthesis) that might complement N. riversi, although its role as a beneficial symbiont or as a reproductive parasite remains equivocal.
We used long-read sequencing data generated from Knightia excelsa R.Br., a nectar-producing Proteaceae tree endemic to Aotearoa New Zealand, to explore how sequencing data type, volume and workflows can impact final assembly accuracy and chromosome construction. Establishing a high-quality genome for this species has specific cultural importance to Māori, the indigenous people, as well as commercial importance to honey producers in Aotearoa New Zealand. Assemblies were produced by five long-read assemblers using data subsampled based on read lengths, two polishing strategies, and two Hi-C mapping methods. Our results from subsampling the data by read length showed that each assembler tested performed differently depending on the coverage and the read length of the data. Assemblies that used longer read lengths (>30 kb) and lower coverage were the most contiguous and the most k-mer and gene complete. The final genome assembly was constructed into pseudo-chromosomes using all available data assembled with FLYE, polished using Racon/Medaka/Pilon combined, scaffolded using SALSA2 and AllHiC, curated using Juicebox, and validated by synteny with Macadamia. We highlighted the importance of developing assembly workflows based on the volume and type of sequencing data and establishing a set of robust quality metrics for generating high-quality assemblies. Scaffolding analyses highlighted that problems found in the initial assemblies could not be resolved accurately by utilizing Hi-C data and that scaffolded assemblies were more accurate when the underlying contig assembly was of higher accuracy. These findings provide insight into what is required for future high-quality de novo assemblies of non-model organisms.
The bean bug (Riptortus pedestris) causes great economic losses in soybean by piercing and sucking pods and seeds. Although R. pedestris has become the focus of numerous studies associated with insect–microbe interactions, plant–insect interactions, and pesticide resistance, a lack of genomic resources has limited deeper insights. In this study, we report the first R. pedestris genome at the chromosomal level using PacBio, Illumina, and Hi-C technologies. The assembled genome was 1.193 Gb in size with a contig N50 of 13.97 Mb. More than 95.7% of the total genome bases were successfully anchored to 6 unique chromosomes, with the scaffold N50 reaching 181.34 Mb. Genome resequencing of male and female individuals and chromosome staining demonstrated that the sex chromosome system of R. pedestris is XO, and the shortest chromosome is the X chromosome. In total, 21,562 protein-coding genes were predicted, 21,320 of which were validated as being expressed in different tissues or different developmental stages. Evolutionary analysis demonstrated that R. pedestris and Oncopeltus fasciatus formed a sister group and split ∼35 million years ago. Additionally, a 5.04 Mb complete genome of symbiotic Serratia marcescens Rip1 was assembled, and the virulence factors that account for successful colonization in the host midgut were identified. The high-quality R. pedestris genome provides a valuable resource for further research, as well as for the management of bug pests.
Microbiome composition data collected through amplicon sequencing are count data on taxa in which the total count per sample (the library size) is an artifact of the sequencing platform and as a result such data are compositional. To avoid library size dependency, one common way of analyzing multivariate compositional data is to perform a principal component analysis (PCA) on data transformed with the centered log-ratio, hereafter called a log-ratio PCA. Two aspects typical of amplicon sequencing data are the large differences in library size and the large number of zeroes. In this paper we show on real data and by simulation that, applied to data that combines these two aspects, log-ratio PCA is nevertheless heavily dependent on the library size. This leads to a reduction in power when testing against any explanatory variable in log-ratio redundancy analysis. If there is additionally a correlation between the library size and the explanatory variable, then the type 1 error becomes inflated. We explore putative solutions to this problem.
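The log-ratio PCA described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis: the pseudocount used to handle zeros, the simulated Dirichlet composition, and the range of library sizes are all choices made here for demonstration. Every simulated sample shares the same underlying composition, so any association between the first principal component and library size arises from sequencing depth and zeros alone.

```python
import numpy as np


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of a samples-by-taxa count matrix.

    The pseudocount is an assumption made here so that zeros can be logged;
    the abstract does not say how zeros are handled.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log abundance (the log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)


def pca_scores(z, n_components=2):
    """Sample scores on the leading principal components, computed via SVD."""
    centered = z - z.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data: all samples share one true composition; only the depth differs.
rng = np.random.default_rng(0)
n_taxa = 200
composition = rng.dirichlet(np.full(n_taxa, 0.5))
library_sizes = rng.integers(500, 50_000, size=40)
counts = np.array([rng.multinomial(int(n), composition) for n in library_sizes])

scores = pca_scores(clr(counts))
r = np.corrcoef(scores[:, 0], np.log(library_sizes))[0, 1]
print(f"fraction of zeros: {(counts == 0).mean():.2f}")
print(f"|correlation| of PC1 with log library size: {abs(r):.2f}")
```

Without zeros and without a pseudocount, the transform is exactly invariant to library size (multiplying a sample by a constant leaves its centered log-ratios unchanged); the dependence only appears once zeros have to be replaced.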
Sea Lettuce (Ulva spp.; Ulvophyceae, Ulvales, Ulvaceae) is ecologically and economically important, has a worldwide distribution, and is a well-known source of near-shore blooms blighting many coastlines. Species of Ulva are frequently misidentified in public repositories, including herbaria and gene banks, making species identification based on traditional barcoding hazardous. We investigated the species distribution of 295 individual distromatic foliose strains from the North East Atlantic by traditional barcoding or next generation sequencing. We found seven distinct species, and compared our results with all worldwide Ulva spp. sequences present in the NCBI database for the three barcodes rbcL, tufA and ITS1. Our results demonstrate a large degree of species misidentification in the NCBI database. We estimate that 21% of the entries pertaining to foliose species are misannotated. In the extreme case of U. lactuca, 65% of the entries are erroneously labelled specimens of another Ulva species, typically U. fenestrata. In addition, 30% of U. rigida entries are misannotated, U. rigida being relatively rare and often misannotated U. laetevirens. Furthermore, U. armoricana and U. scandinavica present as being synonymous to U. laetevirens. An analysis of the global distribution of registered samples from foliose species also indicates possible geographical isolation for some species, and the absence of U. lactuca from Northern Europe. Altogether, exhaustive taxonomic clarification by aggregation of a library of barcode sequences highlights misannotations, and delivers an improved representation of Ulva species diversity and distribution. This approach could be easily adapted to other taxa.
Interactions of organisms with their environment are complex and environmental regulation at different levels of biological organization is often non-linear. Therefore, the genotype to phenotype continuum requires study at multiple levels of organization. While studies of transcriptome regulation are now common for many species, quantitative studies of environmental effects on proteomes are needed. Here we report the generation of a data-independent acquisition (DIA) assay library that enables simultaneous targeted proteomics of thousands of Oreochromis niloticus kidney proteins using a label- and gel-free workflow that is well suited for ecologically relevant field samples. We demonstrate the usefulness of this DIA assay library by discerning environmental effects on the kidney proteome of O. niloticus. Moreover, we demonstrate that the DIA assay library approach generates data that are complementary rather than redundant to transcriptomics data. Transcript and protein abundance differences in kidneys of tilapia acclimated to freshwater and brackish water (25 g/kg) were correlated for 2114 unique genes. A high degree of non-linearity in salinity-dependent regulation of transcriptomes and proteomes was revealed, suggesting that the regulation of O. niloticus renal function by environmental salinity relies heavily on post-transcriptional mechanisms. The application of functional enrichment analyses using STRING and KEGG to DIA assay datasets is demonstrated by identifying myo-inositol metabolism, antioxidant and xenobiotic functions, and signaling mechanisms as key elements controlled by salinity in tilapia kidneys. The DIA assay library resource presented here can be adopted for other tissues and other organisms to study proteome dynamics during changing ecological contexts.
The diploid Poropuntius huangchuchieni in the cyprinid family, which is widely distributed in the Mekong and Red River basins, is one of the most closely related diploid progenitor-like species of allotetraploid common carp, which was generated by the merging of two diploid genomes during evolution. Therefore, the P. huangchuchieni genome is essential for polyploidy evolution studies in Cyprinidae. Here, we report a high-quality chromosome-level genome assembly of P. huangchuchieni by integrating Oxford Nanopore and Hi-C technology. The assembled genome size was 1021.38 Mb, 895.66 Mb of which was anchored onto 25 chromosomes with an N50 of 32.93 Mb. The genome contained 486.28 Mb repetitive elements and 24,099 protein-coding genes. Approximately 95.9% of the complete BUSCOs were detected, suggesting a high completeness of the genome. Evolutionary analysis revealed that P. huangchuchieni diverged from Cyprinus carpio at approximately 12 Mya. Genome comparison between P. huangchuchieni and the B subgenome of C. carpio provided insights into chromosomal rearrangements during the allotetraploid speciation. With the complete gene set, 17,474 orthologous genes were identified between P. huangchuchieni and C. carpio, providing a broad view of the gene component in the allotetraploid genome, which is critical for future genetic analyses. The high-quality genomic dataset created for P. huangchuchieni provides a diploid progenitor-like reference for the evolution and adaptation of allotetraploid carps.
Fungi form diverse communities and play essential roles in many terrestrial ecosystems, yet there are methodological challenges in taxonomic and phylogenetic placement of fungi from environmental sequences. To address such challenges we investigated spatio-temporal structure of a fungal community using soil metabarcoding with four different sequencing strategies: short amplicon sequencing of the ITS2 region (300–400 bp) with Illumina MiSeq, Ion Torrent Ion S5, and PacBio RS II, all from the same PCR library, as well as long amplicon sequencing of the full ITS and partial LSU regions (1200–1600 bp) with PacBio RS II. Resulting community structure and diversity depended more on statistical method than sequencing technology. The use of long-amplicon sequencing enables construction of a phylogenetic tree from metabarcoding reads, which facilitates taxonomic identification of sequences. However, long reads present issues for denoising algorithms in diverse communities. We present a solution that splits the reads into shorter homologous regions prior to denoising, and then reconstructs the full denoised reads. In the choice between short and long amplicons, we suggest a hybrid approach using short amplicons for sampling breadth and depth, and long amplicons to characterize the local species pool for improved identification and phylogenetic analyses.
Characterization of microbial assemblages via environmental DNA metabarcoding is increasingly being used in routine monitoring programs due to its sensitivity and cost-effectiveness. Several programs have been developed recently which infer functional profiles from 16S rRNA gene data using hidden-state prediction (HSP) algorithms. These might offer an economic and scalable alternative to shotgun metagenomics. To date, HSP-based methods have seen limited use for benthic marine surveys and their performance in these environments remains unevaluated. In this study, 16S rRNA metabarcoding was applied to sediment samples collected at 0 and ≥ 1200 m from Norwegian salmon farms, and three metabolic inference approaches (PAPRICA, PICRUSt2 and TAX4FUN2) evaluated against metagenomics and environmental data. While metabarcoding and metagenomics recovered a comparable functional diversity, the taxonomic composition differed between approaches, with genera richness up to 20× higher for metabarcoding. Comparisons between the sensitivity (highest true positive rates) and specificity (lowest true negative rates) of HSP-based programs in detecting functions found in metagenomics data ranged, respectively, from 0.52 and 0.60 to 0.76 and 0.79. However, little correlation was observed between the relative abundance of their specific functions. Functional beta-diversity of HSP-based data was strongly associated with that of metagenomics (r ≥ 0.86 for PAPRICA and TAX4FUN2) and responded similarly to the impact of fish farm activities. Our results demonstrate that although HSP-based metabarcoding approaches provide a slightly different functional profile than metagenomics, partly due to recovering a distinct community, they represent a cost-effective and valuable tool for characterizing and assessing the effects of fish farming on benthic ecosystems.
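As a minimal sketch of how sensitivity and specificity could be computed when a metagenomic profile is treated as the reference, the example below compares sets of detected functions. It uses the standard definitions (sensitivity as the true positive rate, specificity as the true negative rate). The function identifiers and the idea of a fixed "universe" of candidate functions are illustrative assumptions, not details taken from the study.

```python
def sensitivity_specificity(predicted, reference, universe):
    """Compare predicted functions against a reference set of functions.

    predicted: functions inferred by a prediction tool
    reference: functions detected by the reference method
    universe:  every function that could have been reported (assumed known)
    """
    predicted, reference, universe = set(predicted), set(reference), set(universe)
    tp = len(predicted & reference)             # predicted and truly present
    fn = len(reference - predicted)             # present but missed
    fp = len(predicted - reference)             # predicted but absent
    tn = len(universe - predicted - reference)  # correctly not predicted
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical function identifiers, used only for illustration.
universe = {"F1", "F2", "F3", "F4", "F5", "F6"}
reference = {"F1", "F2", "F3"}
predicted = {"F1", "F2", "F4"}

sens, spec = sensitivity_specificity(predicted, reference, universe)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Note that presence/absence agreement of this kind says nothing about whether relative abundances agree, which the abstract reports as a separate and weaker result.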
The burbot (Lota lota) is the only member of the cod family (Gadidae) that is adapted solely to freshwater. This species shows the widest longitudinal range of freshwater fish in the world. The burbot is a good model for studies on adaptive genome evolution from marine to freshwater environments. However, no high-quality reference genome has been released. Here, the first chromosome-level genome of the burbot was constructed using PacBio long-read sequencing and Hi-C technology. A total of 95.24 Gb polished PacBio sequences were generated, and the preliminary genome assembly was 575.83 Mb in size with a contig N50 size of 2.15 Mb. The assembled sequences were anchored to 22 pseudo-chromosomes by using the Hi-C data. The final assembled genome after Hi-C correction was 575.92 Mb, with a contig N50 of 2.01 Mb and a scaffold N50 of 22.10 Mb. A total of 22,067 protein-coding genes were predicted, 94.82% of which were functionally annotated. Phylogenetic analyses indicated that burbot diverged from the Atlantic cod about 44.4 million years ago. In addition, 377 putative genes that appear to be under positive selection in burbot were identified. These positively selected genes might contribute to adaptation to the freshwater environment. These genome data provide an invaluable resource for the ecological and evolutionary study of the order Gadiformes.
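The contig and scaffold N50 values quoted in the abstract above follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A short sketch of that calculation is given below; the input lengths are made-up numbers for illustration, not values from the burbot assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half is the N50.
        if running >= half_total:
            return length


# Illustrative lengths only (arbitrary units).
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

Because N50 depends on how sequences are split or joined, correcting misassemblies with Hi-C data can lower the contig N50 slightly while the scaffold N50 rises, as in the figures reported above.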
The Ocean Barcode Atlas (OBA) is a user-friendly web service designed for biologists who wish to explore the biodiversity and biogeography of marine organisms locked in otherwise difficult-to-mine planetary-scale DNA metabarcode datasets. Using just a web browser, a comprehensive picture of the diversity of a taxon or a barcode sequence is visualized graphically on world maps and interactive charts. Interactive results panels allow dynamic threshold adjustments and the display of diversity results in their environmental context measured at the time of sampling (temperature, oxygen, latitude, etc.). Ecological analyses such as alpha and beta-diversity plots are produced via publication-quality vector graphics representations. Currently, the Ocean Barcode Atlas is deployed online with the i) Tara Oceans eukaryotic 18S-V9 rDNA metabarcodes, ii) Tara Oceans 16S/18S rRNA miTags, and iii) 16S-V4V5 metabarcodes collected during the Malaspina-2010 expedition. Additional prokaryotic or eukaryotic plankton barcode datasets will be added upon availability, provided that they include the required complement of barcodes (including raw reads to compute barcode abundance) associated with their contextual environmental variables. Ocean Barcode Atlas is a freely available web service at: http://oba.mio.osupytheas.fr/ocean-atlas/.
Admixture is a fundamental evolutionary process that has influenced genetic patterns in numerous species. Maximum-likelihood approaches based on allele frequencies and linkage-disequilibrium have been extensively used to infer admixture processes from genome-wide datasets, mostly in human populations. Nevertheless, complex admixture histories, beyond one or two pulses of admixture, remain methodologically challenging to reconstruct. We develop an Approximate Bayesian Computation (ABC) framework to reconstruct highly complex admixture histories from independent genetic markers. We built the software package MetHis to simulate independent SNPs or microsatellites in a two-way admixed population for scenarios with multiple admixture pulses, monotonically decreasing or increasing recurring admixture, or combinations of these scenarios; and draw model-parameter values from prior distributions set by the user. For each simulation, MetHis calculates 24 summary-statistics describing genetic diversity and moments of individual admixture fractions. We coupled MetHis with existing machine-learning ABC algorithms and investigate the admixture history of hybrid populations. Results show that Random-Forest ABC scenario-choice can accurately distinguish most complex admixture scenarios and errors are mainly found in regions of the parameter space where scenarios are highly nested, and, thus, biologically similar. We focus on African American and Barbadian populations as case studies. We find that Neural-Network ABC posterior parameter estimation is accurate and reasonably conservative under complex admixture scenarios. For both admixed populations, we find that monotonically decreasing contributions over time, from Europe and Africa, explain the observed data more accurately than multiple admixture pulses. This approach will allow for reconstructing detailed admixture histories when maximum-likelihood methods are intractable.