We present a complete portable pipeline for sequencing and analysis of environmental metagenomes in less than a day. This unprecedented development was made possible by the conjunction of state-of-the-art experimental and computational advances: first, a portable laboratory suitable for DNA extraction and sequencing with nanopore technology; second, the metagenomic analysis pipeline SqueezeMeta, capable of providing a complete analysis in a few hours using scarce computational resources; and finally, tools for the automatic inspection of the results via a graphical user interface that can be coupled to a web server to allow remote visualization of data (SQMtools and SQMxplore). We tested the feasibility of our approach by sequencing the microbiota associated with volcanic rocks in La Palma, Canary Islands. We also carried out a two-day sampling campaign of marine waters in which the results obtained on the first day guided the experimental design of the second day. We demonstrate that it is possible to generate metagenomic information in less than one day, making it feasible to obtain taxonomic and functional profiles quickly and efficiently, even in field conditions. This capacity can be used in the future to perform real-time functional and taxonomic profiling of microbial communities in remote areas.
Mesozooplankton is a very diverse group of small animals, ranging in size from 0.2 to 20 mm, that are unable to swim against ocean currents. It is a key component of pelagic ecosystems through its roles in trophic networks and the biological carbon pump. Mesozooplankton have traditionally been studied under the microscope, but methods have recently been developed to rapidly acquire large amounts of data (morphological, molecular) at the individual scale, making it possible to study mesozooplankton using a trait-based approach. Here, combining quantitative imaging with metabarcoding time-series data obtained in the Sargasso Sea at the Bermuda Atlantic Time-series Study (BATS) site, we show that organisms’ transparency might also be an important trait to consider regarding the impact of mesozooplankton on carbon export, contrary to the common assumption that size alone is the master trait directing most mesozooplankton-linked processes. Three distinct communities were defined based on taxonomic composition, and they succeeded one another throughout the study period, with changing levels of transparency within the community. A co-occurrence network built from metabarcoding data revealed six groups of taxa. These were related to changes in the functioning of the ecosystem and/or in the community’s morphology. The importance of Diel Vertical Migration at BATS was confirmed by the existence of a group made up of taxa known to be strong migrators. Finally, we assessed whether metabarcoding can provide a quantitative approach to the biomass and/or abundance of certain taxa. Knowing more about mesozooplankton diversity and its impact on ecosystem functioning would allow these organisms to be better represented in biogeochemical models.
Autopolyploidy is quite common in most clades of eukaryotes. The emergence of sequence-based genotyping methods with individual and marker tags now enables confident determination of allele dosage, overcoming the main obstacle to the democratization of population genetic approaches for studying the ecology and evolution of autopolyploid populations and species. Reproductive modes, including clonality, selfing and allogamy, have deep consequences for the ecology and evolution of populations and species. Analysing genetic diversity and its dynamics over generations is one efficient way to infer the relative importance of clonality, selfing and allogamy in populations. GENAPOPOP is a user-friendly solution to compute the specific corpus of population genetic indices, including indices of genotypic diversity, needed to analyse partially clonal, selfed and allogamous polysomic populations genotyped with confident allele dosage. It also easily provides the posterior probabilities of quantitative reproductive modes in autopolyploid populations genotyped at two time steps, and a graphical representation of the minimum spanning trees of the genetic distances between polyploid individuals, facilitating the interpretation of the genetic coancestry between individuals in hierarchically structured populations. GENAPOPOP complements the previously existing solutions, including SPAGEDI and POLYGENE, for using genotype data to study the ecology and evolution of autopolyploid populations. It was specifically developed with a simple graphical interface and workflow, and comes with a simulator to facilitate practical courses and teaching of population genetics for autopolyploid populations.
Age is necessary information for the study of the life history of wild animals. A general method to estimate the age of odontocetes is counting dental growth layer groups (GLGs). However, this method is highly invasive as it requires the capture and handling of individuals to collect their teeth. Recently, DNA-based age estimation methods have been actively developed as an alternative to such invasive methods, many of which have used biopsy samples. However, if DNA-based age estimation can be developed from fecal samples, age estimation can be performed without touching or disturbing individuals, thus establishing an entirely non-invasive method. We developed an age estimation model using the methylation rate of two gene regions, GRIA2 and CDKN2A, measured through methylation-sensitive high-resolution melting (MS-HRM) from fecal samples of wild Indo-Pacific bottlenose dolphins (Tursiops aduncus). The age of individuals was known from longitudinal underwater individual identification surveys. Methylation rates were quantified from 36 samples. Both gene regions showed a significant correlation between age and methylation rate. The age estimation model was constructed based on the methylation rates of both genes and achieved sufficient accuracy (after LOOCV: MAE = 5.08, R² = 0.34) for ecological studies of Indo-Pacific bottlenose dolphins, which have a lifespan of 40-50 years. This is the first study to report the use of non-invasive fecal samples to estimate the age of marine mammals.
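The validation reported above (leave-one-out cross-validation, LOOCV, summarised by MAE and R²) can be illustrated with a minimal sketch. It assumes an ordinary least-squares model with two predictors, which the abstract does not confirm. The function names (`fit_ols`, `predict`, `loocv`) and the toy numbers are hypothetical and are not the study's data. R² is computed here as 1 - SS_res/SS_tot, which may differ from the definition the authors used.

```python
def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(beta, x):
    """Predicted response for one sample x given coefficients beta."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def loocv(X, y):
    """Refit the model n times, each time predicting the single held-out sample."""
    preds = []
    for i in range(len(y)):
        beta = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(beta, X[i]))
    mae = sum(abs(p - t) for p, t in zip(preds, y)) / len(y)
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for p, t in zip(preds, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return mae, 1.0 - ss_res / ss_tot, preds


# Hypothetical toy input: (methylation rate region 1, region 2) and known age.
toy_X = [(0.62, 0.20), (0.55, 0.26), (0.50, 0.31),
         (0.47, 0.30), (0.40, 0.38), (0.33, 0.41)]
toy_age = [3.0, 10.0, 16.0, 21.0, 30.0, 38.0]
mae, r2, _ = loocv(toy_X, toy_age)
print(f"LOOCV MAE = {mae:.2f} years, R2 = {r2:.2f}")
```

Because every prediction comes from a model that never saw the held-out sample, the LOOCV error is typically larger than the in-sample error, which is why it is the more honest figure to report for small datasets.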
Our limited knowledge about the ecological drivers of global arthropod decline highlights the urgent need for more effective biodiversity monitoring approaches. Monitoring of arthropods is commonly performed using passive trapping devices, which reliably recover diverse communities, but provide little ecological information on the sampled taxa. In particular, the manifold interactions of arthropods with plants are barely understood. A promising strategy to overcome this shortfall is environmental DNA (eDNA) metabarcoding of arthropods from plant material they have interacted with. However, the accuracy of this approach has not been sufficiently tested. In four experiments, we exhaustively test the comparative performance of plant-derived eDNA from surface washes of plants and homogenized plant material against traditional monitoring approaches. We show that the communities recovered by plant-derived eDNA and by traditional approaches only partly overlap, with eDNA recovering various additional cryptic taxa. This suggests eDNA is a useful complementary tool to traditional monitoring. Despite the differences in recovered taxa, estimates of community α- and β-diversity from the two approaches are well correlated, highlighting the utility of eDNA as a broad-scale tool for community monitoring. Last, eDNA outperforms traditional approaches in the recovery of plant-specific arthropod communities. Unlike traditional monitoring, eDNA revealed fine-scale community differentiation between individual plants and even within plant compartments. Specialized herbivores in particular are better recovered with eDNA. Our results highlight the value of plant-derived eDNA analysis for large-scale biodiversity assessments that include information about community-level interactions.
Whole genome sequencing data allow variation to be surveyed across the genome, reducing the constraint of balancing genome sub-sampling with recombination rates and linkage between sampled markers and target loci. As sequencing costs decrease, low coverage whole genome sequencing of pooled or indexed-individual samples is commonly utilized to identify loci associated with phenotypes or environmental axes in non-model organisms. There are, however, relatively few publicly available bioinformatic pipelines designed explicitly to analyze these types of data, and fewer still that process the raw sequencing data, provide useful metrics of quality control, and then execute analyses. Here, we present an updated version of a bioinformatics pipeline called POOLPARTY2 that can effectively handle either pooled or indexed DNA samples and includes new features to improve computational efficiency. Using simulated data, we demonstrate the ability of our pipeline to recover segregating variants, estimate their allele frequencies accurately, and identify genomic regions harboring loci under selection. Based on the simulated data set, we benchmark the efficacy of our pipeline against another bioinformatic suite, ANGSD, and illustrate the compatibility and complementarity of these suites by using ANGSD to generate genotype likelihoods as input for identifying linkage outlier regions using alignment files and variants provided by POOLPARTY2. Finally, we apply our updated pipeline to an empirical dataset of low coverage whole genome data from uncurated population samples of Columbia River steelhead trout (Oncorhynchus mykiss), the results of which demonstrate the genomic impacts of decades of artificial selection in a prominent hatchery stock.
Changes in telomere length are increasingly used to indicate species’ response to environmental stress across diverse taxa. Despite this broad use, few studies have explored telomere length in plants. However, rapid advances in sequencing approaches and bioinformatic tools now allow estimation of telomere length using whole genome sequencing (WGS) data. Thus, evaluation of new approaches for measuring telomere length in plants is needed. Traditionally, telomere length has been quantified using quantitative polymerase chain reaction (qPCR). While WGS has been extensively used in humans, no study to date has compared the effectiveness of WGS in estimating telomere length in plants relative to traditional qPCR approaches. In this study, we use one hundred Populus clones re-sequenced using short-read Illumina sequencing to quantify telomere length using three different bioinformatic approaches, Computel, K-seek, and TRIP, in addition to qPCR. Overall, telomere length estimates varied across the different bioinformatic approaches, but were highly correlated across methods for individual genotypes. A positive correlation was observed between WGS estimates and qPCR; however, Computel estimates exhibited the greatest correlation. Computel incorporates genome coverage into telomere length calculations, suggesting that genome coverage is likely important to telomere length quantification when using WGS data. Overall, WGS provided telomere length estimates with greater precision and accuracy than qPCR. The findings suggest WGS is a promising approach for assessing telomere length and, as the field of telomere ecology evolves, may provide added value for assaying the response of plants to biotic and abiotic environments, which is needed to accelerate plant breeding and conservation management.
Epigenetic processes have taken center stage in the investigation of many biological processes, and epigenetic modifications have been shown to influence phenotype, morphology and behavioral traits such as stress resistance by affecting gene regulation and expression without altering the underlying genomic sequence. The multiple molecular layers of epigenetics synergistically construct the cell type-specific gene regulatory networks. DNA methylation occurring on the 5’ carbon of cytosines in different genomic sequence contexts is the most studied epigenetic modification. DNA methylation has been shown to provide a molecular record of a large variety of environmental factors, which might persist through the entire lifetime of an organism and even be passed on to the offspring. Animals might display altered phenotypes mediated by epigenetic modifications depending on the developmental stage or the environmental conditions as well as during evolution. Therefore, the analysis of DNA methylation patterns might allow deciphering previous exposures, explaining ecologically relevant phenotypic diversity and predicting evolutionary trajectories enabling accelerated adaptation to changing environmental conditions. Despite the explanatory potential of DNA methylation, studies of DNA methylation are still scarce in the field of ecology. This might be at least partly due to the complexity of DNA methylation analysis and the interpretation of the acquired data. In the current issue of Molecular Ecology Resources, Laine and colleagues (2023) provide a detailed summary of guidelines and valuable recommendations for researchers in the field of ecology to avoid common pitfalls and perform interpretable genome-wide DNA methylation analyses.
The global sulfur cycle has implications for human health, climate change, biogeochemistry, and bioremediation. The organosulfur compounds that participate in this cycle not only represent a vast reservoir of sulfur, but are also used by prokaryotes as sources of energy and/or carbon. The organic sulfur cycle is closely linked to the inorganic sulfur cycle and involves the interaction of prokaryotes, eukaryotes, and chemical processes. However, ecological and evolutionary studies of the conversion of organic sulfur compounds are hampered by the poor conservation of the relevant pathways and their variation even within strains of the same species. In addition, several proteins involved in the conversion of sulfonated compounds are related to proteins involved in sulfur dissimilation or turnover of other compounds. Therefore, the enzymes involved in the metabolism of organic sulfur compounds are usually not correctly annotated in public databases. To address this challenge, we have developed HMSS2, a profile Hidden Markov Model-based tool for rapid annotation and synteny analysis of organic and inorganic sulfur cycle proteins in prokaryotic genomes. Compared to its previous version (HMS-S-S), HMSS2 includes several new features. HMM-based annotation is now supported by non-homology criteria and covers the metabolic pathways of important organosulfur compounds, including dimethylsulfoniopropionate, taurine, isethionate, and sulfoquinovose. In addition, the calculation speed has been increased by a factor of four and the available output formats have been extended to include iTOL-compatible datasets and customised sequence FASTA files.
A new method is developed to estimate the contemporary effective population size (Ne) from linkage disequilibrium (LD) between SNPs without information on their location, which is the usual scenario in non-model species. The general theory of linkage disequilibrium is extended to include the contribution of full-sibs to the measure of LD, leading naturally to the estimation of Ne in monogamous and polygamous mating systems, as well as in multiparous species, and under non-random distributions of full-sib family size due to selection or other causes. The prediction of confidence intervals for Ne estimates was solved using a small artificial neural network trained on a dataset of over 10^5 simulation results. The method, implemented in user-friendly and fast software (currentNe), is able to estimate Ne even in problematic scenarios with large population sizes or small sample sizes, and provides confidence intervals that are more consistent than those from parametric methods or resampling.
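For orientation only, the basic link between LD and Ne can be sketched with the classical approximation for unlinked loci under random mating, E[r²] ≈ 1/(3Ne) + 1/S, where S is the number of sampled individuals. This is not the extended theory described above, nor the currentNe implementation: it ignores the full-sib contribution, unknown linkage between markers, and the usual bias corrections. The function name `ne_from_unlinked_ld` is hypothetical.

```python
def ne_from_unlinked_ld(mean_r2, sample_size):
    """Rough Ne estimate from the mean squared correlation between unlinked loci.

    Assumes E[r^2] ~ 1/(3*Ne) + 1/S (classical approximation, random mating,
    recombination fraction 0.5). The 1/S term is the contribution of finite
    sampling; what remains is attributed to genetic drift.
    """
    drift_r2 = mean_r2 - 1.0 / sample_size
    if drift_r2 <= 0:
        # Observed LD is fully explained by sampling noise: no finite estimate.
        return float("inf")
    return 1.0 / (3.0 * drift_r2)


# Illustrative call with made-up inputs (not results from the paper).
print(ne_from_unlinked_ld(mean_r2=0.025, sample_size=50))
```

The infinite estimate returned when sampling noise accounts for all observed LD is one reason large populations and small samples are the problematic scenarios mentioned above.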
Advances in sequencing technologies and declining costs are increasing the accessibility of large-scale biodiversity genomic datasets. To maximise the impact of these data, a careful, considered approach to data management is essential. However, challenges associated with the management of such datasets remain, exacerbated by uncertainty among the research community as to what constitutes best practice. As an interdisciplinary team with diverse data management experience, we recognise the growing need for guidance on comprehensive data management practices that minimise the risks of data loss, maximise efficiency for stand-alone projects, enhance opportunities for data reuse, facilitate Indigenous data sovereignty and uphold the FAIR and CARE Guiding Principles. Here, we describe four fictional personas reflecting user experiences with data management to identify data management challenges across the biodiversity genomics research ecosystem. We then use these personas to demonstrate realistic considerations, compromises, and actions for biodiversity genomic data management. We also launch the Biodiversity Genomics Data Management Hub (https://genomicsaotearoa.github.io/data-management-resources/), containing tips, tricks and resources to support biodiversity genomics researchers, especially those new to data management, in their journey towards best practice. The Hub also provides an opportunity for biodiversity researchers whose expertise lies beyond genomics and who are keen to advance their data management journey. We aim to support the biodiversity genomics community in embedding data management throughout the research lifecycle to maximise research impact and outcomes.
Next-generation sequencing of pooled samples (Pool-seq) is a popular method to assess genome-wide diversity patterns in natural and experimental populations. However, Pool-seq is associated with specific sources of noise, such as unequal individual contributions. Consequently, the use of Pool-seq for the reconstruction of evolutionary history has remained underexplored. Here we describe a novel Approximate Bayesian Computation (ABC) method to infer demographic history, explicitly modeling Pool-seq sources of error. By jointly modeling Pool-seq data, demographic history and the effects of selection due to barrier loci, we obtain estimates of demographic history parameters that account for technical errors associated with Pool-seq. Our ABC approach is computationally efficient as it relies on simulating subsets of loci (rather than the whole genome), and on using relative summary statistics and relative model parameters. Our simulation study indicates that Pool-seq data allow general scenarios of ecotype formation (single versus parallel origin) to be distinguished and relevant demographic parameters (e.g., effective sizes, split times) to be inferred. We exemplify the application of our method to Pool-seq data from the rocky-shore gastropod Littorina saxatilis, sampled on a narrow geographical scale at two Swedish locations where two ecotypes (Wave and Crab) are found. Our model choice and parameter estimates show that ecotypes formed before colonization of the two locations (i.e., single origin) and are maintained despite gene flow. These results indicate that demographic modeling and inference based on pool-sequencing can be successful using ABC, contributing to the development of suitable null models that allow for a better understanding of the genetic basis of divergent adaptation.
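The general idea of ABC with an explicit Pool-seq error model can be illustrated with a deliberately simplified rejection sampler. This is a toy sketch, not the method described above: it infers a single population allele frequency rather than demographic parameters, models only two sampling layers (chromosomes drawn into the pool, then reads drawn from the pool), and assumes equal individual contributions. All function names, the uniform prior, and the tolerance are assumptions made for illustration.

```python
import random


def binomial(rng, n, p):
    """Number of successes in n Bernoulli(p) trials (stdlib-only helper)."""
    return sum(rng.random() < p for _ in range(n))


def simulate_pool_freq(rng, p, n_chrom, depth):
    """Simulate the read allele frequency observed at one site in a pool."""
    # Layer 1: sampling chromosomes from the population into the pool.
    pool_freq = binomial(rng, n_chrom, p) / n_chrom
    # Layer 2: sampling sequencing reads from the pooled DNA.
    return binomial(rng, depth, pool_freq) / depth


def abc_rejection(observed, n_chrom, depth, n_sims=5000, tol=0.025, seed=1):
    """Keep prior draws whose simulated summary statistic is close to the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        p = rng.random()  # draw from a uniform(0, 1) prior on allele frequency
        if abs(simulate_pool_freq(rng, p, n_chrom, depth) - observed) <= tol:
            accepted.append(p)
    return accepted  # approximate sample from the posterior of p


posterior = abc_rejection(observed=0.30, n_chrom=40, depth=50)
print(len(posterior), sum(posterior) / len(posterior))
```

Because the simulator reproduces the pooling and sequencing steps, the accepted values reflect both sources of sampling noise instead of treating the read frequency as the true population frequency.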
1. Metabarcoding of environmental DNA (eDNA) has recently improved our understanding of biodiversity patterns in marine and terrestrial ecosystems. However, the complexity of these data prevents current methods from extracting and analyzing all the relevant ecological information they contain. Therefore, ecological modeling could greatly benefit from new methods providing better dimensionality reduction and clustering. 2. Here we present two new deep learning-based methods that combine different types of neural networks to ordinate eDNA samples and visualize ecosystem properties in a two-dimensional space: the first is based on variational autoencoders (VAEs) and the second on deep metric learning (DML). The strength of our new methods lies in the combination of several inputs: the number of sequences found for each molecular operational taxonomic unit (MOTU), together with the genetic sequence information of each detected MOTU within an eDNA sample. 3. Using three different datasets, we show that our methods represent well three different ecological indicators in a two-dimensional latent space: MOTU richness per sample, sequence α-diversity per sample, and sequence β-diversity between samples. We show that our nonlinear methods are better at extracting features from eDNA datasets while avoiding the major biases associated with eDNA. Our methods outperform traditional dimension reduction methods such as Principal Component Analysis, t-distributed Stochastic Neighbour Embedding, and Uniform Manifold Approximation and Projection for dimension reduction. 4. Our results suggest that neural networks provide a more efficient way of extracting structure from eDNA metabarcoding data, thereby improving their ecological interpretation and thus biodiversity monitoring.
Some of the most vexing problems of deep-level relationships that remain in angiosperms involve the superrosids. The superrosid clade contains a quarter of all angiosperm species, with 18 orders in three subclades (Vitales, Saxifragales and core rosids) exhibiting remarkable morphological and ecological diversity. To help resolve deep-level relationships, we constructed a high-quality chromosome-level genome assembly for Tiarella polyphylla (Saxifragaceae), thus providing broader genomic representation of Saxifragales. Whole genome microsynteny analysis of superrosids showed that Saxifragales shared more synteny clusters with core rosids than with Vitales, further supporting Saxifragales as more closely related to core rosids. To resolve the ordinal phylogeny of superrosids, we screened 122 single copy nuclear genes from genomes of 36 species, representing all 18 superrosid orders. Vitales were recovered as sister to all other superrosids (Saxifragales + core rosids). Our data suggest dramatic differences in relationships within core rosids compared to earlier studies. Fabids should be restricted to the nitrogen-fixing clade, while Picramniales, the Celastrales-Malpighiales (CM) clade, Huerteales, Oxalidales, Sapindales, Malvales and Brassicales formed an “expanded” malvid clade. The Celastrales-Oxalidales-Malpighiales (COM) clade (sensu APG IV) was not monophyletic. Crossosomatales, Geraniales, Myrtales and Zygophyllales did not belong to either our well-supported malvids or our fabids. There is strong discordance between nuclear and plastid phylogenetic hypotheses for superrosid relationships; we show that this is best explained by a combination of incomplete lineage sorting and ancient reticulation.
Morphological identification of cnidarian species can be difficult throughout all life stages due to the lack of distinct morphological characters. Moreover, in some cnidarian taxa genetic markers are not fully informative, and in these cases combinations of different markers or additional morphological verifications may be required. Proteomic fingerprinting based on MALDI-TOF mass spectra was previously shown to provide reliable species identification in different metazoans including some cnidarian taxa. For the first time, we tested the method across four cnidarian classes (Staurozoa, Scyphozoa, Anthozoa, Hydrozoa) and included different scyphozoan life-history stages (polyp, ephyra, medusa) in our dataset. Our results revealed reliable species identification based on MALDI-TOF mass spectra across all taxa, with species-specific clusters for all 23 analyzed species. In addition, proteomic fingerprinting successfully distinguished developmental stages while still retaining a species-specific signal. Furthermore, we found the impact of different salinities in different regions (North Sea and Baltic Sea) on proteomic fingerprints to be negligible. In conclusion, the effects of environmental factors and developmental stages on proteomic fingerprints seem to be low in cnidarians. This would allow reference libraries built entirely from adult or cultured cnidarian specimens to be used for the identification of their juvenile stages or of specimens from different geographic regions in future biodiversity assessment studies.
Palaeolimnological records provide valuable information about how phytoplankton respond to long-term drivers of environmental change. Traditional palaeolimnological tools such as microfossils and pigments are restricted to taxa that leave sub-fossil remains, and a method that can be applied to the wider community is required. Sedimentary DNA (sedDNA), extracted from lake sediment cores, shows promise in palaeolimnology, but validation against data from long-term monitoring of lake water is necessary to enable its development as a reliable record of past phytoplankton communities. To address this need, 18S rRNA gene amplicon sequencing was carried out on lake sediments from a core collected from Esthwaite Water (English Lake District) spanning ~105 years. This sedDNA record was compared with concurrent long-term microscopy-based monitoring of phytoplankton in the surface water. Broadly comparable trends were observed between the datasets with respect to the diversity, relative abundance and occurrence of chlorophytes, dinoflagellates, ochrophytes and bacillariophytes. Up to 20% of genera identified in the microscopy record were also detected using sedDNA, and sedDNA revealed a previously undetected community of phytoplankton. However, a substantial proportion of genera identified by microscopy were not detected using sedDNA, highlighting current limitations of the technique, such as reference database coverage, that require further development. These results suggest that sedDNA can be used as an effective record of past phytoplankton communities, at least over timescales of less than 100 years, but the taphonomic processes which may affect its reliability, such as the extent and rate of deposition and DNA degradation, require further research.
Species detection using eDNA is revolutionizing global capacity to monitor biodiversity. However, the lack of regional, vouchered, genomic sequence information (especially sequence information that includes intraspecific variation) creates a bottleneck for management agencies wanting to harness the complete power of eDNA to monitor taxa and implement eDNA analyses. eDNA studies depend upon regional databases of mitogenomic sequence information to evaluate the effectiveness of such data to detect and identify taxa. We established the Oregon Biodiversity Genome Project to build a database of complete, nearly error-free mitogenomic sequences for all of Oregon’s fishes. We have successfully assembled the complete mitogenomes of 313 specimens of freshwater, anadromous, and estuarine fishes representing 24 families, 55 genera, and 128 species and lineages. Comparative analyses of these sequences illustrate that many regions of the mitogenome are taxonomically informative, that the short (~150 bp) mitochondrial “barcode” regions typically used for eDNA assays do not consistently diagnose species, and that complete single or multiple genes of the mitogenome are preferable for identifying Oregon’s fishes. This project provides a blueprint for other researchers to follow as they build regional databases, illustrates the taxonomic value and limits of complete mitogenomic sequences, and offers clues as to how current eDNA assays and environmental genomics methods of the future can best leverage this information.