Epigenetic processes have taken center stage in the investigation of many biological processes, and epigenetic modifications have been shown to influence phenotype, morphology and behavioral traits such as stress resistance by affecting gene regulation and expression without altering the underlying genomic sequence. The multiple molecular layers of epigenetics synergistically construct the cell type-specific gene regulatory networks. DNA methylation occurring on the 5’ carbon of cytosines in different genomic sequence contexts is the most studied epigenetic modification. DNA methylation has been shown to provide a molecular record of a large variety of environmental factors, which might persist through the entire lifetime of an organism and even be passed on to the offspring. Animals might display altered phenotypes mediated by epigenetic modifications depending on the developmental stage or the environmental conditions as well as during evolution. Therefore, the analysis of DNA methylation patterns might allow deciphering previous exposures, explaining ecologically relevant phenotypic diversity and predicting evolutionary trajectories enabling accelerated adaptation to changing environmental conditions. Despite the explanatory potential of DNA methylation, studies of DNA methylation are still scarce in the field of ecology. This might be at least partly due to the complexity of DNA methylation analysis and the interpretation of the acquired data. In the current issue of Molecular Ecology Resources, Laine and colleagues (2023) provide a detailed summary of guidelines and valuable recommendations for researchers in the field of ecology to avoid common pitfalls and perform interpretable genome-wide DNA methylation analyses.
The global sulfur cycle has implications for human health, climate change, biogeochemistry, and bioremediation. The organosulfur compounds that participate in this cycle not only represent a vast reservoir of sulfur, but are also used by prokaryotes as sources of energy and/or carbon. Closely linked to the inorganic sulfur cycle, the organic sulfur cycle involves the interaction of prokaryotes, eukaryotes, and chemical processes. However, ecological and evolutionary studies of the conversion of organic sulfur compounds are hampered by the poor conservation of the relevant pathways and their variation even within strains of the same species. In addition, several proteins involved in the conversion of sulfonated compounds are related to proteins involved in sulfur dissimilation or turnover of other compounds. Therefore, the enzymes involved in the metabolism of organic sulfur compounds are usually not correctly annotated in public databases. To address this challenge, we have developed HMSS2, a profile Hidden Markov Model-based tool for rapid annotation and synteny analysis of organic and inorganic sulfur cycle proteins in prokaryotic genomes. Compared to its previous version (HMS-S-S), HMSS2 includes several new features. HMM-based annotation is now supported by non-homology criteria and covers the metabolic pathways of important organosulfur compounds, including dimethylsulfoniopropionate, taurine, isethionate, and sulfoquinovose. In addition, the calculation speed has been increased by a factor of four and the available output formats have been extended to include iTOL-compatible datasets and customised FASTA sequence files.
Next-generation sequencing of pooled samples (Pool-seq) is a popular method to assess genome-wide diversity patterns in natural and experimental populations. However, Pool-seq is associated with specific sources of noise, such as unequal individual contributions. Consequently, using Pool-seq for the reconstruction of evolutionary history has remained underexplored. Here we describe a novel Approximate Bayesian Computation (ABC) method to infer demographic history, explicitly modeling Pool-seq sources of error. By jointly modeling Pool-seq data, demographic history and the effects of selection due to barrier loci, we obtain estimates of demographic history parameters accounting for technical errors associated with Pool-seq. Our ABC approach is computationally efficient as it relies on simulating subsets of loci (rather than the whole genome), and on using relative summary statistics and relative model parameters. Our simulation study indicates that Pool-seq data allow us to distinguish between general scenarios of ecotype formation (single versus parallel origin) and to infer relevant demographic parameters (e.g., effective sizes, split times). We exemplify the application of our method to Pool-seq data from the rocky-shore gastropod Littorina saxatilis, sampled on a narrow geographical scale at two Swedish locations where two ecotypes (Wave and Crab) are found. Our model choice and parameter estimates show that ecotypes formed before colonization of the two locations (i.e., single origin) and are maintained despite gene flow. These results indicate that demographic modeling and inference can be successful based on pool-sequencing using ABC, contributing to the development of suitable null models that allow for a better understanding of the genetic basis of divergent adaptation.
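To make the ABC idea in the preceding abstract concrete, the following is a minimal rejection-sampling sketch. It is not the authors' method: it is a generic toy that assumes a single biallelic locus, a uniform prior on the population allele frequency, and two sampling steps (chromosomes into the pool, then reads from the pool) as a stand-in for Pool-seq noise. All function names, parameter names and numbers are hypothetical.

```python
import random

def simulate_read_count(pop_freq, n_chrom, depth, rng):
    """Toy Pool-seq simulator: sample chromosomes into the pool, then reads from the pool."""
    # Step 1: binomial sampling of chromosomes carrying the allele into the pool.
    pool_count = sum(rng.random() < pop_freq for _ in range(n_chrom))
    pool_freq = pool_count / n_chrom
    # Step 2: binomial sampling of sequencing reads from the pooled DNA.
    return sum(rng.random() < pool_freq for _ in range(depth))

def abc_rejection(obs_count, n_chrom, depth, n_draws, tol, rng):
    """Keep prior draws whose simulated read count is within `tol` of the observed count."""
    accepted = []
    for _ in range(n_draws):
        candidate = rng.random()  # assumed uniform(0, 1) prior on allele frequency
        simulated = simulate_read_count(candidate, n_chrom, depth, rng)
        if abs(simulated - obs_count) <= tol:
            accepted.append(candidate)
    return accepted

# Hypothetical observation: 15 of 50 reads carry the allele, pool of 40 chromosomes.
rng = random.Random(1)
posterior = abc_rejection(obs_count=15, n_chrom=40, depth=50,
                          n_draws=5000, tol=1, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 3))
```

The accepted draws approximate the posterior of the allele frequency given the observed read count. A demographic application would replace the toy simulator with a simulator of the demographic model and the read count with a vector of summary statistics.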
1. Metabarcoding of environmental DNA (eDNA) has recently improved our understanding of biodiversity patterns in marine and terrestrial ecosystems. However, the complexity of these data prevents current methods from extracting and analyzing all the relevant ecological information they contain. Therefore, ecological modeling could greatly benefit from new methods providing better dimensionality reduction and clustering. 2. Here we present two new deep learning-based methods that combine different types of neural networks to ordinate eDNA samples and visualize ecosystem properties in a two-dimensional space: the first is based on variational autoencoders (VAEs) and the second on deep metric learning (DML). The strength of our new methods lies in the combination of several inputs: the number of sequences found for each molecular operational taxonomic unit (MOTU), together with the genetic sequence information of each detected MOTU within an eDNA sample. 3. Using three different datasets, we show that our methods represent well three different ecological indicators in a two-dimensional latent space: MOTU richness per sample, sequence α-diversity per sample, and sequence β-diversity between samples. We show that our nonlinear methods are better at extracting features from eDNA datasets while avoiding the major biases associated with eDNA. Our methods outperform traditional dimension reduction methods such as Principal Component Analysis, t-distributed Stochastic Neighbour Embedding, and Uniform Manifold Approximation and Projection for dimension reduction. 4. Our results suggest that neural networks provide a more efficient way of extracting structure from eDNA metabarcoding data, thereby improving their ecological interpretation and thus biodiversity monitoring.
Some of the most vexing problems of deep-level relationships that remain in angiosperms involve the superrosids. The superrosid clade contains a quarter of all angiosperm species, with 18 orders in three subclades (Vitales, Saxifragales and core rosids) exhibiting remarkable morphological and ecological diversity. To help resolve deep-level relationships, we constructed a high-quality chromosome-level genome assembly for Tiarella polyphylla (Saxifragaceae), thus providing broader genomic representation of Saxifragales. Whole genome microsynteny analysis of superrosids showed that Saxifragales shared more synteny clusters with core rosids than Vitales, further supporting Saxifragales as more closely related to core rosids. To resolve the ordinal phylogeny of superrosids, we screened 122 single copy nuclear genes from genomes of 36 species, representing all 18 superrosid orders. Vitales were recovered as sister to all other superrosids (Saxifragales + core rosids). Our data suggest dramatic differences in relationships compared to earlier studies within core rosids. Fabids should be restricted to the nitrogen-fixing clade, while Picramniales, the Celastrales-Malpighiales (CM) clade, Huerteales, Oxalidales, Sapindales, Malvales and Brassicales formed an “expanded” malvid clade. The Celastrales-Oxalidales-Malpighiales (COM) clade (sensu APG IV) was not monophyletic. Crossosomatales, Geraniales, Myrtales and Zygophyllales did not belong to either of our well-supported malvids or fabids. There is strong discordance between nuclear and plastid phylogenetic hypotheses for superrosid relationships; we show that this is best explained by a combination of incomplete lineage sorting and ancient reticulation.
Morphological identification of cnidarian species can be difficult throughout all life stages due to the lack of distinct morphological characters. Moreover, in some cnidarian taxa genetic markers are not fully informative, and in these cases combinations of different markers or additional morphological verifications may be required. Proteomic fingerprinting based on MALDI-TOF mass spectra was previously shown to provide reliable species identification in different metazoans including some cnidarian taxa. For the first time, we tested the method across four cnidarian classes (Staurozoa, Scyphozoa, Anthozoa, Hydrozoa) and included different scyphozoan life-history stages (polyp, ephyra, medusa) in our dataset. Our results revealed reliable species identification based on MALDI-TOF mass spectra across all taxa with species-specific clusters for all 23 analyzed species. In addition, proteomic fingerprinting was successful in distinguishing developmental stages while still retaining a species-specific signal. Furthermore, we found the impact of different salinities in different regions (North Sea and Baltic Sea) on proteomic fingerprints to be negligible. In conclusion, the effects of environmental factors and developmental stages on proteomic fingerprints seem to be low in cnidarians. This would allow using reference libraries built up entirely of adult or cultured cnidarian specimens for the identification of their juvenile stages or specimens from different geographic regions in future biodiversity assessment studies.
Species detection using eDNA is revolutionizing global capacity to monitor biodiversity. However, the lack of regional, vouchered, genomic sequence information, especially sequence information that includes intraspecific variation, creates a bottleneck for management agencies wanting to harness the complete power of eDNA to monitor taxa and implement eDNA analyses. eDNA studies depend upon regional databases of mitogenomic sequence information to evaluate the effectiveness of such data to detect and identify taxa. We established the Oregon Biodiversity Genome Project to create a database of complete, nearly error-free mitogenomic sequences for all of Oregon’s fishes. We have successfully assembled the complete mitogenomes of 313 specimens of freshwater, anadromous, and estuarine fishes representing 24 families, 55 genera, and 128 species and lineages. Comparative analyses of these sequences illustrate that many regions of the mitogenome are taxonomically informative, that the short (~150 bp) mitochondrial “barcode” regions typically used for eDNA assays do not consistently diagnose species, and that complete single or multiple genes of the mitogenome are preferable for identifying Oregon’s fishes. This project provides a blueprint for other researchers to follow as they build regional databases, illustrates the taxonomic value and limits of complete mitogenomic sequences, and offers clues as to how current eDNA assays and environmental genomics methods of the future can best leverage this information.
Mutations are the primary source of all genetic variation. Knowledge about their rates is critical for any evolutionary genetic analysis, but for a long time, that knowledge has remained elusive and indirectly inferred. In recent years, parent-offspring comparisons have yielded the first direct mutation rate estimates. The analyses are, however, challenging due to a high rate of false positives and the lack of consensus regarding standardized filtering of candidate de novo mutations. Here, we validate the application of a machine learning approach for such a task and estimate the mutation rate for the guppy (Poecilia reticulata), a model species in eco-evolutionary studies. We sequenced 4 parents and 20 offspring, followed by screening their genomes for de novo mutations. The initial large number of candidate de novo mutations was hard-filtered to remove false-positive results. These results were compared with the mutation rate estimated with a supervised machine learning approach. Both approaches were followed by molecular validation of all candidate de novo mutations and yielded similar results. The ML method uniquely identified 3 mutations, but overall required more work and had higher rates of false positives and false negatives. We thus recommend its application if most of the mutations are expected to be identified or in case of experiment-specific biases. Both methods concordantly showed that the guppy mutation rate is among the lowest directly estimated mutation rates in vertebrates. Similarly low estimates were obtained for two other teleost fishes. We discuss potential explanations for such a pattern, as well as the future utility and limitations of machine-learning approaches.
Environmental DNA (eDNA) metabarcoding has gained growing attention as a strategy for monitoring biodiversity in ecology. However, taxa identifications produced through metabarcoding require sophisticated processing of high-throughput sequencing data from taxonomically informative DNA barcodes. Various sets of universal and taxon-specific primers have been developed, extending the usability of metabarcoding across archaea, bacteria, and eukaryotes. Accordingly, a multitude of metabarcoding data analysis tools and pipelines have also been developed. Often, several developed workflows are designed to process the same amplicon sequencing data, making it somewhat puzzling to choose one amongst the plethora of existing pipelines. However, each pipeline has its own specific philosophy, strengths, and limitations, which should be considered depending on the aims of any specific study, as well as the bioinformatics expertise of the user. In this review, we outline the input data requirements, supported operating systems, and particular attributes of thirty-one amplicon processing pipelines with the goal of helping users to select a pipeline for their metabarcoding projects.
Single-nucleotide polymorphism (SNP) analyses are a powerful tool for population genetics, pedigree reconstruction and phenotypic trait mapping. SNPs could also be useful for sexing individuals in species with reduced sexual dimorphism, yet this possibility remains poorly explored. Here, we develop a novel protocol for molecular sexing of birds based on the detection of unique Z- and W-linked SNP markers. Our method is based on the identification of two unique loci, one on each sex chromosome. Individuals are considered males when they are heterozygous for the Z-linked SNP and females when they are homozygous for the Z-linked SNP and have the W-linked SNP. We validated the method in the Jackdaw (Corvus monedula), a species whose reduced sexual dimorphism makes it difficult to sex individuals in the wild. We assessed the reliability of the method with 36 individuals of known sex, and found that their sex was correctly assigned in 100% of cases. The sex-linked markers also proved to be widely applicable for discriminating males and females in a sample of 927 genotyped individuals of different maturity stages, with an accuracy of 99.5%. Given that SNP markers are increasingly used in quantitative genetic analyses of wild populations, the approach we propose has great potential to be integrated into broader genetic research programmes without the need for additional sexing techniques.
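The assignment rule stated in the preceding abstract can be written as a small decision function. This is only a sketch: the genotype encoding (a two-letter string for the Z-linked SNP, and `None` when no W-linked SNP call is present) and the "unresolved" category for combinations the abstract does not cover are assumptions made here, not part of the published protocol.

```python
from typing import Optional

def assign_sex(z_genotype: str, w_call: Optional[str]) -> str:
    """Assign sex from one Z-linked and one W-linked SNP call.

    z_genotype: two allele letters at the Z-linked SNP, e.g. "AG" (hypothetical encoding).
    w_call: allele called at the W-linked SNP, or None if there is no call.
    """
    if len(z_genotype) != 2:
        raise ValueError("z_genotype must contain exactly two alleles")
    z_heterozygous = z_genotype[0] != z_genotype[1]
    has_w = w_call is not None

    # Rule from the abstract: heterozygous at the Z-linked SNP -> male.
    # Assumption: a simultaneous W-linked call is contradictory, so it is flagged instead.
    if z_heterozygous and not has_w:
        return "male"
    # Rule from the abstract: homozygous at the Z-linked SNP plus a W-linked call -> female.
    if not z_heterozygous and has_w:
        return "female"
    # Every other combination is left undecided in this sketch.
    return "unresolved"

print(assign_sex("AG", None))  # Z heterozygous, no W call
print(assign_sex("AA", "T"))   # Z homozygous, W call present
```

Returning "unresolved" rather than guessing keeps ambiguous genotypes visible, so they can be checked with another sexing method.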
Inserts of DNA from extranuclear sources, such as organelles and microbes, are common in eukaryote nuclear genomes. However, sequence similarity between the nuclear and extranuclear DNA, and a history of multiple insertions, make the assembly of these regions challenging. Consequently, the number, sequence, and location of these vagrant DNAs cannot be reliably inferred from the genome assemblies of most organisms. We introduce two statistical methods to estimate the abundance of nuclear inserts even in the absence of a nuclear genome assembly. The first (intercept method) only requires low-coverage (<1x) sequencing data, as commonly generated for population studies of organellar and ribosomal DNAs. The second method additionally requires that a subset of the individuals carry extra-nuclear DNA with diverged genotypes. We validated our intercept method using simulations and by re-estimating the frequency of human NUMTs (nuclear mitochondrial inserts). We then applied it to the grasshopper Podisma pedestris, exceptional for both its large genome size and reports of numerous NUMT inserts, estimating that NUMTs make up 0.056% of the nuclear genome, equivalent to >500 times the mitochondrial genome size. We also re-analysed a museomics dataset of the parrot Psephotellus varius, obtaining an estimate of only 0.0043%, in line with reports from other species of bird. Our study demonstrates the utility of low-coverage high-throughput sequencing data for the quantification of nuclear vagrant DNAs. Beyond quantifying organellar inserts, these methods could also be used on endosymbiont-derived sequences. We provide an R implementation of our methods called “vagrantDNA” and code to simulate test datasets.
Understanding landscape connectivity has become a global priority for mitigating the impact of landscape fragmentation on biodiversity. Link-based methods traditionally rely on relating pairwise genetic distance between individuals or demes to their landscape distance (e.g., geographic distance, cost distance). In this study, we present an alternative to conventional statistical approaches to refine cost surfaces by adapting the Gradient Forest (GF) approach to produce a resistance surface. Used in community ecology, GF is an extension of random forest (RF), and has been implemented in genomic studies to model species genetic offset under future climatic scenarios. By design, this adapted method, resGF, has the ability to handle multiple environmental predictors and is not subject to traditional assumptions of linear models such as independence, normality and linearity. Using genetic simulations, resGF performance was compared to other published methods. In univariate scenarios, resGF was able to distinguish the true surface contributing to genetic diversity among competing surfaces better than the compared methods. In multivariate scenarios, the GF approach performed similarly to the other RF-based approach using least-cost transect analysis (LCTA). Additionally, two worked examples are provided using two previously published datasets. This machine learning algorithm has the potential to improve our understanding of landscape connectivity and can inform long-term biodiversity conservation strategies.
Age is an essential trait for understanding the ecology and management of wildlife. A conventional method of estimating age in wild animals is counting annuli formed in the cementum of teeth. This method has been used in bears despite some disadvantages, such as high invasiveness and the requirement for experienced observers. In this study, we established a novel age estimation method based on DNA methylation levels using blood collected from 49 brown bears of known ages living in both captivity and the wild. We performed bisulfite pyrosequencing and obtained methylation levels at 39 cytosine-phosphate-guanine (CpG) sites adjacent to 12 genes. The methylation levels of CpGs adjacent to four genes showed a significant correlation with age. The best model was based on DNA methylation levels at just four CpG sites adjacent to a single gene, SLC12A5, and it had high accuracy with a mean absolute error of 1.3 years and median absolute error of 1.0 year after leave-one-out cross-validation. This model represents the first epigenetic method of age estimation in brown bears, which provides benefits over tooth-based methods, including high accuracy, less invasiveness, and a simple procedure. Our model has the potential for application to other bear species, which will greatly improve ecological research, conservation, and management.
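The error measures reported in the preceding abstract (mean and median absolute error after leave-one-out cross-validation) can be illustrated with a short sketch. The abstract does not state the form of the model, so a single-predictor least-squares line is assumed here, with one averaged methylation value per animal. The data are invented for illustration and are not the bears' measurements.

```python
from statistics import mean, median

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def loocv_abs_errors(xs, ys):
    """Leave each sample out once, refit on the rest, and record its absolute error."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predicted = slope * xs[i] + intercept
        errors.append(abs(predicted - ys[i]))
    return errors

# Hypothetical data: methylation level (%) and known age (years) for eight animals.
methylation = [12.0, 15.5, 18.0, 22.5, 25.0, 30.5, 33.0, 38.5]
age = [1.0, 3.0, 4.0, 7.0, 8.0, 12.0, 13.0, 17.0]

errors = loocv_abs_errors(methylation, age)
print("mean absolute error:", round(mean(errors), 2))
print("median absolute error:", round(median(errors), 2))
```

Because each prediction comes from a model that never saw the animal being predicted, these errors are a fairer estimate of performance on new samples than errors from a fit to all animals.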
A large part of the soil protist diversity is missed in metabarcoding studies based on 0.25 g of soil environmental DNA (eDNA) and universal primers due to ca. 80% co-amplification of non-target plants, animals and fungi. To overcome this problem, enrichment of the substrate used for eDNA extraction is an easily implemented option, but its effect has not yet been tested. In this study, we evaluated the effect of a 150 µm mesh size filtration and sedimentation method to improve the recovery of protist eDNA, while reducing the co-extraction of plant, animal and fungal eDNA, using a set of contrasting forest and alpine soils from La Réunion, Japan, Spain and Switzerland. Biodiversity of the whole eukaryotic community was estimated with V4 18S rRNA metabarcoding and classical amplicon sequence variant calling. A 2- to 3-fold enrichment in shelled protists (Euglyphida, Arcellinida and Chrysophyceae) was observed at the sample level with the proposed method, together with a 2-fold depletion of Fungi and a 3-fold depletion of Embryophyceae. Protist alpha diversity was slightly lower in filtered samples due to reduced coverage in Variosea and Sarcomonadea, but significant differences were observed in only one region. Beta diversity was mostly impacted by region and habitat, and explained the same variance in bulk soil and filtered samples. The increased resolution in the soil protist diversity provided by the filtration-sedimentation method is a strong argument to include it in the standard preparation of any future soil for protist eDNA metabarcoding studies.
In the face of global biodiversity declines, surveys of beneficial and antagonistic arthropod diversity as well as the ecological services that they provide are increasingly important in both natural and agro-ecosystems. Conventional survey methods used to monitor these communities often require extensive taxonomic expertise and are time-intensive, potentially limiting their application in industries such as agriculture, where arthropods often play a critical role in productivity (e.g. pollinators, pests and predators). Environmental DNA (eDNA) metabarcoding of a novel substrate, crop flowers, may offer an accurate and high-throughput alternative to aid in the detection of managed and unmanaged arthropod taxa (e.g. flower-visiting insects and potential pollinators). Here, we compared the arthropod communities detected with eDNA metabarcoding of flowers, from an agricultural species (Persea americana - ‘Hass’ avocado), with two conventional survey techniques: Digital Video Recording (DVR) devices and pan traps. In total, 80 eDNA flower samples, 96 hours of DVRs and 48 pan trap samples were collected. Across the three methods, 49 arthropod families were identified, of which 12 were unique to the eDNA dataset. Alpha diversity levels did not differ across the three survey methods although taxonomic composition varied significantly, with only 12% of arthropod families found to be common across all three methods. This study demonstrates that eDNA metabarcoding of flowers to detect visiting arthropods, although in a developmental stage, can complement traditional survey methods and increase the diversity of taxa detected, with implications for both natural and agro-ecosystems.
Genotype-environment association (GEA) studies have the potential to identify the genetic basis of local adaptation in natural populations. Specifically, GEA approaches look for a correlation between allele frequencies and putatively selective features of the environment. Genetic markers with extreme evidence of correlation with the environment are presumed to be tagging the location of alleles that contribute to local adaptation. In this study, we propose a new method for GEA studies called the weighted-Z analysis (WZA) that combines information from closely linked sites into analysis windows in a way that was inspired by methods for calculating FST. We analyze simulations modelling local adaptation to heterogeneous environments to compare the WZA with existing methods. In the majority of cases we tested, the WZA either outperformed single-SNP based approaches or performed similarly. In particular, the WZA outperformed individual SNP approaches when a small number of individuals or demes was sampled. We apply the WZA to previously published data from lodgepole pine and identify candidate loci that were not found in the original study.
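The idea of combining per-SNP evidence into one window-level score can be sketched with the general weighted-Z (Stouffer) combination, Z_w = sum(w_i * z_i) / sqrt(sum(w_i^2)). The abstract does not specify how per-SNP scores or weights are obtained, so this sketch assumes one-sided p-values strictly between 0 and 1 and uses arbitrary, hypothetical weights. It illustrates the general technique and is not the published WZA implementation.

```python
from math import sqrt
from statistics import NormalDist

def weighted_z(p_values, weights):
    """Combine one-sided p-values from SNPs in a window into a single weighted Z score."""
    if len(p_values) != len(weights) or not p_values:
        raise ValueError("p_values and weights must be non-empty and of equal length")
    standard_normal = NormalDist()
    # Convert each p-value to a z score; a small p-value gives a large positive z.
    z_scores = [standard_normal.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    return numerator / denominator

# Hypothetical window of four linked SNPs with made-up p-values and weights.
window_p = [0.01, 0.20, 0.03, 0.50]
window_w = [0.25, 0.10, 0.21, 0.16]
print(round(weighted_z(window_p, window_w), 3))
```

Several moderately small p-values in one window can give a larger combined score than any single SNP, which is one way a window-based summary can gain power over single-SNP tests.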
There is growing interest in the role of structural variants (SVs) as drivers of local adaptation and speciation. From a biodiversity genomics perspective, the characterisation of genome-wide SVs provides an exciting opportunity to complement single nucleotide polymorphisms (SNPs). However, little is known about the impacts of SV discovery and genotyping strategies on the characterisation of genome-wide SV diversity within and among populations. Here, we explore a near whole-species resequence dataset, and long-read sequence data for a subset of highly represented individuals in the critically endangered kākāpō (Strigops habroptilus). We demonstrate that even when using a highly contiguous reference genome, different discovery and genotyping strategies can significantly impact the type, size and location of SVs characterised genome-wide. Further, we found that the mean number of SVs in each of two kākāpō lineages differed both within and across generations. These combined results suggest that genome-wide characterisation of SVs remains challenging at the population-scale. We are optimistic that increased accessibility to long-read sequencing and advancements in bioinformatic approaches including multi-reference approaches like genome graphs will alleviate at least some of the challenges associated with resolving SV characteristics below the species level. In the meantime, we address caveats, highlight considerations, and provide recommendations for the characterization of genome-wide SVs in biodiversity genomic research.