Molecular Ecology Resources - Authorea

Haplotype-phased and chromosome-level assembly of Puccinia polysora, a giga-scale fun...

Junmin Liang

and 10 more

June 13, 2022

Rust fungi are characterized by large genomes with high repeat content and have two haploid nuclei in most life stages, which makes achieving high-quality genome assemblies challenging. Here, we describe a pipeline using HiFi reads and Hi-C data to assemble the genome of a gigabase-sized fungal pathogen, Puccinia polysora f.sp. zeae, to a haplotype-phased, chromosome-scale level. The final assembled genome is 1.71 Gbp, with ~850 Mbp and 18 chromosomes in each haplotype, and is currently the largest fungal genome assembled to chromosome scale. Transcript-based annotation identified 47,512 genes, with a similar number for each haplotype. A high level of interhaplotype variation was found, with 10% haplotype-specific BUSCO genes, 5.8 SNPs/kbp, and structural variation accounting for 3% of the genome size. The P. polysora genome displayed over 85% repeat content, with genome-size expansion, gene losses and gene family expansions suggested by multiple copies of species-specific orthogroups. Interestingly, these features did not affect overall synteny with other Puccinia species with smaller genomes. Fine-time-point transcriptomics revealed seven clusters of co-expressed secreted proteins that are conserved between the two haplotypes. Candidate effectors were interspersed among all genes, indicating the absence of “two-speed genome” evolution in P. polysora. Genome resequencing of 79 additional isolates revealed a clonal population structure of P. polysora in China with low geographic differentiation. Nevertheless, a minor population drifted from the major population by having mutations in secreted proteins including AvrRppC, indicating ongoing evolution and population differentiation. The high-quality assembly provides valuable genomic resources for future studies on the evolution of P. polysora.

Affordable de novo generation of fish mitogenomes using amplification-free enrichment...

Ana Ramon-Laca

and 2 more

June 09, 2022

Biomonitoring surveys from environmental DNA make use of metabarcoding tools to describe the community composition. These studies match their sequencing results against public genomic databases to identify the species. However, mitochondrial genomic reference data are still incomplete: only a few genes may be available, or the existing sequence data may be unsuitable for species-level resolution. Here we present a dedicated and cost-effective workflow with no DNA amplification for generating complete fish mitogenomes for the purpose of strengthening fish mitochondrial databases. Two different long-fragment sequencing approaches using Oxford Nanopore sequencing coupled with mitochondrial DNA enrichment were used. In the first, enrichment is achieved by preferential isolation of mitochondria followed by DNA extraction and nuclear DNA depletion (‘mitoenrichment’). The second takes advantage of CRISPR-Cas9 targeted scission of previously dephosphorylated DNA (‘targeted mitosequencing’). The sequencing results varied with tissue, species, and integrity of the DNA. The mitoenrichment method yielded 0.17-12.33 % of sequences on target and a mean coverage ranging from 74.9- to 805-fold. The targeted mitosequencing experiment from native genomic DNA yielded 1.83-55 % of sequences on target and a 38- to 2123-fold mean coverage. This helped complete the mitogenome of species with homopolymeric regions, tandem repeats and gene rearrangements. We demonstrate that deep sequencing of long fragments of native fish DNA is possible and can be achieved with low computational resources in a cost-effective manner, exceeding the widespread genome skimming approach and making the discovery of mitogenomes of non-model or understudied fish taxa accessible to a broad range of laboratories worldwide.

CRABS -- A software program to generate curated reference databases for metabarcoding...

Gert-Jan Jeunen

and 5 more

June 01, 2022

The measurement of biodiversity is an integral aspect of life science research. With the establishment of second- and third-generation sequencing technologies, an increasing amount of metabarcoding data is being generated as we seek to describe the extent and patterns of biodiversity in multiple contexts. The reliability and accuracy of taxonomically assigning metabarcoding sequencing data has been shown to be critically influenced by the quality and completeness of reference databases. Custom, curated, eukaryotic reference databases, however, are scarce, as are the software programs for generating them. Here, we present CRABS (Creating Reference databases for Amplicon-Based Sequencing), a software package to create custom reference databases for metabarcoding studies. CRABS includes tools to download sequences from multiple online repositories (i.e., NCBI, BOLD, EMBL, MitoFish), retrieve amplicon regions through in silico PCR analysis and pairwise global alignments, curate the database through multiple filtering parameters (e.g., dereplication, sequence length, sequence quality, unresolved taxonomy), export the reference database in multiple formats for the immediate use in taxonomy assignment software, and investigate the reference database through implemented visualizations for diversity, primer efficiency, reference sequence length, and taxonomic resolution. CRABS is a versatile tool for generating curated reference databases of user-specified genetic markers to aid taxonomy assignment from metabarcoding sequencing data. CRABS is available for download as a conda package and via GitHub (https://github.com/gjeunen/reference_database_creator).

Fast-tracking bespoke DNA reference database generation from museum collections for b...

Andrew Dopheide

and 6 more

June 01, 2022

Despite recent advances in high-throughput DNA sequencing technologies, a lack of locally relevant DNA reference databases may limit the potential for DNA-based monitoring of biodiversity for conservation and biosecurity applications. Museums and national collections represent a compelling source of authoritatively identified genetic material for DNA database development, yet obtaining DNA barcodes from long-stored specimens may be difficult due to sample degradation. We demonstrate a sensitive and efficient laboratory and bioinformatic process for generating DNA barcodes from hundreds of invertebrate specimens simultaneously via the Illumina MiSeq system. Using this process, we recovered full-length (334) or partial (105) COI barcodes from 439 of 450 (98 %) national collection-held invertebrate specimens. This included full-length barcodes from 146 specimens which produced low-yield DNA and no visible PCR bands, and which produced as little as a single sequence per specimen, demonstrating high sensitivity of the process. In many cases, the most abundant sequences per specimen were not the correct barcodes, necessitating the development of a taxonomy-informed process for identifying correct sequences among the sequencing output. The recovery of only partial barcodes for some taxa indicates a need to refine certain PCR primers. Nonetheless, our approach represents a highly sensitive, accurate, and efficient method for targeted reference database generation, providing a foundation for DNA-based assessments and monitoring of biodiversity.

The predator problem and PCR primers in molecular dietary analysis: swamped or silenc...

Jordan Cuff

and 5 more

May 18, 2022

Dietary metabarcoding has vastly improved our ability to analyse the diets of animals, but it is hampered by a plethora of technical limitations including potentially reduced data output due to the disproportionate amplification of the DNA of the focal predator, here termed ‘the predator problem’. We review the various methods commonly used to overcome this problem, from deeper sequencing to exclusion of predator DNA during PCR, and how they may interfere with increasingly common multi-predator-taxon studies. We suggest that multi-primer approaches with an emphasis on achieving both depth and breadth of prey detections may overcome the issue to some extent, although multi-taxon studies require further consideration, as highlighted by an empirical example. We also review several alternative methods for reducing the prevalence of predator DNA that are conceptually promising but require additional empirical examination. The predator problem is a key constraint on molecular dietary analyses but, through this synthesis, we hope to guide researchers in overcoming this in an effective and pragmatic way.

VCFPOP: performing population genetics analyses for polyploids and anisoploids based...

Kang Huang

and 8 more

May 09, 2022

Polyploids are cells or organisms with a genome consisting of more than two sets of homologous chromosomes. Polyploid plants have important traits that facilitate speciation and are thus often model systems for evolutionary, molecular ecology and agricultural studies. However, due to their unusual mode of inheritance and double-reduction, diploid models of population genetic analysis cannot properly be applied to polyploids. To overcome this problem, we developed a software package entitled VCFPOP to perform a variety of population genetic analyses for autopolyploids, such as parentage analysis, analysis of molecular variance, principal coordinates analysis, hierarchical clustering analysis and Bayesian clustering. We make this software freely available, downloadable from http://github.com/huangkang1987/vcfpop.

Shallow shotgun sequencing of the microbiome recapitulates 16S amplicon results and p...

Mason Stothart

and 2 more

April 08, 2022

Prevailing 16S rRNA gene-amplicon methods for characterizing the bacterial microbiome are economical, but result in coarse taxonomic classifications, are subject to primer and 16S copy number biases, and do not allow for direct estimation of microbiome functional potential. While deep shotgun metagenomic sequencing can overcome many of these limitations, it is prohibitively expensive for large sample sets. We evaluated the ability of shallow shotgun metagenomic sequencing to characterize taxonomic and functional patterns in the fecal microbiome of a model population of feral horses (Sable Island, Canada). Since 2007, this unmanaged population has been the subject of an individual-based, long-term ecological study. Using deep shotgun metagenomic sequencing, we determined the sequencing depth required to accurately characterize the horse microbiome. In comparing conventional versus high-throughput shotgun metagenomic library preparation techniques, we validate the use of more cost-effective lab methods. Finally, we characterize similarities between 16S amplicon and shallow shotgun characterization of the microbiome, and demonstrate that the latter recapitulates biological patterns first described in a published amplicon dataset. Unlike amplicon data, we demonstrate how shallow shotgun metagenomic data also provided useful insights about microbiome functional potential which support previously hypothesized diet effects in this study system.

Comparison of destructive and non-destructive DNA extraction methods for the metabarc...

Ameli Kirse

and 4 more

April 05, 2022

DNA metabarcoding is routinely used for biodiversity assessment, especially targeting highly diverse groups for which limited taxonomic expertise is available. Various protocols are currently in use, although standardization is key to its application in large-scale monitoring. DNA metabarcoding of arthropod bulk samples can be conducted either destructively from sample tissue, or non-destructively from sample fixative or lysis buffer. Non-destructive methods are highly desirable for the preservation of sample integrity but have yet to be experimentally evaluated in detail. Here, we compare diversity estimates from 14 size-sorted Malaise trap samples processed consecutively with three non-destructive approaches (one using fixative ethanol and two using lysis buffers) and one destructive approach (using homogenized tissue). Extraction from commercial lysis buffer yielded species richness comparable to, and species composition highly overlapping with, the ground tissue extracts. A significantly divergent community was detected from preservative ethanol-based DNA extraction. No consistent trend in species richness was found with increasing incubation time in lysis buffer. These results indicate that non-destructive DNA extraction from incubation in lysis buffer could provide a comparable alternative to destructive approaches with the added advantage of preserving the specimens for post-metabarcoding taxonomic work.

Net overboard: comparing marine eDNA sampling methodologies at sea to unravel marine...

Ulla von Ammon

and 8 more

March 17, 2022

Environmental DNA (eDNA) analyses are powerful for describing marine biodiversity but must be optimized for their effective use in routine monitoring. To maximize eDNA detection probabilities of sparsely distributed populations, water samples are usually concentrated from larger volumes and filtered using fine-pore membranes, often a significant cost-time bottleneck in the workflow. This study aimed to streamline eDNA sampling by investigating plankton net versus bucket sampling and direct versus sequential filtration, including self-preserving filters. Biodiversity was assessed using metabarcoding of the small ribosomal subunit (18S rRNA) and mitochondrial cytochrome c oxidase I (COI) genes. Multi-species detection probabilities were estimated for each workflow using a probabilistic occupancy modelling approach. Significant workflow-related differences in biodiversity metrics were found. The highest amplicon sequence variant (ASV) richness was attained by bucket sampling combined with self-preserving filters, and comprised a large portion of micro-plankton. Less diversity but more metazoan taxa were captured in the net samples combined with 5 µm pore size filters. Pre-filtered 1.2 µm samples yielded few or no unique ASVs. The highest average (~32%) metazoan detection probabilities in the 5 µm pore size net samples confirmed the effectiveness of pre-concentrating plankton for biodiversity screening. These results contribute to streamlining eDNA sampling protocols for uptake and implementation in marine biodiversity research and surveillance.

Environmental RNA degrades more rapidly than environmental DNA across a broad range o...

Kaushar Kagzi

and 3 more

February 25, 2022

Although the use and development of molecular biomonitoring tools based on eNAs (environmental nucleic acids; eDNA and eRNA) have gained broad interest for the quantification of biodiversity in natural ecosystems, studies investigating the impact of site-specific physicochemical parameters on eNA-based detection methods (particularly eRNA) remain scarce. Here, we used a controlled laboratory microcosm experiment to comparatively assess the environmental degradation of eDNA and eRNA across an acid-base gradient following complete removal of the progenitor organism (Daphnia pulex). Using water samples collected over a 30-day period, eDNA and eRNA copy numbers were quantified using a droplet digital PCR (ddPCR) assay targeting the mitochondrial cytochrome c oxidase subunit I (COI) gene of D. pulex. We found that eRNA decayed more rapidly than eDNA at all pH conditions tested, with detectability—predicted by an exponential decay model—for up to 57 hours (eRNA; neutral pH) and 143 days (eDNA; acidic pH) post organismal removal. Decay rates for eDNA were significantly higher in neutral and alkaline conditions than in acidic conditions, while decay rates for eRNA did not differ significantly among pH levels. Collectively, our findings provide the basis for a predictive framework assessing the persistence and degradation dynamics of eRNA and eDNA across a range of ecologically relevant pH conditions, establish the potential for eRNA to be used in spatially and temporally sensitive biomonitoring studies (as it is detectable across a range of pH levels), and may be used to inform future sampling strategies in aquatic habitats.
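The exponential decay model mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes simple first-order decay, and the starting copy number, decay rate, and detection limit are hypothetical values chosen for illustration, not figures from the study.

```python
import math


def copies_remaining(n0: float, k: float, t: float) -> float:
    """Copies left after time t under first-order decay N(t) = N0 * exp(-k * t)."""
    return n0 * math.exp(-k * t)


def time_to_limit(n0: float, k: float, limit: float) -> float:
    """Time at which N(t) falls to `limit`, from solving N0 * exp(-k * t) = limit."""
    if n0 <= 0 or k <= 0 or limit <= 0:
        raise ValueError("n0, k and limit must all be positive")
    if n0 <= limit:
        return 0.0  # already at or below the detection limit
    return math.log(n0 / limit) / k


# Hypothetical inputs: 1000 starting copies, a decay rate of 0.1 per hour,
# and a detection limit of 1 copy.
t_detect = time_to_limit(1000.0, 0.1, 1.0)
print(f"Predicted to remain detectable for about {t_detect:.1f} hours")
```

Under this kind of model, a larger decay constant shortens the predicted detection window proportionally, which is one way to read the contrast between hours for eRNA and days for eDNA.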

Using gridCoal to assess whether standard population genetic theory holds in the pres...

Barbora Trubenova

and 2 more

February 24, 2022

Spatially explicit population genetic models have long been developed, yet have rarely been used to test hypotheses about the spatial distribution of genetic diversity or the expected neutral levels of genetic divergence between populations. Here, we use spatially explicit coalescence simulations to explore the properties of the island model and the two-dimensional stepping stone model under a wide range of scenarios with spatio-temporal variation in deme size. We avoid the simulation of genetic data, using the fact that under the studied models, summary statistics of genetic diversity and divergence between demes can be approximated from coalescence times. We perform the simulations using gridCoal, a flexible spatial wrapper for the software msprime developed herein. In gridCoal, deme sizes can change arbitrarily across space and time, and migration rates between individual demes can be specified. We identify the different factors that can cause a deviation from the theoretical expectations, such as the simulation time in comparison to the effective deme size and the spatio-temporal autocorrelation across the grid. Our results highlight that Fst, a measure of the strength of population structure, principally depends on recent demography, which makes it robust to temporal variation in deme size. We also warn that predicting genetic diversity from coalescence times requires a much longer run time than needed for the estimation of Fst. Finally, we illustrate the use of gridCoal on a real-world example, the range expansion of silver fir (Abies alba Mill.) since the Last Glacial Maximum, using different degrees of spatio-temporal variation in deme size.
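Approximating divergence statistics from coalescence times can be sketched in a few lines of Python. The sketch uses a widely cited approximation attributed to Slatkin, Fst ≈ (T_total - T_within) / T_total, where T_within is the mean coalescence time of two lineages sampled from the same deme and T_total is the mean for two lineages sampled from the whole population. Whether gridCoal uses exactly this form is an assumption; the function name and the numbers are hypothetical.

```python
def fst_from_coalescence_times(t_within: float, t_total: float) -> float:
    """Approximate Fst from mean pairwise coalescence times.

    t_within: mean coalescence time for two lineages from the same deme.
    t_total:  mean coalescence time for two lineages from the whole population.
    """
    if t_total <= 0:
        raise ValueError("t_total must be positive")
    return (t_total - t_within) / t_total


# Hypothetical mean coalescence times, in generations.
fst = fst_from_coalescence_times(t_within=100.0, t_total=125.0)
print(f"Approximate Fst: {fst:.3f}")
```

Because only mean coalescence times enter the calculation, no sequence data need to be simulated, which matches the shortcut the abstract describes.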

Quantitative monitoring of diverse fish communities on a large scale combining eDNA m...

Didier Pont

and 14 more

January 24, 2022

eDNA metabarcoding is an effective method for studying fish communities but allows only an estimation of relative species abundance (density / biomass). Here, we combine metabarcoding with an estimation of the total abundance of eDNA amplified by our universal marker (teleo) using a qPCR approach to infer the absolute abundance of fish species. We carried out a 2850 km eDNA survey within the Danube catchment using a spatial integrative sampling protocol coupled with traditional electrofishing for fish biomass and density estimation. Total fish eDNA concentrations and total fish abundance were highly correlated. The correlation between eDNA concentrations per taxon and absolute specific abundance was of comparable strength when all sites were pooled and remained significant when the sites were considered separately. Furthermore, a non-linear mixed model showed that species richness was underestimated when the amount of teleo-DNA extracted from a sample was below a threshold of 0.65 × 10⁶ copies of eDNA. This result, combined with the decrease in teleo-DNA concentration by several orders of magnitude with river size, highlights the need to increase sampling effort in large rivers. Our results provide a comprehensive description of longitudinal changes in fish communities and support the use of our combined metabarcoding/qPCR approach for biomonitoring and bioassessment surveys when a rough estimate of absolute species abundance is sufficient.
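The core arithmetic of combining relative metabarcoding read proportions with a qPCR estimate of total eDNA copies can be sketched as follows. This is a simplified illustration under the assumption that read proportions scale linearly with copy numbers; it is not the authors' statistical model, and the taxon names and counts are hypothetical.

```python
def absolute_copies(read_counts: dict, total_copies: float) -> dict:
    """Scale per-taxon read proportions by the total eDNA copies from qPCR."""
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("no reads to apportion")
    return {
        taxon: total_copies * n_reads / total_reads
        for taxon, n_reads in read_counts.items()
    }


# Hypothetical sample: read counts per taxon and a qPCR total of 2 million copies.
reads = {"taxon_A": 750, "taxon_B": 250}
estimates = absolute_copies(reads, total_copies=2.0e6)
for taxon, copies in estimates.items():
    print(f"{taxon}: {copies:.0f} estimated copies")
```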

High‐quality genomes reveal significant genetic divergence and cryptic speciation in...

Yun-Xia Luan

and 10 more

December 22, 2021

The collembolan Folsomia candida Willem, 1902, is an important representative soil arthropod that is widely distributed throughout the world and has been frequently used as a test organism in soil ecology and ecotoxicology studies. However, its suitability as an ideal “standard” has been questioned because of differences in reproductive modes and cryptic genetic diversity between strains from various geographical origins. In this study, we present two high-quality chromosome-level genomes of F. candida, for the parthenogenetic Danish strain (FCDK, 219.08 Mb, N50 of 38.47 Mb, 25,139 protein-coding genes) and the sexual Shanghai strain (FCSH, 153.09 Mb, N50 of 25.75 Mb, 21,609 protein-coding genes). The seven chromosomes of FCDK are each 25–54% larger than the corresponding chromosomes of FCSH, showing obvious repetitive element expansions and large-scale inversions and translocations but no whole-genome duplication. The strain-specific genes, expanded gene families and genes in nonsyntenic chromosomal regions identified in FCDK are highly related to its broader environmental adaptation. In addition, the overall sequence identity of the two mitogenomes is only 78.2%, and FCDK has fewer strain-specific microRNAs than FCSH. In conclusion, FCDK and FCSH have accumulated independent genetic changes and evolved into distinct species since diverging 10 Mya. Our work shows that F. candida represents a good model of rapid cryptic speciation. Moreover, it provides important genomic resources for studying the mechanisms of species differentiation, soil arthropod adaptation to soil ecosystems, and Wolbachia-induced parthenogenesis as well as the evolution of Collembola, a pivotal phylogenetic clade between Crustacea and Insecta.

SNPfiltR: an R package for interactive and reproducible SNP filtering

Devon DeRaad

December 17, 2021

Here I describe the novel R package SNPfiltR and demonstrate its functionalities as the backbone of a customizable, reproducible SNP filtering pipeline implemented exclusively via the widely adopted R programming language. SNPfiltR extends existing SNP filtering functionalities by automating the visualization of key parameters such as depth, quality, and missing data, then allowing users to set filters based on optimized thresholds, all within a single, cohesive working environment. All SNPfiltR functions require a vcfR object as input, which can be easily generated by reading a SNP dataset stored as a standard vcf file into an R working environment using the function read.vcfR() from the R package vcfR. Performance benchmarking reveals that for moderately sized SNP datasets (up to 50M genotypes with associated quality information), SNPfiltR performs filtering with comparable efficiency to current state-of-the-art command-line-based programs. These benchmarking results indicate that for most reduced-representation genomic datasets, SNPfiltR is an ideal choice for investigating, visualizing, and filtering SNPs as part of a cohesive and easily documentable bioinformatic pipeline. The SNPfiltR package can be downloaded from CRAN with the command install.packages("SNPfiltR"), and a development version is available from GitHub at github.com/DevonDeRaad/SNPfiltR. Additionally, thorough documentation for SNPfiltR, including multiple comprehensive vignettes, is available at the website devonderaad.github.io/SNPfiltR/.

A target capture approach for phylogenomic analyses at multiple evolutionary timescal...

Simon Crameri

and 3 more

December 17, 2021

Understanding the genetic changes associated with the evolution of biological diversity is of fundamental interest to molecular ecologists. The assessment of genetic variation at hundreds or thousands of unlinked genetic loci forms a sound basis to address questions ranging from micro- to macro-evolutionary timescales, and is now possible thanks to advances in sequencing technology. Major difficulties are associated with i) the lack of genomic resources for many taxa, especially from tropical biodiversity hotspots, ii) scaling the numbers of individuals analyzed and loci sequenced, and iii) building tools for reproducible bioinformatic analyses of such datasets. To address these challenges, we developed a set of target capture probes for phylogenomic studies of the highly diverse, pantropically distributed and economically significant rosewoods (Dalbergia spp.), explored the performance of an overlapping probe set for target capture across the legume family (Fabaceae), and built a general-purpose bioinformatics pipeline. Phylogenomic analyses of Dalbergia species from Madagascar yielded highly resolved and well supported hypotheses of evolutionary relationships. Population genomic analyses identified differences between closely related species and revealed the existence of a potentially new species, suggesting that the diversity of Malagasy Dalbergia species has been underestimated. Analyses at the family level corroborated previous findings by the recovery of monophyletic subfamilies and many well-known clades, as well as high levels of gene tree discordance, especially near the root of the family. The new genomic and bioinformatics resources will hopefully advance systematics and ecological genetics research in legumes, and promote conservation of the highly diverse and endangered Dalbergia rosewoods.

Chromosome-level genome assembly reveals female-biased genes for sex determination an...

Xindong Xu

and 20 more

November 07, 2021

Schistosomiasis is a neglected tropical disease of humans caused by blood flukes of the genus Schistosoma – the only dioecious parasitic flatworms. Although aspects of sex determination, differentiation and reproduction have been studied in some Schistosoma species, almost nothing is understood for Schistosoma japonicum - the causative agent of schistosomiasis japonica. This relates mainly to a lack of high-quality genomic and transcriptomic resources for this species. As current draft genomes for S. japonicum are highly fragmented, we assembled here a chromosome-level reference genome (seven autosomes, the Z-chromosome and partial W-chromosome), achieving a substantially enhanced gene annotation. Utilising this genome, we discovered that the sex chromosomes of S. japonicum and its congener S. mansoni independently suppressed recombination during evolution, forming four and two ‘strata’, respectively. By exploring the W-chromosome and sex-specific transcriptomes, we identified 35 W-linked genes and 257 female-preferentially transcribed genes (FTGs) and identified a signature for sex determination and differentiation in S. japonicum. These FTGs cluster within autosomes or the Z-chromosome and exhibit a highly dynamic transcription profile during the pairing of female and male schistosomules (advanced juveniles), representing a critical phase for the maturation of the female worms, suggesting distinct layers of regulatory control of gene transcription at this stage of development. Collectively, these data provide a valuable resource for further functional genomic characterisation of S. japonicum, shed light on the evolution of sex chromosomes in this highly virulent human blood fluke and provide a pathway to identify novel targets for development of intervention tools against schistosomiasis.

Short- and long-read metabarcoding of the eukaryotic rRNA operon: evaluation of prime...

Meike Anna Christine Latz

and 7 more

October 28, 2021

High-throughput sequencing for analysis of environmental microbial diversity has evolved vastly over the last decade. Currently the go-to method for microbial eukaryotes is short-read metabarcoding of variable regions of the 18S rRNA gene with <500 bp amplicons. However, there is a growing interest in long-read sequencing of amplicons covering the rRNA operon for improving taxonomic resolution. For both methods, the choice of primers is crucial. It determines if community members are covered, if they can be identified at a satisfactory taxonomic level, and if the obtained community profile is representative. Here, we designed new primers targeting 18S and 28S rRNA based on 177,934 and 21,072 database sequences, respectively. The primers were evaluated in silico along with published primers on reference sequence databases and marine metagenomics datasets. We further evaluated a subset of the primers for short- and long-read sequencing on environmental samples in vitro and compared the obtained community profile with primer-unbiased metagenomic sequencing. Of the short-read pairs, a new V6-V8 pair and the V4_Balzano pair used with a simplified PCR protocol provided good results in silico and in vitro. Fewer differences were observed between the long-read primer pairs. The long-read amplicons and ITS1 alone provided higher taxonomic resolution than V4. Together, our results represent a reference and guide for selection of robust primers for research on and environmental monitoring of microbial eukaryotes.

Evaluating the accuracy of variant calling methods using the frequency of parent-offs...

Russ J. Jasper

and 6 more

October 11, 2021

The use of NGS datasets has increased dramatically over the last decade; however, there have been few systematic analyses quantifying the accuracy of the commonly used variant caller programs. Here we used a familial design consisting of diploid tissue from a single Pinus contorta parent and the maternally derived haploid tissue from 106 full-sibling offspring, where mismatches could only arise due to mutation or bioinformatic error. Given the rarity of mutation, we used the rate of mismatches between parent and offspring genotype calls to infer the SNP genotyping error rates of FreeBayes, HaplotypeCaller, SAMtools, UnifiedGenotyper, and VarScan. With baseline filtering, HaplotypeCaller and UnifiedGenotyper yielded one to two orders of magnitude larger numbers of SNPs and error rates, whereas FreeBayes, SAMtools and VarScan yielded lower numbers of SNPs and more modest error rates. To facilitate comparison between variant callers, we standardized each SNP set to the same number of SNPs using additional filtering, where UnifiedGenotyper consistently produced the smallest proportion of genotype errors, followed by HaplotypeCaller, VarScan, SAMtools, and FreeBayes. Additionally, we found that error rates were minimized for SNPs called by more than one variant caller. Finally, we evaluated the performance of various commonly used filtering metrics on SNP calling. Our analysis provides a quantitative assessment of the accuracy of five widely used variant calling programs and offers valuable insights into both the choice of variant caller program and the choice of filtering metrics, especially for researchers using non-model study systems.
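The parent-offspring mismatch logic can be illustrated with a short Python sketch. It assumes a simplified data layout (one diploid parental genotype and one haploid offspring call per site, with None marking a missing call) and counts a mismatch whenever the offspring allele is not one of the two parental alleles. The function name and example genotypes are hypothetical, and the study's actual procedure across 106 offspring and five callers is more involved.

```python
from typing import Optional, Sequence, Tuple


def mismatch_rate(
    parent: Sequence[Optional[Tuple[str, str]]],
    offspring: Sequence[Optional[str]],
) -> float:
    """Fraction of jointly called sites where the haploid offspring allele
    is absent from the diploid parental genotype."""
    if len(parent) != len(offspring):
        raise ValueError("parent and offspring must cover the same sites")
    compared = 0
    mismatches = 0
    for parent_gt, child_allele in zip(parent, offspring):
        if parent_gt is None or child_allele is None:
            continue  # skip sites with a missing call
        compared += 1
        if child_allele not in parent_gt:
            mismatches += 1
    if compared == 0:
        raise ValueError("no sites with calls in both parent and offspring")
    return mismatches / compared


# Hypothetical calls at four sites; the last parental call is missing.
parent_calls = [("A", "A"), ("A", "G"), ("C", "T"), None]
offspring_calls = ["A", "G", "G", "T"]
rate = mismatch_rate(parent_calls, offspring_calls)
print(f"Mismatch rate: {rate:.3f}")
```

Since true mutations are rare, a rate computed this way serves as a proxy for genotyping error, which is the reasoning the abstract relies on.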