Molecular Ecology Resources - Authorea

by author

by title

by keyword

For all audiences: Incorporating immature stages into standardised inventories of meg...

Marc Domènech

and 4 more

August 19, 2021

Because of their challenging taxonomy, arthropods are traditionally underrepresented in biological inventories and monitoring programs. However, arthropods are the largest component of biodiversity, and no assessment can be considered informative without including them. Arthropod immature stages are often discarded during sorting, despite frequently representing more than half of the collected individuals. To date, little effort has been devoted to characterising the impact of discarding non-adult specimens on our diversity estimates. Here, we use a metabarcoding approach to analyse spiders from white oak communities in the Iberian Peninsula collected with standardised protocols, to assess (1) the contribution of juvenile stages to local diversity estimates, and (2) their effect on the diversity patterns inferred across communities. We further investigate the ability of metabarcoding to inform on abundance. We obtained 363 and 331 species as adults and juveniles, respectively. Species represented only by juveniles represented an increase of 35% with respect to those identified from adults in the whole sampling. Differences in composition between communities were greatly reduced when immature stages were taken considered, especially across latitudes. Moreover, our results revealed that metabarcoding data are to a certain extent quantitative, but some sort of taxonomic conversion factor may be necessary to provide accurate informative estimates. Although our findings do not question the relevance of the information provided by adult-based inventories, they also reveal that juveniles provide a novel and relevant layer of knowledge that, especially in areas with marked seasonality, may influence our interpretations, providing more accurate information from standardised biological inventories.

Batch effects in population genomic studies with low-coverage whole genome sequencing...

Runyang Nicolas Lou

and 1 more

August 03, 2021

Over the past few decades, the rapid democratization of high-throughput sequencing and the growing emphasis on open science practices have resulted in an explosion in the amount of publicly available sequencing data. This opens new opportunities for combining datasets to achieve unprecedented sample sizes, spatial coverage, or temporal replication in population genomic studies. However, a common concern is that non-biological differences between datasets may generate batch effects that can confound real biological patterns. Despite general awareness about the risk of batch effects, few studies have examined empirically how they manifest in real datasets, and it remains unclear what factors cause batch effects and how to best detect and mitigate their impact bioinformatically. In this paper, we compare two batches of low-coverage whole genome sequencing (lcWGS) data generated from the same populations of Atlantic cod (Gadus morhua). First, we show that with a “batch-effect-naive” bioinformatic pipeline, batch effects severely biased our genetic diversity estimates, population structure inference, and selection scan. We then demonstrate that these batch effects resulted from multiple technical differences between our datasets, including the sequencing instrument model/chemistry, read type, read length, DNA degradation level, and sequencing depth, but their impact can be detected and substantially mitigated with simple bioinformatic approaches. We conclude that combining datasets remains a powerful approach as long as batch effects are explicitly accounted for. We focus on lcWGS data in this paper, which may be particularly vulnerable to certain causes of batch effects, but many of our conclusions also apply to other sequencing strategies.

pgHMA: Application of the Heteroduplex Mobility Assay Analysis in Phylogenetics and P...

Teng Li

and 3 more

July 26, 2021

The Heteroduplex mobility assay (HMA) has proven to be a robust tool for the detection of genetic variation. Here, we describe a simple and rapid application of the HMA by microfluidic capillary electrophoresis, for phylogenetics and population genetic analyses (pgHMA). We show how commonly applied techniques in phylogenetics and population genetics have equivalents with pgHMA: phylogenetic reconstruction with bootstrapping, skyline plots, and mismatch distribution analysis. We assess the performance and accuracy of pgHMA by comparing the results obtained against those obtained using standard methods of analyses applied to sequencing data. The resulting comparisons demonstrate that: (1) there is a significant linear relationship (R = 0.992) between heteroduplex mobility and genetic distance; (2) phylogenetic trees obtained by HMA and nucleotide sequences present nearly identical topologies; (3) clades with high pgHMA parametric bootstrap support also have high bootstrap support on nucleotide phylogenies; (4) skyline plots estimated from the UPGMA trees of HMA and Bayesian trees of nucleotide data reveal similar trends, especially for the median trend estimate of effective population size; and (5) optimized mismatch distributions of HMA are closely fitted to the mismatch distributions of nucleotide sequences. In summary, pgHMA is an easily-applied method for approximating phylogenetic diversity and population trends. KEYWORDS: bootstrap, heteroduplex mobility assay, mismatch distribution, phylogenetics, skyline plot

Labile sex chromosomes and a novel candidate sex-determination gene in the Australian...

Alexandra Pavlova

and 7 more

July 22, 2021

Sex-specific ecology has management implications, but rapid sex-chromosome turnover in fishes hinders development of markers to sex monomorphic species. Here, we use annotated genomes and reduced-representation sequencing data for two Australian percichthyids, the Macquarie perch Macquaria australasica and the golden perch M. ambigua, and whole genome resequencing data for 50 Macquarie perch of each sex, to detect sex-linked loci, identify a candidate sex-determining gene and develop an affordable sexing assay. In-silico pool-seq tests of 1,492,004 Macquarie perch SNP loci revealed that a 275-Kb scaffold, containing the transcription factor SOX1b gene, was enriched for gametologous loci. Within this scaffold, 22 loci were sex-linked in a predominantly XY system, with females being homozygous at all 22, and males being heterozygous at two or more. Seven XY-gametologous loci were within a 146-bp region. Being ~38 Kb upstream of SOX1b, it might act as an enhancer controlling SOX1b transcription in the bipotential gonad that drives gonad differentiation. A PCR-RFLP sexing assay, targeting one of the Y-linked SNPs, tested in 66 known-sex Macquarie perch and two individuals of each sex of three confamilial species, and amplicon sequencing of 400 bp encompassing the 146-bp region, revealed that the few sex-linked positions differ between species and between Macquarie perch populations. This indicates sex-chromosome lability in Percichthyidae, also supported by non-homologous scaffolds containing sex-linked loci for Macquarie- and golden perches. The resources developed here will facilitate genomic research in Percichthyidae. Sex-linked markers will be useful for determining genetic sex in some populations and studying sex chromosome turnover.

Fishing for DNA? Designing baits for population genetics in target enrichment experim...

Belen Jimenez Mena

and 10 more

July 12, 2021

Targeted sequencing is an increasingly popular Next Generation Sequencing (NGS) approach for studying populations, through focusing sequencing efforts on specific parts of the genome of a species of interest. Methodologies and tools for designing targeted baits are scarce but in high demand. Here, we present specific guidelines and considerations for designing capture sequencing experiments for population genetics for both neutral genomic regions and regions subject to selection. We describe the bait design process for three diverse fish species: Atlantic salmon, Atlantic cod and tiger shark, which was carried out in our research group, and provide an evaluation of the performance of our approach across both historical and modern samples. The workflow used for designing these three bait sets has been implemented in the R-package supeRbaits, which encompass our considerations and guidelines for bait design to benefit researchers and practitioners. The supeRbaits R package is user‐friendly and versatile. It is written in C++ and implemented in R. supeRbaits and its manual are available from Github: https://github.com/BelenJM/supeRbaits

Time to get real with qPCR controls: The frequency of sample contamination and the in...

Patrick Hutchins

and 2 more

July 06, 2021

Environmental (e)DNA methods have enabled rapid, sensitive, and specific inferences of taxa presence throughout diverse fields of ecological study. However, use of eDNA results for decision-making has been impeded by uncertainties associated with false positive tests putatively caused by contamination. Sporadic contamination is a process that is inconsistent across samples and systemic contamination occurs consistently over a group of samples. Here, we used empirical data and lab experiments to (1) estimate the sporadic contamination rate for each stage of a common, targeted eDNA workflow employing best practice quality control measures under simulated conditions of rare and common target DNA presence, (2) determine the rate at which negative controls (i.e., “blanks”) detect varying concentrations of systemic contamination, (3) estimate the effort that would be required to consistently detect sporadic and systemic contamination. Sporadic contamination rates were very low across all eDNA workflow steps, and, therefore, an intractably high number of negative controls (>100) would be required to determine occurrence of sporadic contamination with any certainty. Contrarily, detection of intentionally introduced systemic contamination was more consistent; therefore, very few negative controls (<5) would be needed to consistently alert to systemic contamination. These results have considerable implications to eDNA study design when resources for sample analyses are constrained.

Genotyping-in-thousands by sequencing (GT-seq) of non-invasive fecal and degraded sam...

Kristen Hayward

and 8 more

July 04, 2021

Genetic monitoring using non-invasive samples provides a complement or alternative to traditional population monitoring methods. However, Next Generation Sequencing approaches to monitoring typically require high quality DNA and the use of non-invasive samples (e.g. scat) is often challenged by poor DNA quality and contamination by non-target species. One promising solution is a highly multiplexed sequencing approach called Genotyping-in-thousands by sequencing (GT-seq), which can enable cost-efficient genomics-based monitoring for populations based on non-invasively collected samples. Here, we develop and validate a GT-seq panel of 324 single nucleotide polymorphisms (SNPs) optimized for genotyping of polar bears based on DNA from non-invasively collected fecal samples. We demonstrate 1) successful GT-seq genotyping of DNA from a range of sample sources, including successful genotyping of 85.7% of non-invasively collected fecal samples determined to contain polar bear DNA, and 2) that we can reliably differentiate individuals, ascertain sex, assess relatedness, and resolve population structure of Canadian polar bear subpopulations based on a GT-seq panel of 324 SNPs. Our GT-seq data reveal similar spatial-genetic patterns as previous polar bear studies but at lesser cost per sample and using non-invasively collected samples, indicating the potential of this approach for population monitoring. This GT-seq panel provides the foundation for a non-invasive toolkit for polar bear monitoring and contribute to community-based programs – a framework which may serve as a model for wildlife management and contribute to conservation and policy for species worldwide.

Chromosome-scale assembly and whole-genome sequencing of 266 giant panda roundworms p...

Lei Han

and 27 more

July 01, 2021

Helminth diseases have long been a threat to the health of humans and animals. Roundworms are important organisms for studying parasitic mechanisms, disease transmission and prevention. The study of parasites in the giant panda is of importance for understanding how roundworms adapt to the host. Here, we report a high-quality chromosome-scale genome of Baylisascaris schroederi with a genome size of 253.60 Mb and 19,262 predicted protein-coding genes. We found that gene families related to epidermal chitin synthesis and environmental information processes in the roundworm genome have expanded significantly. Furthermore, we demonstrated unique genes involved in essential amino acid metabolism in the B. schroederi genome, inferred to be essential for the adaptation to the giant panda-specific diet. In addition, under different deworming pressures, we found that four resistance-related genes (glc-1, nrf-6, bre-4 and ced-7) were under strong positive selection in a captive population. Finally, 23 known drug targets and 47 potential drug target proteins (essential homologues linked to lethal phenotypes) were identified. The genome provides a unique reference for inferring the early evolution of roundworms and their adaptation to the host. Population genetic analysis and drug sensitivity prediction provide insights revealing the impact of deworming history on population genetic structure of importance for disease prevention.

Inferring the timing and strength of natural selection and gene migration in the evol...

Wenyang Lyu

and 4 more

June 11, 2021

With the rapid growth of the number of sequenced ancient genomes, there has been increasing interest in using this new information to study past and present adaptation. Such an additional temporal component has the promise of providing improved power for the estimation of natural selection. Over the last decade, statistical approaches for detection and quantification of natural selection from ancient DNA (aDNA) data have been developed. However, most of the existing methods do not allow us to estimate the timing of natural selection along with its strength, which is key to understanding the evolution and persistence of organismal diversity. Additionally, most methods ignore the fact that natural populations are almost always structured, which can result in overestimation of the effect of natural selection. To address these issues, we propose a novel Bayesian framework for the inference of natural selection and gene migration from aDNA data with Markov chain Monte Carlo techniques, co-estimating both timing and strength of natural selection and gene migration. Such an advance enables us to infer drivers of natural selection and gene migration by correlating genetic evolution with potential causes such as the changes in the ecological context in which an organism has evolved. The performance of our procedure is evaluated through extensive simulations, with its utility shown with an application to ancient chicken samples.

Chromosome-level genome assembly reveals genomic architecture of northern range expan...

Christopher I. Keeling

and 7 more

June 11, 2021

Genome sequencing methods and assembly tools have improved dramatically since the 2013 publication of draft genome assemblies for the mountain pine beetle, Dendroctonus ponderosae Hopkins (Coleoptera: Curculionidae). We conducted proximity ligation library sequencing and scaffolding to improve contiguity, and then used linkage mapping and recent bioinformatic tools for correction and further improvement. The new assemblies have dramatically improved contiguity and gaps compared to the originals: N50 values increased 26- to 36-fold, and the number of gaps were reduced by half. Ninety percent of the content of the assemblies is now contained in 12 and 11 scaffolds for the female and male assemblies, respectively. Based on linkage mapping information, the 12 largest scaffolds in both assemblies represent all 11 autosomal chromosomes and the neo-X chromosome. These assemblies now have nearly chromosome-sized scaffolds and will be instrumental for studying genomic architecture, chromosome evolution, population genomics, functional genomics, and adaptation in this and other pest insects. We also identified regions in two chromosomes, including the ancestral-X portion of the neo-X chromosome, with elevated differentiation between northern and southern Canadian populations.

A chromosome-level genome of Antechinus flavipes provides a reference for an Australa...

Ran Tian

and 19 more

June 09, 2021

The 15 species of small carnivorous marsupials that comprise the genus Antechinus exhibit semelparity, a rare life-history strategy where death occurs after one breeding season. Antechinus males, but not females, age rapidly (demonstrate organismal senescence) during the breeding season and show promise as new animal models of ageing. Some antechinus species are also threatened or endangered. Here, we report chromosome-level genomes of the yellow-footed antechinus Antechinus flavipes. The genome assembly has a total length of 3.2 Gb with a contig N50 of 51.8 Mb and a scaffold N50 of 636.7 Mb. We anchored and oriented 99.7% of the assembly on seven pseudochromosomes and found that repetitive DNA sequences occupy 51.8% of the genome. Draft genome assemblies of three related species in the subfamily Phascogalinae, two additional antechinus species (A. argentus and A. arktos) and the iteroparous sister species Murexia melanurus were also generated. Preliminary demographic analysis supports the hypothesis that climate change during the Pleistocene isolated species in Phascogalinae and shaped their population size. A transcriptomic profile across the A. flavipes breeding season allowed us to identify genes associated with aspects of the male die-off. The chromosome-level A. flavipes genome provides a steppingstone to understanding an enigmatic life-history strategy and a resource to assist the conservation of antechinuses.

High quality haplotype-resolved genome assemblies of Populus tomentosa Carr., a stabi...

Xinmin An

and 26 more

June 09, 2021

Populus has a wide ecogeographical range spanning the Northern Hemisphere, and exhibits abundant distinct species and hybrids globally. Populus tomentosa Carr. is widely distributed and cultivated in the eastern region of Asia, where it plays multiple important roles in forestry, agriculture, conservation, and urban horticulture. Reference genomes are available for several Populus species, however, our goals were to produce a very high quality de novo, chromosome-level genome assembly in P. tomentosa genome that could serve as a reference for evolutionary and ecological studies of hybrid speciation. Here, combining long-read sequencing and Hi-C scaffolding, we present a high-quality, haplotype-resolved genome assembly. The genome size was 740.2 Mb, with a contig N50 size of 5.47 Mb and a scaffold N50 size of 46.68 Mb, consisting of 38 chromosomes, as expected with the known diploid chromosome number (2n=2x=38). A total of 59,124 protein-coding genes were identified. Phylogenomic analyses revealed that P. tomentosa is comprised of two distinct subgenomes, which we deomonstrate is likely to have resulted from hybridization between Populus adenopoda as the female parent and Populus alba var. pyramidalis as the male parent, approximately 3.93 Mya. Although highly colinear, significant structural variation was also found between the two subgenomes. Our study provides a valuable resource for ecological genetics and forest biotechnology.

Recovered microbiome of an oviparous lizard differs across gut and reproductive tissu...

Marie Bunker

and 2 more

June 07, 2021

Microbial diversity and community function are related, and can be highly specialized in different gut regions. The cloacal microbiome of Sceloporus virgatus provides antifungal protection to eggshells during oviposition – a specialized function that suggests a specialized microbial composition. Here, we describe the S. virgatus cloacal microbiome from tissue and swab samples, and compare it to tissue samples from the gastrointestinal (GI) tract and oviduct, adding to the growing body of evidence of microbiome localization in reptiles. We further assessed whether common methods of microbial sampling – cloacal swabs and feces – provide accurate representations of these microbial communities and whether feces might “seed” the cloacal microbiome or impact the accuracy of cloacal swab sampling. We found that different regions of the gut had unique microbial community structures. The cloacal community, in particular, showed extreme specialization averaging 99% Proteobacteria (Phylum) and 83% Enterobacteriacaea (Family). Cloacal swabs recovered communities similar to that of lower intestine and cloacal tissues, but fecal samples had much higher diversity and a distinct composition (62% Firmicutes and 39% Lachnospiraceae) relative to all gut regions. Finally, we found that feces and cloacal swabs recover different communities, but cloacal swabs may be contaminated with fecal matter if taken immediately after defecation. These results serve as a caution against the assumption that fecal samples provide an accurate representation of the gut, and that although cloacal swabs can reflect a portion of the lower GI tract microbiome, they may also result in a mixed community of gut and fecal microbes.

A molecular-based identification resource for the arthropods of Finland

Tomas Roslin

and 96 more

May 31, 2021

To associate specimens identified by molecular characters to other biological knowledge, we need reference sequences annotated by Linnaean taxonomy. In this paper, we 1) report the creation of a comprehensive reference library of DNA barcodes for the arthropods of an entire country (Finland), 2) publish this library, and 3) deliver a new identification tool based on this resource. The reference library contains mtDNA COI barcodes for 11,275 (43%) of 26,437 arthropod species known from Finland, including 10,811 (45%) of 23,956 insect species. To quantify the improvement in identification accuracy enabled by the current reference library, we ran 1,000 Finnish insect and spider species through the Barcode of Life Data system (BOLD) identification engine. Of these, 91% were correctly assigned to a unique species when compared to the new reference library alone, 85% were correctly identified when compared to BOLD with the new material included, and 75% with the new material excluded. To capitalize on this resource, we used the new reference material to train a probabilistic taxonomic assignment tool, FinPROTAX, scoring high success. For the full-length barcode region, the accuracy of taxonomic assignments at the level of classes, orders, families, subfamilies, tribes, genera, and species reached 99.9%, 99.9%, 99.8%, 99.7%, 99.4%, 96.8%, and 88.5%, respectively. The FinBOL arthropod reference library and FinPROTAX are available through the Finnish Biodiversity Information Facility (www.laji.fi). Overall, the FinBOL investment represents a massive capacity-transfer from the taxonomic community of Finland to all sectors of society.

Chromosome-level genome assembly of Scapharca kagoshimensis reveals the expanded mole...

Teng Weiming

and 15 more

May 21, 2021

Ark shells are commercially important clam species that inhabit in muddy sediments of shallow coasts in East Asia. For a long time, the lack of genome resources has hindered scientific research of ark shells. Here, we reported a high-quality chromosome-level genome assembly of Scapharca kagoshimensis, with an aim to unravel the molecular basis of heme biosynthesis, and develop genomic resources for genetic breeding and population genetics in ark shells. Nineteen scaffolds corresponding to 19 chromosomes were constructed from 938 contigs (contig N50=2.01 Mb) to produce a final high-quality assembly with a total length of 1.11 Gb and scaffold N50 around 60.64 Mb. The genome assembly represents 93.4% completeness via matching 303 eukaryota core conserved genes. A total of 24,908 protein-coding genes were predicted and 24,551 genes (98.56%) of which were functionally annotated. The enrichment analyses suggested that genes in heme biosynthesis pathways were expanded and positive selection of the hemoglobin genes was also found in the genome of S. kagoshimensis, which gives important insights into the molecular mechanisms and evolution of the heme biosynthesis in mollusca. The valuable genome assembly of S. kagoshimensis would provide a solid foundation for investigating the molecular mechanisms that underlie the diverse biological functions and evolutionary adaptations of S. kagoshimensis.

Coming of age for COI metabarcoding of whole organism community DNA: towards bioinfor...

Thomas Creedy

and 9 more

May 19, 2021

Metabarcoding of DNA extracted from community samples of whole organisms (whole organism community DNA, wocDNA) is increasingly being applied to terrestrial, marine and freshwater metazoan communities to provide rapid, accurate and high resolution data for novel molecular ecology research. The growth of this field has been accompanied by considerable development that builds on microbial metabarcoding methods to develop appropriate and efficient sampling and laboratory protocols for whole organism metazoan communities. However, considerably less attention has focused on ensuring bioinformatic methods are adapted and applied comprehensively in wocDNA metabarcoding. In this study we examined over 600 papers and identified 111 studies that performed COI metabarcoding of wocDNA. We then systematically reviewed the bioinformatic methods employed by these papers to identify the state-of-the-art. Our results show that the increasing use of wocDNA COI metabarcoding for metazoan diversity is characterised by a clear absence of bioinformatic harmonisation, and the temporal trends show little change in this situation. The reviewed literature showed (i) high heterogeneity across pipelines, tasks and tools used, (ii) limited or no adaptation of bioinformatic procedures to the nature of the COI fragment, and (iii) a worrying underreporting of tasks, software and parameters. Based upon these findings we propose a set of recommendations that we think the wocDNA metabarcoding community should consider to ensure that bioinformatic methods are appropriate, comprehensive and comparable. We believe that adhering to these recommendations will improve the long-term integrative potential of wocDNA COI metabarcoding for biodiversity science.

Strategies for sample labelling and library preparation in DNA metabarcoding studies

Kristine Bohmann

and 9 more

May 19, 2021

Metabarcoding of DNA extracted from environmental or bulk specimen samples is increasingly used to detect plant and animal taxa in basic and applied biodiversity research because of its targeted nature that allows sequencing of genetic markers from many samples in parallel. To achieve this, PCR amplification is carried out with primers designed to target a taxonomically informative marker within a taxonomic group, and sample-specific nucleotide identifiers are added to the amplicons prior to sequencing. This enables assignment of the sequences back to the samples they originated from. Nucleotide identifiers can be added during the metabarcoding PCR and/or during ‘library preparation’, i.e. when amplicons are prepared for sequencing. Different strategies to achieve this labelling exist. All have advantages, challenges and limitations, some of which can lead to misleading results, and in the worst case compromise the fidelity of the metabarcoding data. Given the range of questions addressed using metabarcoding, the importance of ensuring that data generation is robust and fit for purpose should be at the forefront of practitioners seeking to employ metabarcoding for biodiversity assessments. Here, we present an overview of the three main workflows for sample-specific labelling and library preparation in metabarcoding studies on Illumina sequencing platforms. Further, we distil the key considerations for researchers seeking to select an appropriate metabarcoding strategy for their specific study. Ultimately, by gaining insights into the consequences of different metabarcoding workflows, we hope to further consolidate the power of metabarcoding as a tool to assess biodiversity across a range of applications.

Chromosome-scale assembly of the Canary Island endemic spider Dysdera silvatica (Arac...

Paula Escuer

and 7 more

May 10, 2021

We present the chromosome-level genome assembly of Dysdera silvatica Schmidt, 1981, a nocturnal ground-dwelling spider endemic from the Canary Islands. The genus Dysdera has undergone a remarkable diversification in this archipelago mostly associated with shifts in the level of trophic specialization, becoming an excellent model to study the genomic drivers of adaptive radiations. The new assembly (1.37 Gb; and scaffold N50 of 174.2 Mb), was performed using the chromosome conformation capture scaffolding technique, represents a continuity improvement of more than 4,500 times with respect to the previous version. The seven largest scaffolds or pseudochromosomes cover 87% of the total assembly size and match consistently with the seven chromosomes of the karyotype of this species, including the characteristic large X chromosome. To illustrate the value of this new resource we performed a comprehensive analysis of the two major arthropod chemoreceptor gene families (i.e., gustatory and ionotropic receptors). We identified 545 chemoreceptor sequences distributed across all pseudochromosomes, with a notable underrepresentation in the X chromosome. At least 54% of them localize in 83 genomic clusters with a significantly lower evolutionary distances between them than the average of the family, suggesting a recent origin of many of them. This chromosome-level assembly is the first high-quality genome representative of the Synspermiata clade, and just the third among spiders, representing a new valuable resource to gain insights into the structure and organization of chelicerate genomes, including the role that structural variants, repetitive elements and large gene families played in the extraordinary biology of spiders.