Targeted sequencing is an increasingly popular Next Generation Sequencing (NGS) approach for studying populations, through focusing sequencing efforts on specific parts of the genome of a species of interest. Methodologies and tools for designing targeted baits are scarce but in high demand. Here, we present specific guidelines and considerations for designing capture sequencing experiments for population genetics for both neutral genomic regions and regions subject to selection. We describe the bait design process for three diverse fish species: Atlantic salmon, Atlantic cod and tiger shark, which was carried out in our research group, and provide an evaluation of the performance of our approach across both historical and modern samples. The workflow used for designing these three bait sets has been implemented in the R-package supeRbaits, which encompasses our considerations and guidelines for bait design to benefit researchers and practitioners. The supeRbaits R package is user-friendly and versatile. It is written in C++ and implemented in R. supeRbaits and its manual are available from GitHub: https://github.com/BelenJM/supeRbaits
Environmental (e)DNA methods have enabled rapid, sensitive, and specific inferences of taxa presence throughout diverse fields of ecological study. However, use of eDNA results for decision-making has been impeded by uncertainties associated with false positive tests putatively caused by contamination. Sporadic contamination is a process that is inconsistent across samples, whereas systemic contamination occurs consistently over a group of samples. Here, we used empirical data and lab experiments to (1) estimate the sporadic contamination rate for each stage of a common, targeted eDNA workflow employing best practice quality control measures under simulated conditions of rare and common target DNA presence, (2) determine the rate at which negative controls (i.e., “blanks”) detect varying concentrations of systemic contamination, and (3) estimate the effort that would be required to consistently detect sporadic and systemic contamination. Sporadic contamination rates were very low across all eDNA workflow steps, and, therefore, an intractably high number of negative controls (>100) would be required to determine occurrence of sporadic contamination with any certainty. In contrast, detection of intentionally introduced systemic contamination was more consistent; therefore, very few negative controls (<5) would be needed to consistently alert to systemic contamination. These results have considerable implications for eDNA study design when resources for sample analyses are constrained.
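The contrast between the control effort needed for sporadic and for systemic contamination can be illustrated with a simple binomial argument. The sketch below is not the authors' analysis: it assumes that each negative control tests positive independently with a fixed probability, and the two rates used (0.01 and 0.60) are hypothetical values chosen only for illustration, as are the function names.

```python
import math


def detection_probability(rate: float, n_controls: int) -> float:
    """Probability that at least one of n_controls negative controls tests
    positive, assuming independent controls with the same positive rate."""
    return 1.0 - (1.0 - rate) ** n_controls


def controls_needed(rate: float, target: float = 0.95) -> int:
    """Smallest number of controls whose detection probability reaches `target`."""
    if not (0.0 < rate < 1.0 and 0.0 < target < 1.0):
        raise ValueError("rate and target must lie strictly between 0 and 1")
    # Solve 1 - (1 - rate)^n >= target for n and round up.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - rate))


# Hypothetical per-control positive rates (assumptions, not values from the study).
for label, rate in [("sporadic", 0.01), ("systemic", 0.60)]:
    print(label, controls_needed(rate))
```

Under these assumed rates, a rare event needs hundreds of controls to reach 95% detection probability, whereas a frequent one needs only a handful, which is the same qualitative pattern the abstract reports.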
Genetic monitoring using non-invasive samples provides a complement or alternative to traditional population monitoring methods. However, Next Generation Sequencing approaches to monitoring typically require high quality DNA and the use of non-invasive samples (e.g. scat) is often challenged by poor DNA quality and contamination by non-target species. One promising solution is a highly multiplexed sequencing approach called Genotyping-in-thousands by sequencing (GT-seq), which can enable cost-efficient genomics-based monitoring for populations based on non-invasively collected samples. Here, we develop and validate a GT-seq panel of 324 single nucleotide polymorphisms (SNPs) optimized for genotyping of polar bears based on DNA from non-invasively collected fecal samples. We demonstrate 1) successful GT-seq genotyping of DNA from a range of sample sources, including successful genotyping of 85.7% of non-invasively collected fecal samples determined to contain polar bear DNA, and 2) that we can reliably differentiate individuals, ascertain sex, assess relatedness, and resolve population structure of Canadian polar bear subpopulations based on a GT-seq panel of 324 SNPs. Our GT-seq data reveal spatial-genetic patterns similar to those of previous polar bear studies but at lower cost per sample and using non-invasively collected samples, indicating the potential of this approach for population monitoring. This GT-seq panel provides the foundation for a non-invasive toolkit for polar bear monitoring and contributes to community-based programs – a framework which may serve as a model for wildlife management and contribute to conservation and policy for species worldwide.
Helminth diseases have long been a threat to the health of humans and animals. Roundworms are important organisms for studying parasitic mechanisms, disease transmission and prevention. The study of parasites in the giant panda is of importance for understanding how roundworms adapt to the host. Here, we report a high-quality chromosome-scale genome of Baylisascaris schroederi with a genome size of 253.60 Mb and 19,262 predicted protein-coding genes. We found that gene families related to epidermal chitin synthesis and environmental information processes in the roundworm genome have expanded significantly. Furthermore, we demonstrated unique genes involved in essential amino acid metabolism in the B. schroederi genome, inferred to be essential for the adaptation to the giant panda-specific diet. In addition, under different deworming pressures, we found that four resistance-related genes (glc-1, nrf-6, bre-4 and ced-7) were under strong positive selection in a captive population. Finally, 23 known drug targets and 47 potential drug target proteins (essential homologues linked to lethal phenotypes) were identified. The genome provides a unique reference for inferring the early evolution of roundworms and their adaptation to the host. Population genetic analysis and drug sensitivity prediction provide insights revealing the impact of deworming history on population genetic structure of importance for disease prevention.
With the rapid growth of the number of sequenced ancient genomes, there has been increasing interest in using this new information to study past and present adaptation. Such an additional temporal component has the promise of providing improved power for the estimation of natural selection. Over the last decade, statistical approaches for detection and quantification of natural selection from ancient DNA (aDNA) data have been developed. However, most of the existing methods do not allow us to estimate the timing of natural selection along with its strength, which is key to understanding the evolution and persistence of organismal diversity. Additionally, most methods ignore the fact that natural populations are almost always structured, which can result in overestimation of the effect of natural selection. To address these issues, we propose a novel Bayesian framework for the inference of natural selection and gene migration from aDNA data with Markov chain Monte Carlo techniques, co-estimating both timing and strength of natural selection and gene migration. Such an advance enables us to infer drivers of natural selection and gene migration by correlating genetic evolution with potential causes such as the changes in the ecological context in which an organism has evolved. The performance of our procedure is evaluated through extensive simulations, with its utility shown with an application to ancient chicken samples.
Genome sequencing methods and assembly tools have improved dramatically since the 2013 publication of draft genome assemblies for the mountain pine beetle, Dendroctonus ponderosae Hopkins (Coleoptera: Curculionidae). We conducted proximity ligation library sequencing and scaffolding to improve contiguity, and then used linkage mapping and recent bioinformatic tools for correction and further improvement. The new assemblies have dramatically improved contiguity and fewer gaps compared to the originals: N50 values increased 26- to 36-fold, and the number of gaps was reduced by half. Ninety percent of the content of the assemblies is now contained in 12 and 11 scaffolds for the female and male assemblies, respectively. Based on linkage mapping information, the 12 largest scaffolds in both assemblies represent all 11 autosomal chromosomes and the neo-X chromosome. These assemblies now have nearly chromosome-sized scaffolds and will be instrumental for studying genomic architecture, chromosome evolution, population genomics, functional genomics, and adaptation in this and other pest insects. We also identified regions in two chromosomes, including the ancestral-X portion of the neo-X chromosome, with elevated differentiation between northern and southern Canadian populations.
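The contiguity statistics quoted above (N50, and the number of scaffolds holding 90% of the assembly) follow a standard definition that can be sketched in a few lines. The function name `nx_lx` and the toy scaffold lengths are hypothetical and used only for illustration; they are not data from the mountain pine beetle assemblies.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of scaffold lengths.

    Nx is the length of the shortest scaffold in the smallest set of longest
    scaffolds that together cover at least `fraction` of the assembly;
    Lx is the number of scaffolds in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


# Toy scaffold lengths (hypothetical, arbitrary units).
toy = [2, 2, 2, 3, 3, 4, 8, 8]
n50, l50 = nx_lx(toy, 0.5)
n90, l90 = nx_lx(toy, 0.9)
print(n50, l50, n90, l90)
```

With `fraction=0.9`, the second returned value corresponds to the kind of figure reported in the abstract as the number of scaffolds containing ninety percent of the assembly content.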
The 15 species of small carnivorous marsupials that comprise the genus Antechinus exhibit semelparity, a rare life-history strategy where death occurs after one breeding season. Antechinus males, but not females, age rapidly (demonstrate organismal senescence) during the breeding season and show promise as new animal models of ageing. Some antechinus species are also threatened or endangered. Here, we report a chromosome-level genome of the yellow-footed antechinus Antechinus flavipes. The genome assembly has a total length of 3.2 Gb with a contig N50 of 51.8 Mb and a scaffold N50 of 636.7 Mb. We anchored and oriented 99.7% of the assembly on seven pseudochromosomes and found that repetitive DNA sequences occupy 51.8% of the genome. Draft genome assemblies of three related species in the subfamily Phascogalinae, two additional antechinus species (A. argentus and A. arktos) and the iteroparous sister species Murexia melanurus, were also generated. Preliminary demographic analysis supports the hypothesis that climate change during the Pleistocene isolated species in Phascogalinae and shaped their population size. A transcriptomic profile across the A. flavipes breeding season allowed us to identify genes associated with aspects of the male die-off. The chromosome-level A. flavipes genome provides a steppingstone to understanding an enigmatic life-history strategy and a resource to assist the conservation of antechinuses.
Populus has a wide ecogeographical range spanning the Northern Hemisphere, and exhibits abundant distinct species and hybrids globally. Populus tomentosa Carr. is widely distributed and cultivated in the eastern region of Asia, where it plays multiple important roles in forestry, agriculture, conservation, and urban horticulture. Reference genomes are available for several Populus species; however, our goal was to produce a very high quality de novo, chromosome-level genome assembly of P. tomentosa that could serve as a reference for evolutionary and ecological studies of hybrid speciation. Here, combining long-read sequencing and Hi-C scaffolding, we present a high-quality, haplotype-resolved genome assembly. The genome size was 740.2 Mb, with a contig N50 size of 5.47 Mb and a scaffold N50 size of 46.68 Mb, consisting of 38 chromosomes, as expected with the known diploid chromosome number (2n=2x=38). A total of 59,124 protein-coding genes were identified. Phylogenomic analyses revealed that P. tomentosa comprises two distinct subgenomes, which we demonstrate are likely to have resulted from hybridization between Populus adenopoda as the female parent and Populus alba var. pyramidalis as the male parent, approximately 3.93 Mya. Although the two subgenomes were highly colinear, significant structural variation was also found between them. Our study provides a valuable resource for ecological genetics and forest biotechnology.
Microbial diversity and community function are related, and can be highly specialized in different gut regions. The cloacal microbiome of Sceloporus virgatus provides antifungal protection to eggshells during oviposition – a specialized function that suggests a specialized microbial composition. Here, we describe the S. virgatus cloacal microbiome from tissue and swab samples, and compare it to tissue samples from the gastrointestinal (GI) tract and oviduct, adding to the growing body of evidence of microbiome localization in reptiles. We further assessed whether common methods of microbial sampling – cloacal swabs and feces – provide accurate representations of these microbial communities and whether feces might “seed” the cloacal microbiome or impact the accuracy of cloacal swab sampling. We found that different regions of the gut had unique microbial community structures. The cloacal community, in particular, showed extreme specialization averaging 99% Proteobacteria (Phylum) and 83% Enterobacteriaceae (Family). Cloacal swabs recovered communities similar to those of lower intestine and cloacal tissues, but fecal samples had much higher diversity and a distinct composition (62% Firmicutes and 39% Lachnospiraceae) relative to all gut regions. Finally, we found that feces and cloacal swabs recover different communities, but cloacal swabs may be contaminated with fecal matter if taken immediately after defecation. These results serve as a caution against the assumption that fecal samples provide an accurate representation of the gut, and show that although cloacal swabs can reflect a portion of the lower GI tract microbiome, they may also result in a mixed community of gut and fecal microbes.
To associate specimens identified by molecular characters to other biological knowledge, we need reference sequences annotated by Linnaean taxonomy. In this paper, we 1) report the creation of a comprehensive reference library of DNA barcodes for the arthropods of an entire country (Finland), 2) publish this library, and 3) deliver a new identification tool based on this resource. The reference library contains mtDNA COI barcodes for 11,275 (43%) of 26,437 arthropod species known from Finland, including 10,811 (45%) of 23,956 insect species. To quantify the improvement in identification accuracy enabled by the current reference library, we ran 1,000 Finnish insect and spider species through the Barcode of Life Data system (BOLD) identification engine. Of these, 91% were correctly assigned to a unique species when compared to the new reference library alone, 85% were correctly identified when compared to BOLD with the new material included, and 75% with the new material excluded. To capitalize on this resource, we used the new reference material to train a probabilistic taxonomic assignment tool, FinPROTAX, scoring high success. For the full-length barcode region, the accuracy of taxonomic assignments at the level of classes, orders, families, subfamilies, tribes, genera, and species reached 99.9%, 99.9%, 99.8%, 99.7%, 99.4%, 96.8%, and 88.5%, respectively. The FinBOL arthropod reference library and FinPROTAX are available through the Finnish Biodiversity Information Facility (www.laji.fi). Overall, the FinBOL investment represents a massive capacity-transfer from the taxonomic community of Finland to all sectors of society.
Ark shells are commercially important clam species that inhabit muddy sediments of shallow coasts in East Asia. For a long time, the lack of genome resources has hindered scientific research on ark shells. Here, we report a high-quality chromosome-level genome assembly of Scapharca kagoshimensis, with the aim of unravelling the molecular basis of heme biosynthesis and developing genomic resources for genetic breeding and population genetics in ark shells. Nineteen scaffolds corresponding to 19 chromosomes were constructed from 938 contigs (contig N50=2.01 Mb) to produce a final high-quality assembly with a total length of 1.11 Gb and a scaffold N50 of around 60.64 Mb. The genome assembly shows 93.4% completeness based on matching 303 eukaryota core conserved genes. A total of 24,908 protein-coding genes were predicted, of which 24,551 (98.56%) were functionally annotated. The enrichment analyses suggested that genes in heme biosynthesis pathways were expanded, and positive selection of the hemoglobin genes was also found in the genome of S. kagoshimensis, which gives important insights into the molecular mechanisms and evolution of heme biosynthesis in Mollusca. The genome assembly of S. kagoshimensis provides a solid foundation for investigating the molecular mechanisms that underlie the diverse biological functions and evolutionary adaptations of this species.
Metabarcoding of DNA extracted from community samples of whole organisms (whole organism community DNA, wocDNA) is increasingly being applied to terrestrial, marine and freshwater metazoan communities to provide rapid, accurate and high resolution data for novel molecular ecology research. The growth of this field has been accompanied by considerable development that builds on microbial metabarcoding methods to develop appropriate and efficient sampling and laboratory protocols for whole organism metazoan communities. However, considerably less attention has focused on ensuring bioinformatic methods are adapted and applied comprehensively in wocDNA metabarcoding. In this study we examined over 600 papers and identified 111 studies that performed COI metabarcoding of wocDNA. We then systematically reviewed the bioinformatic methods employed by these papers to identify the state-of-the-art. Our results show that the increasing use of wocDNA COI metabarcoding for metazoan diversity is characterised by a clear absence of bioinformatic harmonisation, and the temporal trends show little change in this situation. The reviewed literature showed (i) high heterogeneity across pipelines, tasks and tools used, (ii) limited or no adaptation of bioinformatic procedures to the nature of the COI fragment, and (iii) a worrying underreporting of tasks, software and parameters. Based upon these findings we propose a set of recommendations that we think the wocDNA metabarcoding community should consider to ensure that bioinformatic methods are appropriate, comprehensive and comparable. We believe that adhering to these recommendations will improve the long-term integrative potential of wocDNA COI metabarcoding for biodiversity science.
Metabarcoding of DNA extracted from environmental or bulk specimen samples is increasingly used to detect plant and animal taxa in basic and applied biodiversity research because of its targeted nature that allows sequencing of genetic markers from many samples in parallel. To achieve this, PCR amplification is carried out with primers designed to target a taxonomically informative marker within a taxonomic group, and sample-specific nucleotide identifiers are added to the amplicons prior to sequencing. This enables assignment of the sequences back to the samples they originated from. Nucleotide identifiers can be added during the metabarcoding PCR and/or during ‘library preparation’, i.e. when amplicons are prepared for sequencing. Different strategies to achieve this labelling exist. All have advantages, challenges and limitations, some of which can lead to misleading results, and in the worst case compromise the fidelity of the metabarcoding data. Given the range of questions addressed using metabarcoding, the importance of ensuring that data generation is robust and fit for purpose should be at the forefront of practitioners seeking to employ metabarcoding for biodiversity assessments. Here, we present an overview of the three main workflows for sample-specific labelling and library preparation in metabarcoding studies on Illumina sequencing platforms. Further, we distil the key considerations for researchers seeking to select an appropriate metabarcoding strategy for their specific study. Ultimately, by gaining insights into the consequences of different metabarcoding workflows, we hope to further consolidate the power of metabarcoding as a tool to assess biodiversity across a range of applications.
We present the chromosome-level genome assembly of Dysdera silvatica Schmidt, 1981, a nocturnal ground-dwelling spider endemic to the Canary Islands. The genus Dysdera has undergone a remarkable diversification in this archipelago mostly associated with shifts in the level of trophic specialization, becoming an excellent model to study the genomic drivers of adaptive radiations. The new assembly (1.37 Gb; scaffold N50 of 174.2 Mb), generated using the chromosome conformation capture scaffolding technique, represents a continuity improvement of more than 4,500 times with respect to the previous version. The seven largest scaffolds or pseudochromosomes cover 87% of the total assembly size and match consistently with the seven chromosomes of the karyotype of this species, including the characteristic large X chromosome. To illustrate the value of this new resource we performed a comprehensive analysis of the two major arthropod chemoreceptor gene families (i.e., gustatory and ionotropic receptors). We identified 545 chemoreceptor sequences distributed across all pseudochromosomes, with a notable underrepresentation in the X chromosome. At least 54% of them localize in 83 genomic clusters with significantly lower evolutionary distances between them than the average of the family, suggesting a recent origin of many of them. This chromosome-level assembly is the first high-quality genome representative of the Synspermiata clade, and just the third among spiders, representing a valuable new resource to gain insights into the structure and organization of chelicerate genomes, including the role that structural variants, repetitive elements and large gene families played in the extraordinary biology of spiders.
Arctium lappa has a long history of medicinal and edible use and is of great economic importance. We combined Illumina and PacBio sequences to generate the first high-quality chromosome-level draft genome of A. lappa. The assembled genome is approximately 1.79 Gb with an N50 contig size of 6.88 Mb. Approximately 1.70 Gb (95.4%) of the contig sequences were anchored onto 18 chromosomes using Hi-C data; the scaffold N50 improved to 91.64 Mb. Furthermore, we obtained 1.12 Gb (68.46%) of repetitive sequences and 32,771 protein-coding genes; 616 positively selected candidate genes were identified. Additionally, we compared the transcriptomes of A. lappa roots at three different developmental stages and identified 8,943 differentially expressed genes (DEGs) in these tissues. Among candidate genes related to lignan biosynthesis, the following were found to be highly correlated with the accumulation of arctiin: 4-coumarate-CoA ligase (4CL), dirigent protein (DIR), and hydroxycinnamoyl transferase (HCT). These data can be utilized to identify genes related to A. lappa quality or provide a basis for molecular identification and comparative genomics among related species.
Dispersal abilities play a crucial role in shaping the extent of population genetic structure, with more mobile species being panmictic over large geographic ranges and less mobile ones organized in meta-populations exchanging migrants to different degrees. In turn, population structure directly influences the coalescence pattern of the sampled lineages, but the consequences for the estimated variation of the effective population size (Ne) over time obtained by means of unstructured demographic models remain poorly understood. However, this knowledge is crucial for biologically interpreting the observed Ne trajectory and further devising conservation strategies in endangered species. Here we investigated the demographic history of four shark species (Carcharhinus melanopterus, Carcharhinus limbatus, Carcharhinus amblyrhynchos, Galeocerdo cuvier) with different degrees of endangered status and life history traits related to dispersal, distributed in the Indo-Pacific and sampled off New Caledonia. We compared several evolutionary scenarios representing both structured (meta-population) and unstructured models and then inferred the Ne variation through time. By performing extensive coalescent simulations, we provided a general framework relating the underlying population structure and the observed Ne dynamics. On this basis, we concluded that the recent decline observed in three out of the four considered species when assuming unstructured demographic models can be explained by the presence of population structure. Furthermore, we also demonstrated the limits of inferences based solely on the site frequency spectrum and warn that statistics based on linkage disequilibrium will be needed to exclude recent demographic events affecting meta-populations.
Metabarcoding of environmental DNA (eDNA) is now widely used to build diversity profiles from DNA that has been shed by species into the environment. There is substantial interest in the expansion of eDNA approaches for improved detection of terrestrial vertebrates using invertebrate-derived DNA (iDNA) in which hematophagous, sarcophagous, and coprophagous invertebrates sample vertebrate blood, carrion, or feces. Here, we use metabarcoding and multiple iDNA samplers (carrion flies, sandflies, and mosquitos) to profile gamma and alpha diversity in a dry, tropical forest in the southern Amazon. Our main objectives were to (1) compare diversity found with iDNA to camera trapping, which is the conventional method of vertebrate diversity surveillance and (2) compare each of the iDNA samplers to assess the effectiveness, efficiency, and potential biases associated with each sampler. Carrion flies were the most effective sampler for describing vertebrate biodiversity, despite receiving the least sampling effort and yielding the fewest individuals for metabarcoding, followed by sandflies. Camera traps had the highest median species richness at the site-level but showed strong bias towards carnivore and ungulate species and missed much of the diversity described by iDNA methods. Mosquitos showed a strong feeding preference for humans as did sandflies for armadillos, thus presenting potential utility for further study of host-vector interactions.
Biodiversity inventory remains limited in marine systems due to unbalanced access to the three ocean dimensions. The use of environmental DNA (eDNA) for metabarcoding allows fast and effective biodiversity inventory and is forecast as a future biodiversity research and biomonitoring tool. However, in poorly understood ecosystems, eDNA results remain difficult to interpret due to large gaps in reference databases and PCR bias limiting the detection of some major phyla. Here, we aimed to circumvent these limitations by avoiding PCR and recollecting larger DNA fragments to improve assignment of detected taxa through phylogenetic reconstruction. We applied capture by hybridization (CBH) to enrich DNA from deep-sea sediment samples and compared the results with those obtained through an up-to-date metabarcoding PCR-based approach (MTB). Originally developed for bacterial communities by targeting 16S rDNA, the CBH approach was applied to 18S rDNA to improve the detection of species forming benthic communities of eukaryotes, with particular focus on metazoans. The results confirmed the possibility of extending CBH to metazoans with two major advantages: i) CBH revealed a broader spectrum of prokaryotic, eukaryotic, and particularly metazoan diversity, and ii) CBH allowed much more robust phylogenetic reconstructions of full-length barcodes with up to 1900 base pairs. This is particularly important for taxa whose assignment is hampered by gaps in reference databases. This study provides a database and probes to apply 18S CBH to diverse marine systems, confirming this promising new tool to improve biodiversity assessments in data-poor ecosystems like those in the deep sea.