Pollinators are in decline thanks to the combined stresses of disease, pesticides, habitat loss, and climate. Honey bees face numerous pests and pathogens but arguably none are as devastating as Deformed wing virus (DWV). Understanding host-pathogen interactions and virulence of DWV in honey bees is slowed by the lack of cost-effective high-throughput screening methods for viral infection. Currently, analysis of virus infection in bees and their colonies is tedious, requiring a well-equipped molecular biology laboratory and the use of hazardous chemicals. Here we describe cDNA clones of DWV tagged with green fluorescent protein (GFP) or nanoluciferase (nLuc), providing high-throughput detection and quantification of virus infections. GFP fluorescence is recorded non-invasively in living bees via commonly available long-wave UV light sources and a smartphone camera or a standard ultraviolet transilluminator gel imaging system. Nonlethal monitoring with GFP allows high-throughput screening and serves as a direct breeding tool for identifying honey bee parents with increased antiviral resistance. Expression using the nLuc reporter strongly correlates with virus infection levels and is especially sensitive. Using multiple reporters, it is also possible to visualize competition, differential virulence, and host tissue targeting by co-occurring pathogens. Finally, it is possible to directly assess the risk of cross-species ‘spillover’ from honey bees to other pollinators and vice versa.
Metabarcoding is an important tool for understanding fungal communities. The internal transcribed spacer (ITS) rDNA is the accepted fungal barcode but has known problems. The large subunit (LSU) rDNA has also been used to investigate fungal communities but available LSU metabarcoding primers were mostly designed to target Dikarya (Ascomycota + Basidiomycota) with little attention to early diverging fungi (EDF). However, evidence from multiple studies suggests that EDF comprise a large portion of unknown diversity in community sampling. Here we investigate how DNA marker choice and methodological biases impact recovery of EDF from environmental samples. We focused on one EDF lineage, Zoopagomycota, as an example. We evaluated three primer sets (ITS1F/ITS2, LROR/LR3, and LR3 paired with new primer LR22F) to amplify and sequence a Zoopagomycota mock community and a set of 146 environmental samples with Illumina MiSeq. We compared two taxonomy assignment methods and created an LSU reference database compatible with AMPtk software. The two taxonomy assignment methods recovered strikingly different communities of fungi and EDF. Target fragment length variation exacerbated PCR amplification biases and influenced downstream taxonomic assignments, but this effect was greater for EDF than Dikarya. To improve identification of LSU amplicons we performed phylogenetic reconstruction and illustrate the advantages of this critical tool for investigating identified and unidentified sequences. Our results suggest much of the EDF community may be missed or misidentified with “standard” metabarcoding approaches and modified techniques are needed to understand the role of these taxa in a broader ecological context.
Phylogenetic trees have been extensively used in community ecology. However, how the phylogenetic reconstruction affects ecological inferences is poorly understood. In this study, we reconstructed three different types of phylogenetic trees (a synthetic-tree generated using VPhylomaker, a barcode-tree generated using rbcL+matK+trnH-psbA and a genome-tree generated from plastid genomes) that represented an increasing level of phylogenetic resolution among 580 woody plant species from six dynamic plots in subtropical evergreen broadleaved forests of China. We then evaluated the performance of each phylogeny in estimations of community phylogenetic structure, turnover and phylogenetic signal in functional traits. As expected, the genome-tree was most resolved and most supported for relationships among species. For local phylogenetic structure, the three trees showed consistent results with Faith’s PD and MPD; however, only the synthetic-tree produced significant clustering patterns using MNTD for some plots. For phylogenetic turnover, contrasting results between the molecular trees and the synthetic-tree occurred only with nearest neighbor distance. The barcode-tree agreed more with the genome-tree than the synthetic-tree for both phylogenetic structure and turnover. For functional traits, both the barcode-tree and genome-tree detected phylogenetic signal in maximum height, but only the genome-tree detected signal in leaf width. This is the first study that uses plastid genomes in large-scale community phylogenetics. Our results highlight the outperformance of genome-trees over barcode-trees and synthetic-trees for the analyses studied here. Our results also point to the possibility of Type I and II errors in estimation of phylogenetic structure and turnover and detection of phylogenetic signal when using synthetic-trees.
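The structure metrics named in this abstract can be illustrated with a minimal sketch. It assumes a precomputed matrix of pairwise phylogenetic (patristic) distances; the species labels, distances, and function names below are hypothetical and are not taken from the study. MPD averages over all species pairs, whereas MNTD depends only on each species' nearest relative, which is one plausible reason why MNTD responds more strongly to how well the tips of a tree are resolved.

```python
from itertools import combinations


def mpd(dist, species):
    """Mean pairwise distance among the species present in a community."""
    pairs = list(combinations(species, 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def mntd(dist, species):
    """Mean distance from each species to its nearest co-occurring relative."""
    nearest = [min(dist[a][b] for b in species if b != a) for a in species]
    return sum(nearest) / len(nearest)


# Hypothetical patristic distances among four species (illustrative only).
dist = {
    "A": {"A": 0, "B": 2, "C": 10, "D": 10},
    "B": {"A": 2, "B": 0, "C": 10, "D": 10},
    "C": {"A": 10, "B": 10, "C": 0, "D": 4},
    "D": {"A": 10, "B": 10, "C": 4, "D": 0},
}
community = ["A", "B", "C"]

print(round(mpd(dist, community), 3))   # (2 + 10 + 10) / 3
print(round(mntd(dist, community), 3))  # (2 + 2 + 10) / 3
```

In practice these raw values are usually compared against a null distribution obtained by randomizing community membership; that step is omitted here.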
A high-quality reference genome is necessary to determine the molecular mechanisms underlying important biological phenomena; therefore, in the present study, a chromosome-level genome assembly of the Chinese shrimp Fenneropenaeus chinensis was performed. Muscle of a male shrimp was sequenced using the PacBio platform, and assembled with Hi-C technology. The assembled F. chinensis genome was 1,465.32 Mb with contig N50 of 472.84 Kb, including 57.73% repetitive sequences, and was anchored to 43 pseudochromosomes, with scaffold N50 of 36.87 Mb. In total, 25,026 protein-coding genes were predicted. The genome size of F. chinensis showed significant contraction in comparison with that of other penaeid species, which is likely related to migration observed in this species. However, the F. chinensis genome included several expanded gene families related to cellular processes and metabolic processes, and the contracted gene families were associated with the virus infection process. The findings signify the adaptation of F. chinensis to the selection pressure of migration and cold environment. Furthermore, the selection signature analysis identified genes associated with metabolism, phototransduction, and nervous system in cultured shrimps when compared with the wild population, indicating targeted, artificial selection of growth, vision, and behavior during domestication. The construction of the genome of F. chinensis provided valuable information for the further genetic mechanism analysis of important biological processes, and will facilitate the research of genetic changes during evolution.
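For readers unfamiliar with the contiguity statistic quoted above, the following is a minimal sketch of how an N50 is conventionally computed: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name and the toy lengths are illustrative assumptions, not values from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("N50 is undefined for an empty assembly")


# Toy lengths in arbitrary units (hypothetical).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 80 + 70 = 150 covers half of 300, so N50 is 70
```

The same function applies to contigs and scaffolds alike; only the input lengths differ.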
Here we present an annotated, chromosome-anchored, genome assembly for Lake Trout (Salvelinus namaycush) – a highly diverse salmonid species of notable conservation concern and an excellent model for research on adaptation and speciation. We leveraged Pacific Biosciences long-read sequencing, paired-end Illumina sequencing, proximity ligation (Hi-C), and a previously published linkage map to produce a highly contiguous assembly composed of 7,378 contigs (contig N50 = 1.8 Mb) assigned to 4,120 scaffolds (scaffold N50 = 44.975 Mb). In total, 84.7% of the genome was assigned to 42 chromosome-sized scaffolds and 93.2% of Benchmarking Universal Single Copy Orthologs were recovered, putting this assembly on par with the best currently available salmonid genomes. Estimates of genome size based on k-mer frequency analysis were highly similar to the total size of the finished genome, suggesting that the entirety of the genome was recovered. A mitochondrial genome assembly was also produced. Self-vs-self synteny analysis allowed us to identify homeologs resulting from the salmonid-specific autotetraploid event (Ss4R) and alignment with three other salmonid species allowed us to identify homologous chromosomes in other species. We also generated multiple resources useful for future genomic research on Lake Trout including a repeat library and a sex-averaged recombination map. A novel RNA sequencing dataset was also used to produce a publicly available set of gene annotations using the National Center for Biotechnology Information Eukaryotic Genome Annotation Pipeline. Potential applications of these resources to population genetics and the conservation of native populations are discussed.
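The k-mer-based genome size estimate mentioned above can be sketched in its simplest form: divide the total number of k-mer occurrences by the depth at the main peak of the k-mer frequency histogram. The abstract does not say which tool or model was used, so the function, the low-depth cutoff, and the toy histogram below are all assumptions for illustration; dedicated tools additionally model heterozygosity and repeats.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer histogram {depth: number of distinct k-mers}.

    Depths below min_depth are treated as sequencing errors and ignored
    (the cutoff is an illustrative assumption).
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above the minimum depth")
    peak_depth = max(usable, key=usable.get)           # depth of the main peak
    total_kmers = sum(d * c for d, c in usable.items())  # total k-mer occurrences
    return total_kmers / peak_depth


# Hypothetical histogram: an error tail at low depth and a peak at depth 20.
toy_histogram = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 500, 21: 300, 22: 100}
print(estimate_genome_size(toy_histogram))  # 26000 / 20 = 1300.0
```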
The promotion of responsible and sustainable trade in biological resources is widely proposed as one solution to mitigate currently high levels of global biodiversity loss. Various molecular identification methods have been proposed as appropriate tools for monitoring global supply chains of commercialized animals and plants. We demonstrate the efficacy of target capture genomic barcoding in identifying and establishing the geographic origin of samples traded as Anacyclus pyrethrum, a medicinal plant assessed as globally vulnerable in the IUCN Red List. Samples collected from national and international supply chains were identified through target capture sequencing of 443 low-copy nuclear markers and compared to results derived from genome skimming of plastome, standard plastid barcoding regions and ITS. Both target capture and genome skimming provided approximately 3.4 million reads per sample, but target capture largely outperformed standard plant DNA barcodes and entire plastid genome sequences. Despite the difficulty of distinguishing among closely related species and infraspecific taxa of Anacyclus using conventional taxonomic methods, we succeeded in identifying 89 of 110 analysed samples to subspecies level without ambiguity through target capture. Furthermore, we were able to discern the geographical origin of Anacyclus samples collected in Moroccan, Indian and Sri Lankan markets, differentiating between plant materials originally harvested from diverse populations in Algeria and Morocco. With a recent drop in the cost of analysing samples, target capture offers the potential to routinely identify commercialized plant species and determine their geographic origin. It promises to play an important role in monitoring and regulation of plant species in trade, supporting biodiversity conservation efforts, and in ensuring that plant products are unadulterated, contributing to consumer protection.
Until recently many historical museum specimens were largely inaccessible to genomic inquiry, but high-throughput sequencing (HTS) approaches have allowed researchers to successfully sequence genomic DNA from dried and fluid-preserved museum specimens. In addition to preserved specimens, many museums contain large series of allozyme supernatant samples but the amenability of these samples to HTS has not yet been assessed. Here, we compared the performance of a target-capture approach using alternative sources of genomic DNA from ten specimens of spring salamanders (Plethodontidae: Gyrinophilus porphyriticus) collected 1985–1990: allozyme supernatants, allozyme homogenate pellets, and formalin-fixed tissues. We designed capture probes based on double-digest restriction-site associated (RADseq) sequencing derived loci from seven of the specimens and assessed the success and consistency of capture and RADseq technical replicates. This study design enabled direct comparisons of data quality and potential biases among the different datasets for phylogenomic and population genomic analyses. We found that in phylogenetic analyses, all replicates for a given specimen clustered together, but in principal component space, RADseq replicates did not cluster with corresponding capture-based replicates. SNP calls were on average 18.3% different between technical replicates, but these discrepancies were primarily due to differences in heterozygous/homozygous SNP calls. We demonstrate that both allozyme supernatant and formalin-fixed samples can be successfully used for population genomic analyses and we discuss ways to identify and reduce biases associated with combining capture and RADseq data.
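The replicate comparison described above can be sketched as follows, assuming genotypes are coded as the number of alternate alleles (0, 1, 2) with None for missing calls. This coding, the function name, and the toy genotypes are illustrative assumptions; the abstract does not describe the authors' actual procedure. The sketch separates discordant calls into heterozygous/homozygous disagreements and disagreements between the two homozygous states.

```python
def compare_replicates(rep_a, rep_b):
    """Count compared sites, discordant calls, and het/hom discordances."""
    compared = discordant = het_hom = 0
    for ga, gb in zip(rep_a, rep_b):
        if ga is None or gb is None:
            continue  # skip sites missing in either replicate
        compared += 1
        if ga != gb:
            discordant += 1
            # Exactly one of the two calls is heterozygous (coded as 1).
            if (ga == 1) != (gb == 1):
                het_hom += 1
    return compared, discordant, het_hom


# Hypothetical genotype calls for two technical replicates of one specimen.
rep_a = [0, 1, 2, 1, 0, None, 2, 1]
rep_b = [0, 1, 2, 0, 0, 1, 0, 2]

compared, discordant, het_hom = compare_replicates(rep_a, rep_b)
print(compared, discordant, het_hom)  # 7 sites compared, 3 discordant, 2 of them het/hom
print(discordant / compared)          # proportion of discordant calls
```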
The soybean cyst nematode (Heterodera glycines) is a sedentary plant parasite that causes more than a billion dollars in yield losses annually. It has spread across the soybean-producing world, emerging as the primary pathogen of soybeans. This problem is exacerbated by H. glycines populations overcoming the limited sources of natural resistance in soybean and by the lack of effective and safe alternative treatments. Although there are genetic determinants that render soybean plants resistant to certain nematode genotypes, resistant soybean cultivars are increasingly ineffective because their multi-year usage has selected for virulent H. glycines populations. Successful H. glycines infection relies on the comprehensive re-engineering of soybean root cells into a syncytium, as well as the long-term suppression of host defenses to ensure syncytial viability. At the forefront of these complex molecular interactions are effectors, the proteins secreted by H. glycines into host root tissues. The mechanisms that control genomic effector acquisition, diversification, and selection are important insights needed for the development of essential novel control strategies. As a foundation to obtain this understanding, we developed a nine-scaffold, 158 Mb pseudomolecule assembly of the H. glycines genome using PacBio, Chicago, and Hi-C sequencing. An annotation of 22,465 genes was predicted using a Mikado pipeline informed by published short- and long-read expression data. Here we present results from our assembly and annotation of the H. glycines genome.
The mitochondrial gene cytochrome-c-oxidase subunit 1 (COI) is useful in many taxa for phylogenetics, population genetics, metabarcoding, and rapid species identifications. However, the phylum Ctenophora (comb jellies) has historically been difficult to study due to divergent mitochondrial sequences and the corresponding inability to amplify COI with degenerate and standard COI ‘barcoding’ primers. As a result, there are very few COI sequences available for ctenophores, despite over 200 described species in the phylum. Here, we designed new primers and amplified the COI fragment from members of all major groups of ctenophores, including many undescribed species. Phylogenetic analyses of the resulting COI sequences revealed high diversity within many groups that was not evident from more conserved 18S rDNA sequences, in particular among the Lobata. The COI phylogenetic results also revealed unexpected community structure within the genus Bolinopsis, suggested new species within the genus Bathocyroe, and supported the ecological and morphological differences of some species such as Lampocteis cruentiventer and similar lobates (Lampocteis sp. ‘V’ stratified by depth, and ‘A’ differentiated by color). The newly described primers reported herein provide important tools to enable researchers to illuminate the diversity of ctenophores worldwide via quick molecular identifications, improve the ability to analyze environmental DNA by improving reference libraries and amplifications, and enable a new breadth of population genetic studies.
Many model organisms have obtained a prominent status due to an advantageous combination of their life-history characteristics, genetic properties and also practical considerations. In non-crop plants, Arabidopsis thaliana is the most renowned model and has been used as a study system to elucidate numerous biological processes at the molecular level. Once a complete genome sequence was available, research markedly accelerated and further established A. thaliana as the reference to stimulate studies in other species with different biology. Within the Brassicaceae family, the arctic-alpine perennial Arabis alpina has become a model complementary to A. thaliana to study life-history evolution and ecological genomics in harsh environments. In this review, we provide an overview of the properties that facilitated the rapid emergence of A. alpina as a plant model. We summarize the evolutionary history of A. alpina, including the diversification of its mating system, and discuss recent progress in the molecular dissection of developmental traits that are related to its perennial life history and environmental adaptation. We indicate open questions from which future research might be developed in other Brassicaceae species or more distantly related plant families.
Identifying local adaptation in bottlenecked species is essential for conservation management. Selection detection methods have an important role in species management plans, assessments of adaptive capacity, and looking for responses to climate change. Yet, the allele frequency changes exploited in selection detection methods are similar to those caused by the strong neutral genetic drift expected during a bottleneck. Consequently, it is often unclear what accuracy selection detection methods have across bottlenecked populations. In this study, simulations were used to explore if signals of selection could be confidently distinguished from genetic drift across 23 bottlenecked and reintroduced populations of Alpine ibex (Capra ibex). The meticulously recorded demographic history of the Alpine ibex was used to generate comprehensive simulated SNP data. The simulated SNPs were then used to benchmark the confidence we could place in outliers identified in empirical Alpine ibex SNP data. Within the simulated dataset, the false positive rates were high for all selection detection methods but fell substantially when two or more methods were combined. True positive rates were consistently low and became negligible with increased stringency. Despite finding many outlier loci in the empirical Alpine ibex SNPs, none could be distinguished from genetic drift-driven false positives. Unfortunately, the low true positive rate also prevents the exclusion of recent local adaptation within the Alpine ibex. The baselines and stringent approach outlined here should be applied to other bottlenecked species to ensure the risks of false positive, or negative, signals of selection are accounted for in conservation management plans.
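The benchmarking logic described above can be illustrated with a minimal sketch: when the truly selected loci are known from simulation, false and true positive rates follow directly from set operations, and requiring agreement between methods corresponds to intersecting their outlier sets. The locus identifiers, outlier sets, and function name below are hypothetical and do not reproduce the study's methods or results.

```python
def rates(flagged, selected, all_loci):
    """Return (false positive rate, true positive rate) for a set of flagged loci."""
    neutral = all_loci - selected
    fpr = len(flagged & neutral) / len(neutral)
    tpr = len(flagged & selected) / len(selected)
    return fpr, tpr


# Hypothetical simulation: 100 loci, of which loci 0 to 4 are under selection.
all_loci = set(range(100))
selected = set(range(5))

# Hypothetical outlier sets reported by two selection detection methods.
method_a = {0, 1, 10, 11, 12, 13, 14, 15}
method_b = {1, 2, 14, 15, 20, 21, 22, 23}
combined = method_a & method_b  # loci flagged by both methods

for name, flagged in [("A", method_a), ("B", method_b), ("A and B", combined)]:
    fpr, tpr = rates(flagged, selected, all_loci)
    print(f"{name}: FPR={fpr:.3f} TPR={tpr:.3f}")
```

In this toy case the intersection lowers the false positive rate but also the true positive rate, which mirrors the trade-off the abstract reports with increased stringency.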
Mapping the genes underlying ecologically-relevant traits in natural populations is fundamental to develop a molecular understanding of species adaptation. Current sequencing technologies enable the characterisation of a species' genetic diversity across the landscape or even over its whole range. The relevant capture of the genetic diversity across the landscape is critical for a successful genetic mapping of traits and there are no clear guidelines on how to achieve an optimal sampling and which sequencing strategy to implement. Here we determine through simulation, the sampling scheme that maximises the power to map the genetic basis of a complex trait in an outbreeding species across an idealised landscape and draw genomic predictions for the trait, comparing individual and pool sequencing strategies. Our results show that QTL detection power and prediction accuracy are higher when more populations over the landscape are sampled and this is more cost-effectively done with pool sequencing than with individual sequencing. Additionally, we recommend sampling populations from areas of high genetic diversity. As progress in sequencing enables the integration of trait-based functional ecology into landscape genomics studies, these findings will guide study designs allowing direct measures of genetic effects in natural populations across the environment.
DNA metabarcoding is an important tool for molecular ecology. However, its effectiveness hinges on the quality of reference sequence databases and classification parameters employed. Here we evaluate the performance of MiFish 12S taxonomic assignments using a case study of California Current Large Marine Ecosystem fishes to determine best practices for metabarcoding. Specifically, we use a taxonomy cross-validation by identity framework to compare classification performance between a global database comprised of all available sequences and a curated database that only includes sequences of fishes from the California Current Large Marine Ecosystem. We demonstrate that the curated, regional database provides higher assignment accuracy than the comprehensive global database. We also document a tradeoff between accuracy and misclassification across a range of taxonomic cutoff scores, highlighting the importance of parameter selection for taxonomic classification. Furthermore, we compared assignment accuracy with and without the inclusion of additionally generated reference sequences. To this end, we sequenced tissue from 605 species using the MiFish 12S primers, adding 253 species to GenBank’s existing 550 California Current Large Marine Ecosystem fish sequences. We then compared species and reads identified from seawater environmental DNA samples using global databases with and without our generated references, and the regional database. The addition of new references allowed for the identification of 16 native taxa and 17.0% of total reads from eDNA samples, including species with vast ecological and economic value. Together these results demonstrate the importance of comprehensive and curated reference databases for effective metabarcoding and the need for locus-specific validation efforts.
Current knowledge on environmental distribution and taxon richness of free-living bacteria is mainly based on cultivation-independent investigations employing 16S rRNA gene sequencing methods. Yet, 16S rRNA genes are evolutionarily rather conserved, resulting in limited taxonomic and ecological resolution provided by this marker. We used a faster evolving protein-encoding marker to reveal ecological patterns hidden within a single OTU defined by >99% 16S rRNA sequence similarity. The studied taxon, subcluster PnecC of the genus Polynucleobacter, represents a ubiquitous group of planktonic freshwater bacteria with cosmopolitan distribution, which is very frequently detected by diversity surveys of freshwater systems. Based on genome taxonomy and a large set of genome sequences, a sequence similarity threshold for delineation of species-like taxa could be established. In total, 600 species-like taxa were detected in 99 freshwater habitats scattered across three regions representing a latitudinal range of 3400 km (42°N to 71°N) and a pH gradient of 4.2 to 8.6. Besides the unexpectedly high richness, the increased taxonomic resolution revealed structuring of Polynucleobacter communities by a couple of macroecological trends, which was previously only demonstrated for phylogenetically much broader groups of bacteria. An unexpected pattern was the almost complete compositional separation of Polynucleobacter communities of Ca2+-rich and Ca2+-poor habitats, which strongly resembled the vicariance of plant species on silicate and limestone soils. The presented new cultivation-independent approach opened a window to an incredible, previously unseen diversity, and enables investigations aiming at a deeper understanding of how environmental conditions shape bacterial communities and drive evolution of free-living bacteria.
Scale insects are hemimetabolous, showing “incomplete” metamorphosis and no true pupal stage. Ericerus pela, commonly known as the white wax scale insect (hereafter, WWS), is a wax-producing insect found in Asia and Europe. WWS displays dramatic sexual dimorphism, with notably different metamorphic fates in males and females. Males develop into winged adults, while females are neotenic, maintaining a nymph-like appearance and remaining flightless and stationary. Here we report the de novo assembly of the WWS genome with its size of 638.30 Mb (69.68 Mb for scaffold N50) by PacBio sequencing and Hi-C. From these data, we constructed a robust phylogenetic analysis of 24,923 gene families from 16 representative insect genomes, which indicates that holometabola evolved from incomplete metamorphosis insects in the Late Carboniferous, about 50 million years earlier than previously thought. To study the distinct development of males and females, we analyzed the methylome landscape in each sex. Surprisingly, WWS displayed high levels of methylation (4.42% for males) when compared to other insects. We observed differential methylation patterns for genes involved in steroid and sesquiterpenoid production as well as related fatty acid metabolism pathways. We show here that males and females exhibit distinct titer profiles for ecdysone, the principal insect steroid hormone, and juvenile hormone (a sesquiterpenoid), suggesting that these hormones are the primary drivers of sexually dimorphic features. Our results provide a comprehensive genomic and epigenomic resource for scale insects that provides new insights into the evolution of metamorphosis and sexual dimorphism in insects.
Managing endangered species in fragmented landscapes requires estimating dispersal rates between populations over contemporary timescales. Here we develop a new method for quantifying recent dispersal using genetic pedigree data for close and distant kin. Specifically, we describe an approach that infers missing shared ancestors between pairs of kin in habitat patches across a fragmented landscape. We then apply a stepping-stone model to assign unsampled individuals in the pedigree to probable locations based on minimizing the number of movements required to produce the observed locations in sampled kin pairs. Finally, we use all pairs of reconstructed parent-offspring sets to estimate dispersal rates between habitat patches under a Bayesian model. Our approach measures connectivity over the timescale represented by the small number of generations contained within the pedigree and so is appropriate for estimating the impacts of recent habitat changes due to human activity. We used our method to estimate recent movement between newly discovered populations of threatened Eastern Massasauga Rattlesnakes (Sistrurus catenatus) using data from 2996 RAD-based genetic loci. Our pedigree analyses found no evidence for contemporary connectivity between five genetic groups, but, as validation of our approach, showed high dispersal rates between sample sites within a single genetic cluster. We conclude that these five genetic clusters of Eastern Massasauga Rattlesnakes have small numbers of resident snakes and are demographically isolated conservation units. More broadly, our methodology can be widely applied to determine contemporary connectivity rates, independent of bias from shared genetic similarity due to ancestry that impacts other approaches.
The hyper-diverse order Coleoptera comprises a staggering ~25% of known species on Earth. Despite recent breakthroughs in next generation sequencing, there remains a limited representation of beetle diversity in assembled genomes. Most notably, the ground beetle family Carabidae, comprising more than 40,000 described species, has not been studied in a comparative genomics framework using whole genome data. Here we generate a high-quality genome assembly for Nebria riversi, to examine sources of novelty in the genome evolution of beetles, as well as genetic changes associated with specialization to high elevation alpine habitats. In particular, this genome resource provides a foundation for expanding comparative molecular research into mechanisms of insect cold adaptation. Comparison to other beetles shows a strong signature of genome compaction, with N. riversi possessing a relatively small genome (~147 Mb) compared to other beetles, with associated reductions in repeat element content and intron length. Small genome size is not, however, associated with fewer protein-coding genes, and an analysis of gene family diversity shows significant expansions of genes associated with cellular membranes and membrane transport, as well as protein phosphorylation and muscle filament structure. Finally, our genomic analyses show that these high elevation beetles have endosymbiotic Spiroplasma, with several metabolic pathways (e.g. propanoate biosynthesis) that might complement N. riversi, although its role as a beneficial symbiont or as a reproductive parasite remains equivocal.
We used long read sequencing data generated from Knightia excelsa R.Br., a nectar producing Proteaceae tree endemic to Aotearoa New Zealand, to explore how sequencing data type, volume and workflows can impact final assembly accuracy and chromosome construction. Establishing a high-quality genome for this species has specific cultural importance to Māori, the indigenous people, as well as commercial importance to honey producers in Aotearoa New Zealand. Assemblies were produced by five long read assemblers using data subsampled based on read lengths, two polishing strategies, and two Hi-C mapping methods. Our results from subsampling the data by read length showed that each assembler tested performed differently depending on the coverage and the read length of the data. Assemblies that used longer read lengths (>30 kb) and lower coverage were the most contiguous and the most complete in k-mer and gene content. The final genome assembly was constructed into pseudo-chromosomes using all available data assembled with FLYE, polished using Racon/Medaka/Pilon combined, scaffolded using SALSA2 and AllHiC, curated using Juicebox, and validated by synteny with Macadamia. We highlighted the importance of developing assembly workflows based on the volume and type of sequencing data and establishing a set of robust quality metrics for generating high quality assemblies. Scaffolding analyses highlighted that problems found in the initial assemblies could not be resolved accurately by utilizing Hi-C data and that scaffolded assemblies were more accurate when the underlying contig assembly was of higher accuracy. These findings provide insight into what is required for future high-quality de novo assemblies of non-model organisms.