Figure Legends
Figure 1. Schematic overview of the FLc-Capture sequencing method. a) Prepare a shotgun genomic DNA library. b) Prepare full-length cDNA probes from high-quality RNA using the SMARTer™ PCR cDNA Synthesis Kit (Clontech Inc.). The full-length cDNA probes contain both UTR and ORF regions. c) Hybridize the probes to the shotgun genomic library. Both coding and noncoding loci bind to the cDNA probes.
Figure 2. Details of the FLc-Capture data processing. a) Creating reference ORF and UTR sets by sequencing the cDNA probes. b) Extracting ORF (coding) sequences. First, FLc-Capture assembles reads into contigs and identifies exons in the contigs using EXONERATE, based on the reference ORF sets. The identified exons are then mapped onto the reference ORF sequences to build the orthologous ORF sequences, using the position information from the EXONERATE searches. c) Extracting UTR (noncoding) sequences. Based on the reference UTRs, FLc-Capture uses a mutual best-hit (MBH) strategy to identify orthologous UTR sequences and truncates each sequence to its optimally aligned region.
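The mutual best-hit idea in panel c can be illustrated with a minimal sketch. It is not the FLc-Capture implementation: the hit tuples, scores, and the function names `best_hits` and `mutual_best_hits` are hypothetical, and it assumes that similarity hits in both directions (reference UTR to contig, and contig to reference UTR) have already been produced by some search tool.

```python
def best_hits(hits):
    """Return {query: subject}, keeping the highest-scoring hit per query."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(ref_to_contig, contig_to_ref):
    """Keep a (reference, contig) pair only if each is the other's best hit."""
    forward = best_hits(ref_to_contig)
    reverse = best_hits(contig_to_ref)
    return {ref: contig for ref, contig in forward.items()
            if reverse.get(contig) == ref}


# Toy hits as (query, subject, score); all values are invented for illustration.
ref_to_contig = [
    ("utr1", "c1", 200.0), ("utr1", "c2", 150.0),
    ("utr2", "c2", 180.0), ("utr3", "c3", 90.0),
]
contig_to_ref = [
    ("c1", "utr1", 200.0), ("c2", "utr2", 180.0), ("c2", "utr1", 150.0),
    ("c3", "utr4", 120.0), ("c3", "utr3", 90.0),
]

pairs = mutual_best_hits(ref_to_contig, contig_to_ref)
print(pairs)
```

In this toy input, `utr3` is dropped because its best contig, `c3`, matches a different reference better, which is the situation the reciprocal check is meant to filter out.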
Figure 3. The results of extracting phylogenetic loci from 1×, 5×, 10× and 20× whole-genome sequencing data for four insect and two snake species. For each species, at each sequencing depth, the left bar depicts the recovery rate of genes classified as complete (blue) and fragmented (yellow); the right gray bar depicts the recovery rate of DNA sites. The genome size of each species is shown under the species name.
Figure 4. The results of extracting UTR and ORF sequences from the FLc-Capture data for the ingroup Colubridae species and the outgroup species. Bars show a) gene recovery rates and b) nucleotide recovery rates.
Figure 5. a) Plots of linear regressions of capture specificity (on-target) across all samples in FLc-Capture experiments (blue = ORF, red = UTR), using the average pairwise distance from probe species as the independent variable. b) Relationship between the mean capture depth of each target and the abundance of its corresponding probe (blue = ORF, red = UTR).
Figure 6. Characteristics of the ORF and UTR data sets (blue = ORF, red = UTR). Boxplots show a) distribution of locus length (bp), b) distribution of GC content (among genes and among species), and c) distribution of evolutionary rates of loci (measured by the mean pairwise distance of each locus). d) Visualization of ML tree space using a multidimensional scaling plot of 1,075 ORF gene trees (left) and 1,948 UTR gene trees (right); each dot represents a tree inferred from one gene. Distances between dots represent Robinson–Foulds distances between gene trees.
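The Robinson–Foulds distance used in panel d counts the bipartitions (splits) found in one tree but not in the other. The sketch below is illustrative only and is not the software used for the figure: it assumes trees are written as nested Python tuples of taxon names over the same taxon set, treats them as unrooted, and uses the hypothetical helper names `leaf_set`, `splits`, and `rf_distance`. The multidimensional scaling step is not shown.

```python
def leaf_set(tree):
    """Collect the taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the nontrivial bipartitions of a tree, each stored as the
    side that does not contain a fixed anchor taxon."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = clade if anchor not in clade else all_leaves - clade
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


# Two toy gene trees on five taxa; they differ in the placement of B and C.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))

print(rf_distance(tree_1, tree_2))
```

A matrix of such pairwise distances among all gene trees is the kind of input a multidimensional scaling plot is built from.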
Figure 7. Phylogenetic relationships among the 25 Colubridae species (including the probe species Ptyas korros) and 12 outgroup species inferred from the ORF (1,075 ORFs; ~817 K) and UTR (1,948 UTRs; ~1,114 K) data sets. The trees are inferred with RAxML, and the two data sets produce an identical phylogeny. Branch support values are indicated beside nodes in the order ORF ML bootstrap, then UTR ML bootstrap, from left to right. The filled circles represent ML bootstrap support ≥ 95% (both ORF and UTR). The three hotly debated nodes (A, B and C) within the Colubridae family are indicated by filled circles with letters. The bars to the right of the species names represent the completeness of the data set for each species (calculated by loci).