Species and matrix assembly
Raw reads were demultiplexed per individual, and PHYLUCE v 1.6.8 (Faircloth, 2016) was used to assemble each species. Reads were trimmed for adapter contamination and low-quality bases with trimmomatic v 0.39 (Bolger, Lohse, & Usadel, 2014) as implemented in illumiprocessor v 2.0.9 (Faircloth, 2013). We then ran cd-hit-dup v 4.6.4 to remove duplicate reads (Fu et al., 2012). For comparative purposes, we used three assemblers: ABySS v 2.0 (Jackman et al., 2017), Trinity v 2.1.1 (Haas et al., 2013), and Velvet v 1.2.10 (Zerbino & Birney, 2008) (see Table S2). To recover a higher number of contigs, we combined the Velvet and Trinity assemblies and matched the contigs to the probe set with a modified version of the original script by B. Faircloth, phyluce_assembly_match_contigs_to_probes_duplicates, with --min-coverage and --min-identity both set to 80.
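As an illustration of the pooling and thresholding logic described above, the following minimal Python sketch merges contigs from two assemblers and keeps only probe hits that meet both the coverage and identity thresholds of 80. It is not the PHYLUCE script itself: the names Hit, pool_assemblies, and filter_hits are hypothetical, and the sketch assumes that alignment hits with percent identity and percent coverage have already been computed by an external aligner.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One probe-to-contig alignment (hypothetical record layout)."""
    contig: str
    locus: str
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the probe covered by the alignment


def pool_assemblies(**assemblies):
    """Merge contigs from several assemblers into one dictionary.

    Each keyword argument maps an assembler name to a {contig_name: sequence}
    dictionary. Contig names are prefixed with the assembler name so that
    identical names from different assemblers do not collide.
    """
    pooled = {}
    for assembler, contigs in assemblies.items():
        for name, sequence in contigs.items():
            pooled[f"{assembler}|{name}"] = sequence
    return pooled


def filter_hits(hits, min_identity=80.0, min_coverage=80.0):
    """Keep only hits that meet both the identity and coverage thresholds."""
    return [
        hit for hit in hits
        if hit.identity >= min_identity and hit.coverage >= min_coverage
    ]


# Toy example: two assemblers produce a contig with the same name.
pooled = pool_assemblies(velvet={"c1": "ACGT"}, trinity={"c1": "ACGA"})
hits = [
    Hit("velvet|c1", "uce-1", identity=95.0, coverage=90.0),
    Hit("trinity|c1", "uce-2", identity=79.9, coverage=100.0),
]
kept = filter_hits(hits)
```

Prefixing contig names with the assembler name is one simple way to keep the provenance of each contig visible after pooling; the actual modified script may handle this differently.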
Assembly QC statistics are shown in Table 3. Contigs representing the targeted UCE loci were captured, and duplicated loci (either different probes hitting the same locus or a probe hitting multiple loci) were removed. A list of the enriched UCE loci in each taxon, including incomplete loci (those not found in every taxon of the set), was generated, and individual FASTA files for these loci were extracted (Table 2; see Table S2 for summary statistics for each species using each of the three assemblers). In this step, we also included the UCE loci captured from all genome and transcriptome assemblies used for the probe set design and from two caenogastropod transcriptomes. Selected UCE loci were aligned in MAFFT v 7.455 (Katoh & Standley, 2013). Edge and internal trimming of the resulting alignments was conducted using Gblocks v 0.91 (Castresana, 2000; Talavera & Castresana, 2007) with mid-level settings suited to phylogenies at higher taxonomic ranks, i.e., b1 = 0.5, b2 = 0.5, b3 = 6, and b4 = 4. A matrix was then built with a completeness threshold of 50% (see Table 3 for the number of genes per species in the final datasets).
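The duplicate-removal and completeness steps can be sketched in a few lines of Python. This is a simplified illustration and not the PHYLUCE implementation: the function names are hypothetical, duplicates are modeled as contig-to-locus pairs, and the 50% completeness threshold is assumed to mean that a locus is kept when it is present in at least half of the taxa.

```python
from collections import defaultdict


def remove_duplicate_loci(matches):
    """Return {locus: contig} for unambiguous matches in one taxon.

    `matches` is an iterable of (contig, locus) pairs. A locus is dropped if
    more than one contig matches it, or if its only contig also matches
    another locus (assumed definition of a duplicated locus).
    """
    contigs_per_locus = defaultdict(set)
    loci_per_contig = defaultdict(set)
    for contig, locus in matches:
        contigs_per_locus[locus].add(contig)
        loci_per_contig[contig].add(locus)

    unique = {}
    for locus, contigs in contigs_per_locus.items():
        if len(contigs) != 1:
            continue  # several contigs hit the same locus
        contig = next(iter(contigs))
        if len(loci_per_contig[contig]) != 1:
            continue  # one contig hits several loci
        unique[locus] = contig
    return unique


def loci_passing_completeness(loci_by_taxon, min_fraction=0.5):
    """Return the sorted loci present in at least `min_fraction` of the taxa."""
    n_taxa = len(loci_by_taxon)
    counts = defaultdict(int)
    for loci in loci_by_taxon.values():
        for locus in set(loci):
            counts[locus] += 1
    return sorted(
        locus for locus, n in counts.items() if n >= min_fraction * n_taxa
    )


# Toy example with made-up contig, locus, and taxon names.
matches = [
    ("c1", "uce-1"),
    ("c2", "uce-2"), ("c3", "uce-2"),  # two contigs, one locus
    ("c4", "uce-3"), ("c4", "uce-4"),  # one contig, two loci
]
unique = remove_duplicate_loci(matches)

loci_by_taxon = {
    "taxon_a": ["uce-1", "uce-2", "uce-3"],
    "taxon_b": ["uce-1", "uce-2"],
    "taxon_c": ["uce-1"],
    "taxon_d": ["uce-1"],
}
retained = loci_passing_completeness(loci_by_taxon, min_fraction=0.5)
```

In the toy data, only uce-1 survives duplicate removal, and uce-1 and uce-2 pass the completeness filter because each is found in at least two of the four taxa. How the threshold is rounded when the number of taxa is odd is a design choice that this sketch does not try to match to any particular tool.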