Species and matrix assembly
Raw reads were demultiplexed by individual, and PHYLUCE v 1.6.8
(Faircloth, 2016) was used to assemble each species. Raw reads were
trimmed for adapter contamination and low-quality bases with
Trimmomatic v 0.39 (Bolger, Lohse, & Usadel, 2014) as implemented in
illumiprocessor v 2.0.9 (Faircloth, 2013). We then ran cd-hit-dup v
4.6.4 (Fu et al., 2012) to remove duplicate reads.
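The two read-cleaning steps can be illustrated with a minimal Python sketch. It is not the Trimmomatic or cd-hit-dup implementation. The sliding-window trim only mimics the general idea of quality trimming, and adapter removal is not modelled. The window size (4), quality cutoff (20) and minimum length (40) are hypothetical values, not the settings used in this study. Deduplication here keeps only exact sequence duplicates, whereas cd-hit-dup can also collapse near-identical reads. All function names are illustrative.

```python
from typing import Iterable, List, Tuple

# A read is (sequence, Phred+33 quality string).
Read = Tuple[str, str]


def phred_scores(qual: str) -> List[int]:
    """Decode Phred+33 quality characters into integer scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(read: Read, window: int = 4, min_q: float = 20.0) -> Read:
    """Cut the read at the first window (scanning 5' to 3') whose mean quality
    falls below min_q. Hypothetical defaults; only mimics quality trimming."""
    seq, qual = read
    scores = phred_scores(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_q:
            return seq[:start], qual[:start]
    return seq, qual


def remove_exact_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first occurrence of each distinct sequence (exact matches only)."""
    seen = set()
    unique: List[Read] = []
    for seq, qual in reads:
        if seq not in seen:
            seen.add(seq)
            unique.append((seq, qual))
    return unique


def clean_reads(reads: Iterable[Read], min_len: int = 40) -> List[Read]:
    """Quality-trim, drop reads shorter than min_len (hypothetical), then deduplicate."""
    trimmed = (sliding_window_trim(r) for r in reads)
    kept = [r for r in trimmed if len(r[0]) >= min_len]
    return remove_exact_duplicates(kept)


if __name__ == "__main__":
    reads = [
        ("ACGTACGT", "IIIIIIII"),
        ("ACGTACGT", "IIIIIIII"),  # exact duplicate of the first read
        ("TTTTGGGG", "IIII####"),  # quality collapses towards the 3' end
    ]
    print(clean_reads(reads, min_len=4))  # prints [('ACGTACGT', 'IIIIIIII')]
```

Trimming runs before deduplication, matching the order of the steps described above, so that reads differing only in low-quality tails can collapse to the same sequence.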
We used three assemblers, ABySS v 2.0 (Jackman et al., 2017),
Trinity v 2.1.1 (Haas et al., 2013), and Velvet v 1.2.10 (Zerbino &
Birney, 2008), for comparative purposes (see Table S2). To recover a
higher number of contigs, we combined the Velvet and Trinity assemblies
and matched the contigs to the probe set with
phyluce_assembly_match_contigs_to_probes_duplicates, a modified version
of the original script by B. Faircloth, setting both --min-coverage and
--min-identity to 80.
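The 80% identity and coverage thresholds, together with the removal of duplicate loci described next, can be sketched in Python. This illustration rests on several assumptions and is not the PHYLUCE script. Hits are assumed to be pre-computed contig-to-probe alignments, and `Hit` is a hypothetical record type. A single contig set is also assumed. How the modified script reconciles loci recovered by both Velvet and Trinity is not described here, so the sketch does not model it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one contig-to-probe alignment."""
    contig: str      # assembled contig identifier
    locus: str       # UCE locus the matching probe was designed for
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the probe covered by the alignment


def passing_hits(hits: Iterable[Hit], min_identity: float = 80.0,
                 min_coverage: float = 80.0) -> List[Hit]:
    """Keep alignments meeting both thresholds (80 in this study)."""
    return [h for h in hits
            if h.identity >= min_identity and h.coverage >= min_coverage]


def unique_matches(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each locus to its single contig, dropping ambiguous matches:
    a locus matched by several contigs, or a contig matching several loci."""
    loci_per_contig: Dict[str, Set[str]] = defaultdict(set)
    contigs_per_locus: Dict[str, Set[str]] = defaultdict(set)
    for h in hits:
        loci_per_contig[h.contig].add(h.locus)
        contigs_per_locus[h.locus].add(h.contig)
    matches: Dict[str, str] = {}
    for locus, contigs in contigs_per_locus.items():
        if len(contigs) != 1:
            continue  # several contigs match this locus
        (contig,) = contigs
        if len(loci_per_contig[contig]) == 1:
            matches[locus] = contig  # unambiguous one-to-one match
    return matches


if __name__ == "__main__":
    hits = [
        Hit("c1", "uce-1", 95, 90),
        Hit("c1", "uce-1", 92, 85),  # second probe for the same locus: fine
        Hit("c2", "uce-2", 90, 90),
        Hit("c3", "uce-2", 88, 95),  # uce-2 matched by two contigs: dropped
        Hit("c4", "uce-3", 85, 90),
        Hit("c4", "uce-4", 85, 90),  # c4 matches two loci: both dropped
        Hit("c5", "uce-5", 70, 99),  # below the identity threshold
    ]
    print(unique_matches(passing_hits(hits)))  # prints {'uce-1': 'c1'}
```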
Assembly QC statistics are shown in Table 3. Contigs representing the
targeted UCE loci were identified, and duplicate loci (either different
probes matching the same locus or a single probe matching multiple loci)
were removed. A list of the enriched UCE loci in each taxon, including
incomplete loci (those not recovered for every taxon), was generated,
and individual FASTA files for these loci were extracted (Table 2; see
Table S2 for summary statistics for each species under each of the three
assemblers). In this step, we also included the UCE loci recovered from
all genome and transcriptome assemblies used to design the probe set, as
well as from two caenogastropod transcriptomes. Selected UCE loci were
aligned with MAFFT v 7.455 (Katoh & Standley, 2013). Edge and internal
trimming of the resulting alignments was performed with Gblocks v 0.91
(Castresana, 2000; Talavera & Castresana, 2007) using mid-level
settings suited to phylogenies at higher taxonomic levels, i.e.,
--b1 0.5, --b2 0.5, --b3 6, --b4 4.
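The following sketch gives a rough idea of what the b1, b3 and b4 settings control by classifying columns and assembling blocks under heavily simplified rules. A column counts as "conserved" if its most common non-gap residue occurs in at least b1 × (number of sequences) sequences. Blocks may contain runs of at most b3 non-conserved columns, and blocks shorter than b4 columns are discarded. This is not Gblocks' algorithm. It ignores b2 (flank positions) and gap handling, and treating b1 as a fraction of the sequences is an assumption about how the fractional values are interpreted. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def conserved_columns(alignment: List[str], b1: float) -> List[bool]:
    """Flag a column as conserved if its most common non-gap residue occurs
    in at least b1 * (number of sequences) sequences (simplified rule)."""
    n = len(alignment)
    flags = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != "-")
        top = max(counts.values()) if counts else 0
        flags.append(top >= b1 * n)
    return flags


def select_blocks(flags: List[bool], b3: int, b4: int) -> List[range]:
    """Group conserved columns into blocks, allowing runs of at most b3
    non-conserved columns inside a block; discard blocks shorter than b4."""
    blocks = []
    start = end = None  # first and last conserved column of the open block
    for i, ok in enumerate(flags):
        if not ok:
            continue
        if start is None:
            start = i
        elif i - end - 1 > b3:
            # run of non-conserved columns is too long: close the block
            blocks.append(range(start, end + 1))
            start = i
        end = i
    if start is not None:
        blocks.append(range(start, end + 1))
    return [b for b in blocks if len(b) >= b4]


def trim(alignment: List[str], b1: float = 0.5, b3: int = 6, b4: int = 4) -> List[str]:
    """Keep only columns inside selected blocks (b2 and gap rules ignored)."""
    flags = conserved_columns(alignment, b1)
    keep = [i for block in select_blocks(flags, b3, b4) for i in block]
    return ["".join(seq[i] for i in keep) for seq in alignment]


if __name__ == "__main__":
    aln = ["AAAAC", "AAAAG", "AAAAT", "AAAAA"]
    print(trim(aln))  # prints ['AAAA', 'AAAA', 'AAAA', 'AAAA']
```

Lowering b4 retains shorter conserved stretches, and raising b3 tolerates longer variable runs inside a block. Both changes make trimming less stringent.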
A matrix was then built at 50% completeness, i.e., retaining only loci
present in at least 50% of taxa (see Table 3 for the number of genes
per species in the final datasets).
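The 50% completeness filter can be sketched in Python. The sketch assumes each locus is represented by the set of taxa for which it was recovered, which is a hypothetical data layout. The threshold arithmetic used here is also an assumption: the fraction of taxa present must be at least the completeness value. The per-taxon count corresponds to the kind of summary reported as genes per species.

```python
from typing import Dict, List, Set


def filter_by_completeness(loci: Dict[str, Set[str]], taxa: List[str],
                           completeness: float = 0.5) -> Dict[str, Set[str]]:
    """Keep loci recovered for at least `completeness` of all taxa."""
    n = len(taxa)
    return {locus: present for locus, present in loci.items()
            if len(present) / n >= completeness}


def loci_per_taxon(matrix: Dict[str, Set[str]], taxa: List[str]) -> Dict[str, int]:
    """Count how many retained loci each taxon contributes to the matrix."""
    return {t: sum(t in present for present in matrix.values()) for t in taxa}


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D", "E"]
    loci = {
        "uce-1": {"A", "B", "C"},            # 3/5 taxa: kept
        "uce-2": {"A", "B"},                 # 2/5 taxa: dropped
        "uce-3": {"A", "B", "C", "D", "E"},  # complete: kept
    }
    matrix = filter_by_completeness(loci, taxa)
    print(sorted(matrix))                # prints ['uce-1', 'uce-3']
    print(loci_per_taxon(matrix, taxa))  # prints {'A': 2, 'B': 2, 'C': 2, 'D': 1, 'E': 1}
```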