Phylogenetic analysis
Near-identical sequences were clustered into Operational Taxonomic Units (OTUs) using the CLUSTER command of VSEARCH (Rognes et al., 2016) with an identity cut-off of 97%. OTUs represented by a single read (singletons) were then removed. Taxonomy was assigned to the OTUs using the R-Syst::diatom v7.1 reference database (Rimet et al., 2016) and the BLASTn algorithm (Altschul et al., 1997) with a minimum identity of 85%. DNA sequences that did not belong to the diatom phylum (Bacillariophyta) were then filtered out. All remaining OTU sequences were scanned with EMBOSS Getorf for Open Reading Frames (ORFs) longer than 251 bp (Rice, Longden, & Bleasby, 2000). Read counts of the resulting OTUs were then normalized among samples to the most abundant value (82,446 reads). Finally, OTUs with a normalized read value below 0.005% were filtered out, following Bokulich et al. (2013).
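The count-based steps of this pipeline (singleton removal, normalization among samples, and the 0.005% abundance filter) can be illustrated with a minimal Python sketch. It is not the code used in the study. The table layout, the function names, and the reading of "normalized among samples" as scaling each sample to a common depth are assumptions made for illustration, as is applying the 0.005% threshold to each OTU's share of all normalized reads.

```python
from typing import Dict

# Hypothetical OTU table layout: {otu_id: {sample_id: read_count}}
OtuTable = Dict[str, Dict[str, float]]


def drop_singletons(table: OtuTable) -> OtuTable:
    """Keep only OTUs supported by more than one read across all samples."""
    return {otu: counts for otu, counts in table.items()
            if sum(counts.values()) > 1}


def normalize_to_depth(table: OtuTable, depth: float) -> OtuTable:
    """Scale every sample so that its reads sum to `depth` (assumed reading
    of 'normalized among samples to the most abundant value')."""
    totals: Dict[str, float] = {}
    for counts in table.values():
        for sample, n in counts.items():
            totals[sample] = totals.get(sample, 0) + n
    return {
        otu: {s: n * depth / totals[s] for s, n in counts.items() if totals[s] > 0}
        for otu, counts in table.items()
    }


def drop_rare(table: OtuTable, min_fraction: float = 0.00005) -> OtuTable:
    """Remove OTUs whose share of all reads is below `min_fraction`
    (0.005% expressed as a fraction is 0.00005)."""
    grand_total = sum(sum(counts.values()) for counts in table.values())
    return {otu: counts for otu, counts in table.items()
            if sum(counts.values()) / grand_total >= min_fraction}


def filter_otus(table: OtuTable, depth: float = 82446) -> OtuTable:
    """Apply the three steps in the order given in the text."""
    return drop_rare(normalize_to_depth(drop_singletons(table), depth))
```

Calling `filter_otus(table)` on a table in this layout returns the retained OTUs with their normalized counts; the default depth of 82,446 is taken from the text above.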
To obtain an overview of the taxonomically assigned OTUs, we computed two phylogenetic trees. To save computational time, we first built a reference phylogenetic tree using the 708 reference sequences that matched our OTU inventory. We then computed a second phylogenetic tree using the same reference sequences together with the 3138 taxonomy-assigned OTU sequences. All DNA sequences were aligned with MAFFT v7 (Katoh and Standley, 2013) using the default settings. The Maximum Likelihood (ML) phylogenies were constructed with RAxML (Randomized Axelerated Maximum Likelihood) as implemented on the CIPRES Portal (Miller, Pfeiffer, & Schwartz, 2010), using GTRCATI (Generalized Time Reversible Model + optimization of substitution rates + optimization of site-specific evolutionary rates) as the model of evolution and 1000 replicates for the bootstrap analysis. Phylogenetic trees were visualized with FigTree v1.4.3 (Rambaut, 2016).
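For readers unfamiliar with bootstrap analysis, the resampling step behind the 1000 replicates can be sketched as follows. This is a conceptual illustration only, not RAxML's implementation: the function names and the dictionary representation of the alignment are hypothetical, and the tree inference on each replicate and the tallying of branch support are not shown.

```python
import random
from typing import Dict, List

# Hypothetical alignment layout: {sequence_name: aligned_sequence}
Alignment = Dict[str, str]


def bootstrap_alignment(alignment: Alignment, rng: random.Random) -> Alignment:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # The same sampled column indices are applied to every sequence,
    # so each resampled site is an intact column of the original alignment.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Alignment,
                         n_replicates: int = 1000,
                         seed: int = 0) -> List[Alignment]:
    """Generate `n_replicates` pseudo-replicate alignments from one seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

Each replicate has the same dimensions as the original alignment; a tree would be inferred from each one, and the support for a branch is the proportion of replicate trees that contain it.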