HTS data and phylogenetic analysis
The Illumina MiSeq sequencing run generated a total of 4,150,073 reads across all samples. After quality filtering, dereplication and chimera removal, we obtained 2,595,039 DNA sequences. Clustering these sequences at the 97% identity level yielded a total of 15,234 OTUs across all samples. After removing singletons, OTUs containing nonsense codons and OTUs not belonging to the phylum Bacillariophyta, 7,834 OTUs remained. Finally, after removing OTUs with a normalized read value below 0.005%, we obtained 4,707 OTUs. The number of OTUs per sample ranged from 390 to 980, with an average of 634 OTUs per sample. Taxonomic assignment was successful for 3,138 OTUs, which were assigned to 219 species and 90 genera. The number of genera per sample ranged from 31 to 56 (average 42), and the number of species per sample ranged from 58 to 107 (average 80). Twenty-two taxa were present in all samples analyzed by the molecular approach, of which three species (Ulnaria acus, Eunotia bilunaris and Achnanthidium minutissimum) and two genera (Gomphonema sp. and Fragilaria sp.) were also present in most of the morphologically analyzed samples. The remaining 1,569 OTUs could not be assigned using the R-Syst::diatom reference database and remained unclassified.
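As an illustration of the final abundance filter, the following minimal Python sketch removes OTUs whose share of reads falls below 0.005%. It assumes that the "normalized read value" is the percentage of an OTU's reads relative to the total read count; this interpretation, the function name `filter_low_abundance` and the toy counts are hypothetical and are not taken from the pipeline used in this study. The last lines only check that the reported numbers of assigned and unassigned OTUs add up to the retained total.

```python
def filter_low_abundance(otu_reads, min_percent=0.005):
    """Keep OTUs whose share of the total reads is at least min_percent (in %).

    Assumption: the normalized read value is 100 * OTU reads / total reads.
    """
    total = sum(otu_reads.values())
    if total == 0:
        return {}
    return {
        otu: reads
        for otu, reads in otu_reads.items()
        if 100.0 * reads / total >= min_percent
    }


# Toy read counts (hypothetical, not data from this study).
toy_counts = {"OTU_1": 60000, "OTU_2": 39997, "OTU_3": 3}
kept = filter_low_abundance(toy_counts)  # OTU_3 holds 0.003% of reads and is dropped

# Consistency check on the counts reported in the text.
assigned_otus = 3138
unassigned_otus = 1569
retained_otus = assigned_otus + unassigned_otus  # should match the 4,707 retained OTUs
```

Applying the threshold to the pooled dataset, as done here, is only one possible choice; a per-sample threshold would retain a different set of OTUs.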
In our phylogeny constructed with 708 reference sequences, we observed several sequences that were not placed correctly. This was more noticeable in the phylogeny constructed with the same reference sequences plus the 3,138 taxonomy-assigned OTUs, where some reference sequences were placed apart from their corresponding taxonomy-assigned OTUs. Access to the rbcL sequence alignments and phylogenetic trees is detailed in the Data accessibility section.