Alex Robillard

and 7 more

1. Given the sharp increase in agricultural and infrastructure development and the paucity of widespread data available for making conservation management decisions, a more rapid and accurate tool for identifying fish fauna in the world’s largest freshwater ecosystem, the Amazon, is needed.
2. Current strategies for identification of freshwater fishes require high levels of training and taxonomic expertise for morphological identification, or genetic testing for species recognition at a molecular level.
3. To overcome these challenges, we built an image masking model (U-Net) and a convolutional neural network (CNN) to classify Amazonian fish in photographs. Fish used as training data were collected and photographed in tributaries in seasonally flooded forests of the upper Morona River valley in Loreto, Peru in 2018 and 2019.
4. Species identifications in the training images (n = 3,068) were verified by expert ichthyologists. These images were supplemented with photographs of additional Amazonian fish specimens housed in the ichthyological collection of the Smithsonian’s National Museum of Natural History.
5. We generated a CNN model that identified 33 genera of fishes with a mean accuracy of 97.9%. Wider availability of accurate freshwater fish image recognition tools, such as the one described here, will enable fishermen, local communities and community scientists to participate more effectively in collecting and sharing data from their territories to inform policy and management decisions that impact them directly.
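
As an illustration of how a figure such as "mean accuracy across genera" can be computed, the following minimal sketch averages per-class accuracy from a confusion matrix. It is not the authors' evaluation code: the abstract does not state how the 97.9% figure was derived, and the function names and the three-class toy matrix are hypothetical. Per-class accuracy here means the fraction of images of a class that were assigned to that class.

```python
def per_class_accuracy(confusion):
    """Return the fraction of correctly classified images for each class.

    confusion[i][j] is the number of images whose true class is i and
    whose predicted class is j. Classes with no images are skipped.
    """
    accuracies = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total == 0:
            continue  # no images of this class, so accuracy is undefined
        accuracies.append(row[i] / total)
    return accuracies


def mean_accuracy(confusion):
    """Unweighted mean of the per-class accuracies."""
    accuracies = per_class_accuracy(confusion)
    return sum(accuracies) / len(accuracies)


# Hypothetical toy matrix for three classes (not data from the study).
toy_confusion = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 1, 8],
]
toy_mean = mean_accuracy(toy_confusion)
print(f"mean per-class accuracy: {toy_mean:.3f}")
```

An unweighted mean gives every class equal influence regardless of how many images it has, which matters when some genera are represented by far fewer photographs than others.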

Xiling Deng

and 8 more

Whole-genome sequencing for generating SNP data is increasingly used in population genetic studies. However, obtaining genomes for massive numbers of samples is still not within the budgets of many researchers. It is thus imperative to select an appropriate reference genome and sequencing coverage to ensure the accuracy of the results for a specific research question, while balancing cost and feasibility. To evaluate the effect of the choice of the reference genome and sequencing coverage on downstream analyses, we used five confamilial reference genomes of variable relatedness and three levels of sequencing coverage (3.5x, 7.5x and 12x) in a population genomic study on two caddisfly species: Himalopsyche digitata and H. tibetana. Using these 30 datasets (five reference genomes × three coverages × two target species), we estimated population genetic indices (inbreeding coefficient, nucleotide diversity, pairwise and genome-wide FST) based on variants and population structure (PCA and admixture) based on genotype likelihood estimates. The results showed that both distantly related reference genomes and lower sequencing coverage lead to degradation of resolution. In addition, choosing a more closely related reference genome may significantly remedy the defects caused by low coverage. Therefore, we conclude that population genetic studies would benefit from closely related reference genomes, especially as the costs of obtaining a high-quality reference genome continue to decrease. However, to determine a cost-efficient strategy for a specific population genomic study, a trade-off between reference genome relatedness and sequencing depth can be considered.
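
The factorial design described above (five reference genomes × three coverages × two target species) can be enumerated as a short sketch. The reference genome labels `ref_1` to `ref_5` are placeholders, since the abstract does not name the genomes, and `build_design` is a hypothetical helper rather than part of the authors' pipeline.

```python
from itertools import product

# Placeholder labels: the abstract does not name the five reference genomes.
REFERENCES = [f"ref_{i}" for i in range(1, 6)]
# Sequencing coverage levels stated in the abstract.
COVERAGES = [3.5, 7.5, 12.0]
# The two target species stated in the abstract.
SPECIES = ["Himalopsyche digitata", "Himalopsyche tibetana"]


def build_design():
    """Enumerate every reference x coverage x species combination."""
    return [
        {"reference": ref, "coverage": cov, "species": sp}
        for ref, cov, sp in product(REFERENCES, COVERAGES, SPECIES)
    ]


datasets = build_design()
print(len(datasets))  # 5 * 3 * 2 = 30
```

Listing the combinations explicitly makes it easy to run the same downstream analysis on each dataset and to compare results along one axis (reference relatedness or coverage) while holding the others fixed.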