Fig. 3A: Identification rate at species level, displayed separately for all included phyla and classes. 3B: Results of RF specimen identification to phylum and class level. Bars are divided into three categories relevant to identification success. The darkest color displays the fraction of incorrect identifications, the intermediate color displays correct random forest identifications, and the lightest color represents the percentage of specimens recognized as correct identifications by the post-hoc test.
Case study - cryptic species
In the present data, the identification of specimens of the starfish Astropecten irregularis (Pennant, 1777) from the North Sea serves as an example of closely related species that are still distinguishable by proteomic fingerprinting. In a previous study, this morphotype was found to consist of two major genetic clades with inter-clade distances in COI of up to 12%. Morphological differences have not been determined so far. The two groups show different but overlapping distribution patterns (Laakmann et al., 2016). Our data included specimens of both clades, A. irregularis 1 (n=8) and A. irregularis 2 (n=27).
Data processing settings were optimized for this subset of the data (HWS = 9 and SNR = 8). An RF model built from these data clearly distinguished the two genetic groups; none of the specimens was misassigned to the other group. This RF model was also used to find the most important variables for differentiating the two groups using the Gini index, which shows the degree of dissimilarity of the respective variables (Han et al., 2016). The 30 most important variables are given in Fig. 4A. Although all peaks can be found in specimens of both groups, their intensities differ strongly, allowing a clear differentiation of the A. irregularis clades using proteomic fingerprinting.
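To illustrate how a Gini-based importance can rank peaks, the following minimal Python sketch computes the Gini impurity of a set of class labels and the impurity decrease obtained by splitting on the intensity of a single peak. It is not the pipeline used in this study: the function names, the threshold and the intensity values are hypothetical, and only the group sizes (8 and 27 specimens) are taken from the text. A random forest importance would typically aggregate such decreases over many splits and trees, whereas this sketch evaluates a single split.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(intensities, labels, threshold):
    """Impurity decrease obtained by splitting one peak at `threshold`."""
    left = [lab for x, lab in zip(intensities, labels) if x <= threshold]
    right = [lab for x, lab in zip(intensities, labels) if x > threshold]
    n = len(labels)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children


# Hypothetical, made-up intensities; only the group sizes come from the text.
labels = ["clade1"] * 8 + ["clade2"] * 27
separating_peak = [0.9] * 8 + [0.1] * 27   # intensity differs between clades
uninformative_peak = [0.5] * 35            # same intensity in every specimen

print(gini_impurity(labels))
print(gini_decrease(separating_peak, labels, 0.5))
print(gini_decrease(uninformative_peak, labels, 0.5))
```

In this toy setting, a peak whose intensity separates the two clades perfectly recovers the full impurity of the unsplit data, while a peak with identical intensity in all specimens yields a decrease of zero. This mirrors the observation that peaks present in both groups can still be highly informative when their intensities differ.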