Fig. 3A: Identification rate at species level, displayed separately for all
included phyla and classes. 3B: Results of random forest (RF) specimen
identification at phylum and class level. Bars are divided into three
categories reflecting identification success: the darkest color shows the
fraction of incorrect identifications, the intermediate color shows correct
RF identifications, and the light color shows the percentage of specimens
recognized as correct identifications by the post-hoc test.
Case study: cryptic species
In the present data, the identification of specimens of the starfish
Astropecten irregularis (Pennant, 1777) from the North Sea serves as an
example of closely related species that are still distinguishable by
proteomic fingerprinting. In a previous study, this morphotype was found to
consist of two major genetic clades with inter-clade COI distances of up to
12%. Morphological differences have not been determined so far. The two
clades show different but partly overlapping distribution patterns
(Laakmann et al., 2016). Our data included specimens of both clades,
A. irregularis 1 (n = 8) and A. irregularis 2 (n = 27).
Data processing settings were optimized for this subset of the data (HWS = 9
and SNR = 8). An RF model built from these data clearly distinguished the
two genetic groups: none of the specimens was misassigned to the other
group. The same RF model was used to identify the variables most important
for differentiating the two groups, based on the Gini index, which
quantifies how much each variable contributes to separating the groups
(Han et al., 2016). The 30 most important variables are shown in Fig. 4A.
Although all of these peaks occur in specimens of both groups, their
intensities differ strongly, allowing a clear differentiation of the
A. irregularis clades by proteomic fingerprinting.
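The idea behind a Gini-based variable ranking can be illustrated with a minimal, pure-Python sketch. It is not the authors' pipeline and not the RF importance measure itself. In an RF, the mean decrease in Gini accumulates the impurity decreases from all splits on a variable, averaged over the trees of the forest. This sketch instead scores each peak by the single best threshold split on that peak alone. The input layout (a specimen-by-peak intensity matrix as nested lists), the toy values and the function names are hypothetical. The sketch shows why a peak present in both clades can still rank highly when its intensities separate them.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_gini_decrease(values, labels):
    """Largest impurity decrease from one threshold split on a single peak.

    Simplification (assumption): a one-split 'stump' score, not the RF
    mean decrease in Gini, which sums over many splits and trees.
    """
    parent = gini_impurity(labels)
    n = len(labels)
    pairs = sorted(zip(values, labels))  # order specimens by intensity
    best = 0.0
    for i in range(1, n):
        # A threshold can only fall between two distinct intensities.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / n
        best = max(best, parent - weighted)
    return best


def rank_peaks(intensities, labels):
    """Rank peak columns by their best single-split Gini decrease, highest first."""
    n_peaks = len(intensities[0])
    scores = []
    for j in range(n_peaks):
        column = [row[j] for row in intensities]
        scores.append((j, best_gini_decrease(column, labels)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 4 specimens (rows) x 3 peaks (columns).
# Every peak is present in both clades; only the intensities differ.
toy_intensities = [[10, 5, 1], [12, 5, 4], [2, 5, 2], [3, 5, 3]]
toy_clades = ["clade1", "clade1", "clade2", "clade2"]
ranking = rank_peaks(toy_intensities, toy_clades)
print([j for j, _ in ranking])  # → [0, 2, 1]
```

Peak 0 separates the clades perfectly by intensity and ranks first. Peak 1 has identical intensities in all specimens, so it carries no information and ranks last.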