Association rule
The matrix usually contains a large amount of data, so data mining
techniques are used to extract useful knowledge. We followed the
association rule approach proposed by Agrawal et al. (1993).
An association rule is intended to capture a certain type of dependence
among species represented in the database. A rule is defined as an
implication of the form G1->G2. For example, the rule G1->G2 between
two species means that where species 1 is observed, species 2 is also
very likely to be observed, so that the two form the association
{G1, G2}.
The significance of an association rule is measured by its support and
confidence. The support of rule G1->G2 is the percentage of samples in
which G1 and G2 occur together. The confidence of rule G1->G2 is an
estimate of the conditional probability of G2 given G1. If the
confidence of rule G1->G2 is 1, then whenever G1 occurs at a
particular site, G2 occurs at that site too.
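
Written out in symbols, the two measures take the following form. This is a sketch in notation introduced here rather than taken from the original: N is assumed to be the total number of samples, and n(.) is assumed to count the samples that contain all of the listed species.

```latex
\mathrm{supp}(G_1 \rightarrow G_2) = \frac{n(G_1, G_2)}{N},
\qquad
\mathrm{conf}(G_1 \rightarrow G_2) = \frac{n(G_1, G_2)}{n(G_1)} \approx P(G_2 \mid G_1)
```

Support is symmetric in G1 and G2, whereas confidence generally differs between G1->G2 and G2->G1 because the denominator changes.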
First, the binary phytoplankton data for identifying phytoplankton
associations were constructed (Table 1), where “S” represents the
sampling site or time series and “G” represents the algae species.
Second, the support of each phytoplankton association was calculated.
For instance, the association {G1, G3} has 18% support because species
G1 and G3 occur together in 2 of the 11 samples (Table 2). Finally, we
calculated the confidence of each phytoplankton association (Table 3).
For example, the confidence of the association {G1, G3} is 0.5 because
species 3 occurs in half of the samples that contain species 1.
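
The arithmetic of the worked example above can be sketched in a few lines of Python. The presence/absence values below are invented for illustration and are not the data of Table 1; they are chosen only so that G1 and G3 co-occur in 2 of 11 samples and G1 occurs in 4. The function names `support` and `confidence` are likewise hypothetical helpers, not part of any package used in the study.

```python
# Hypothetical binary data: one 0/1 entry per sample (S1..S11) for each species.
# These values are invented for illustration and are NOT the paper's Table 1.
presence = {
    "G1": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "G2": [1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0],
    "G3": [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
}


def support(presence, species):
    """Fraction of samples in which every species in `species` is present."""
    columns = [presence[name] for name in species]
    n_samples = len(columns[0])
    together = sum(1 for row in zip(*columns) if all(row))
    return together / n_samples


def confidence(presence, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent->consequent.

    Assumes the antecedent occurs in at least one sample.
    """
    joint = support(presence, [antecedent, consequent])
    return joint / support(presence, [antecedent])


sup = support(presence, ["G1", "G3"])
conf = confidence(presence, "G1", "G3")
print(f"support = {sup:.0%}, confidence = {conf:.1f}")
```

With these invented values the script reports a support of 18% (2 of 11 samples) and a confidence of 0.5 (2 of the 4 samples containing G1).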
We identified phytoplankton associations as those meeting both
thresholds: support >= 50% and confidence >= 0.8.
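
A minimal sketch of this selection step is shown below, assuming only pairwise rules are of interest. It is a brute-force enumeration, not the Apriori procedure implemented in 'arules', and the occurrence data, the sample count, and the function name `select_rules` are all hypothetical. Only the two thresholds come from the text.

```python
from itertools import permutations

# Hypothetical data (invented, not the paper's tables): each species maps to
# the set of sample labels in which it was observed.
occurrences = {
    "G1": {"S1", "S2", "S3", "S4", "S5", "S6"},
    "G2": {"S1", "S2", "S3", "S4", "S5", "S6", "S7"},
    "G3": {"S1", "S8"},
}
N_SAMPLES = 10         # assumed total number of samples
MIN_SUPPORT = 0.5      # support threshold from the text (50%)
MIN_CONFIDENCE = 0.8   # confidence threshold from the text


def select_rules(occurrences, n_samples, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for every
    pairwise rule that meets both thresholds."""
    selected = []
    for a, b in permutations(sorted(occurrences), 2):
        if not occurrences[a]:
            continue  # confidence is undefined if the antecedent never occurs
        joint = len(occurrences[a] & occurrences[b])
        sup = joint / n_samples
        conf = joint / len(occurrences[a])
        if sup >= min_support and conf >= min_confidence:
            selected.append((a, b, sup, conf))
    return selected


rules = select_rules(occurrences, N_SAMPLES, MIN_SUPPORT, MIN_CONFIDENCE)
for a, b, sup, conf in rules:
    print(f"{a}->{b}: support={sup:.0%}, confidence={conf:.2f}")
```

With these invented data only G1->G2 and G2->G1 pass both thresholds. Every rule involving G3 is rejected on support alone, since G3 occurs in just 2 of the 10 samples.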
All analyses were performed in R (R Development Core Team, 2013).
Specifically, we used the R packages ‘arules’ for the affinity
analysis, ‘vegan’ for detrended correspondence analysis (DCA) and
redundancy analysis (RDA), and ‘packfor’ for forward selection
(Oksanen et al., 2013; Hahsler et al., 2014; Dray et al., 2013).