Prayatul Matrix for Evaluating Clustering Algorithms: A Direct
Comparison Approach
Abstract
Performance comparison of clustering algorithms are often done in terms
of different confusion matrix based scores obtained on test datasets
when ground truth is available. However, a dataset comprises several
instances having different difficulty levels. Therefore, it is more
logical to compare effectiveness of clustering algorithms on individual
instances instead of comparing scores obtained for the entire dataset.
In this paper, an alternative approach is proposed for direct comparison
of clustering algorithms in terms of individual instances within the
dataset. A direct comparison matrix called Prayatul Matrix is prepared,
which accounts for comparative outcome of two clustering algorithms on
different instances of a dataset. Five different performance measures
are designed based on prayatul matrix. Theoretical analysis shows
proposed measures satisfy five important properties such as scale
invariance, data invariance, permutation invariance, monotonicity and
continuity. Efficacy of the proposed approach as well as designed
measures is analyzed empirically with four clustering algorithms on
widely used standard datasets. Indications of proposed measures are
compared with confusion matrix-based measures as well as other three
permutation invariant measures. Results are evident that the newly
designed measures are capable of giving some important insight about the
clustering algorithms, which were impossible with the existing measures.