Prayatul Matrix for Evaluating Clustering Algorithms: A Direct Comparison Approach

Anupam Biswas

doi:10.36227/techrxiv.21207464.v3

loading page

Prayatul Matrix for Evaluating Clustering Algorithms: A Direct Comparison Approach

Anupam Biswas

Abstract

Performance comparison of clustering algorithms are often done in terms of different confusion matrix based scores obtained on test datasets when ground truth is available. However, a dataset comprises several instances having different difficulty levels. Therefore, it is more logical to compare effectiveness of clustering algorithms on individual instances instead of comparing scores obtained for the entire dataset. In this paper, an alternative approach is proposed for direct comparison of clustering algorithms in terms of individual instances within the dataset. A direct comparison matrix called Prayatul Matrix is prepared, which accounts for comparative outcome of two clustering algorithms on different instances of a dataset. Five different performance measures are designed based on prayatul matrix. Theoretical analysis shows proposed measures satisfy five important properties such as scale invariance, data invariance, permutation invariance, monotonicity and continuity. Efficacy of the proposed approach as well as designed measures is analyzed empirically with four clustering algorithms on widely used standard datasets. Indications of proposed measures are compared with confusion matrix-based measures as well as other three permutation invariant measures. Results are evident that the newly designed measures are capable of giving some important insight about the clustering algorithms, which were impossible with the existing measures.