Ahmed Rezeq added IntroductionClustering_is_the_process_of__.html  over 8 years ago

Commit id: dfc16a7d99d0e84e8d1e2fae204a11cb59f9a75e

deletions | additions      

         

Introduction

Clustering is the process of finding groups of similar objects based on some similarity measures. Clustering techniques are successfully applied in many applications in biology, finance and marketing domains. Traditional clustering algorithms are not suitable for processing huge amounts of data with a large number of attributes which are collected in many industrial fields. This is reasoned by their high processing time overhead and quality limits.
Curse of dimensionality is a well-known problem in clustering high dimensional datasets. As the number of dataset dimensions increases, measuring distance between dataset objects becomes more meaningless. This is reasoned by increasing the number of dimensions causes points to spread out until they are nearly equidistant from each other which completely masks clusters. One of the problems that arises due to this curse is the probability that a cluster may only exist in some subset of data attributes. Another problem is that these subset of attributes may differ from one cluster to another.