Naets edited subsection_Hierarchical_Clustering_using_Ward__.tex  about 8 years ago

Commit id: aa79f6e2a1e7e910c5989434fa61d971e74711d1


\subsection{Hierarchical Clustering using Ward's method}

Hierarchical clustering is another unsupervised algorithm for grouping observations into clusters, but it relies on a much simpler procedure: starting with each observation in its own cluster, it repeatedly merges the two closest clusters and updates the proximity matrix to reflect the proximity between the new cluster and the remaining clusters, until only one large cluster is left.

For the cars dataset, Ward's method (i.e. the method that, at each step, merges the pair of clusters giving the smallest increase in the within-cluster variance of the observations around their cluster centers) was used, with the Euclidean metric as the distance measure between the observations and the cluster centers. We chose Ward's method because, in addition to its desirable decision criterion, it tends to make the distance at which clusters are fused increase exponentially over successive fusions, which eases the decision on where to cut the dendrogram and hence on the number of clusters to retain.

The choice of metric, i.e. the Euclidean metric, rests on the fact that, in the absence of prior expertise about a dataset and the topic it relates to, the Euclidean distance is often seen to give satisfactory results, especially in an orthogonal data space.

A dendrogram showing the results of this clustering method can be seen in Figure \ref{DendrogramPCAData}.
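
To make the merging criterion concrete, it can be sketched as follows under the usual textbook formulation of Ward's method. The notation is introduced here purely for illustration and is not taken from the analysis itself: $A$, $B$ and $C$ denote clusters, $n_A$, $n_B$, $n_C$ their sizes, and $\mu_A$, $\mu_B$ their centers. Merging $A$ and $B$ increases the total within-cluster sum of squared Euclidean distances by
\begin{equation}
\Delta(A,B) = \frac{n_A\, n_B}{n_A + n_B}\, \| \mu_A - \mu_B \|^2 ,
\end{equation}
and at each step the pair of clusters with the smallest $\Delta$ is merged. The proximity matrix can then be updated without revisiting the observations, since the cost of merging the new cluster $A \cup B$ with any other cluster $C$ follows from the previous entries:
\begin{equation}
\Delta(A \cup B, C) = \frac{(n_A + n_C)\, \Delta(A,C) + (n_B + n_C)\, \Delta(B,C) - n_C\, \Delta(A,B)}{n_A + n_B + n_C} .
\end{equation}
For two single observations $x$ and $y$, the first expression reduces to $\frac{1}{2}\| x - y \|^2$, so the initial proximity matrix is, up to a constant factor, the matrix of squared Euclidean distances between observations.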