Clustering

A natural experiment, after obtaining scores for all pairwise combinations of songs in a dataset, is to attempt clustering. I used the Markov Cluster Algorithm (MCL)1 with custom-tuned parameters to find groups of Billboard 2014 songs that were well connected to one another. After some experimentation with parameters, the largest cluster returned contained:
{Call Me Maybe by Carly Rae Jepsen, Heart Attack by Demi Lovato, Whistle by Flo Rida, Cruise by Florida Georgia Line, Stereo Hearts ft. Adam Levine by Gym Class Heroes, The Edge of Glory by Lady Gaga, Live While We’re Young by One Direction, Try by P!nk, and If I Die Young by The Band Perry}
all songs that were manually verified to have similar chordal content. One problem with MCL is that not all songs with that progression were included, and different sets of parameters produced drastically different lists of songs. Beyond the largest cluster, cluster sizes almost always dropped steeply to one or two elements. In future work I would like to try k-means clustering and to derive a notion of harmonic uniqueness.
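The expansion/inflation loop at the heart of MCL can be sketched in a few lines of NumPy. This is a minimal illustration, not the tuned implementation or parameters used for the Billboard experiment; the `mcl` function name, the convergence threshold, and the toy graph are all assumptions for demonstration.

```python
import numpy as np

def mcl(adjacency, expansion=2, inflation=2.0, iterations=50):
    """Minimal Markov Cluster Algorithm sketch: alternate expansion
    (matrix power) with inflation (elementwise power + renormalize),
    then read clusters off the rows of the converged flow matrix."""
    M = adjacency.astype(float) + np.eye(len(adjacency))  # add self-loops
    M /= M.sum(axis=0)                                    # column-normalize
    for _ in range(iterations):
        M = np.linalg.matrix_power(M, expansion)          # expansion step
        M = M ** inflation                                # inflation step
        M /= M.sum(axis=0)
    # rows that retain flow act as attractors; their nonzero columns
    # are the cluster members (duplicates collapse via the dict)
    clusters = {}
    for attractor in np.unique(np.nonzero(M > 1e-6)[0]):
        members = tuple(int(j) for j in np.nonzero(M[attractor] > 1e-6)[0])
        clusters[members] = True
    return list(clusters)

# toy graph with two obvious components: {0,1} and {2,3}
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
print(mcl(A))  # [(0, 1), (2, 3)]
```

The inflation exponent plays the role of the "custom-tuned parameter": raising it sharpens the flow matrix faster and tends to yield more, smaller clusters, which matches the observed sensitivity of cluster membership to parameter choice.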

Global optimization comparing ground-truth and extracted chord data

Overview

The McGill dataset contains ground-truth annotations; for 529 of its songs I was able to find the corresponding audio file and extract chord annotations using Chordino. This allows comparison of extracted chord progressions with their corresponding ground-truth annotations. Let \(D({McGill}_{g})\) denote the ground-truth annotation dataset and \(D({McGill}_{e})\) the extracted annotation dataset.

The goal, and primary result of this paper, is a means of comparing the performance of \(D({McGill}_{g})\) with \(D({McGill}_{e})\). The next few sections detail my approach.

Ranking Fully Connected Pairwise Comparisons

Ranking of a sequence

Recall that the \(ranking\) of a sequence is a mapping of every element of the sequence to its position in the sorted sequence. For instance, \(ranking([5,2,3]) = [3,1,2]\), since 5 is the 3rd element of the list in order, 2 is the 1st element, and 3 is the 2nd element. In the case of ties, half numbers are used at the midpoint of the ties, such that \(ranking([1,2,3,3,4])=[1,2,3.5,3.5,5]\) and \(ranking([1,2,3,3,3,4])=[1,2,4,4,4,6]\).
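This ranking, including the half-rank treatment of ties, is exactly what `scipy.stats.rankdata` computes with its default "average" method, so the examples above can be checked directly:

```python
from scipy.stats import rankdata

# default method="average" places ties at the midpoint of their ranks
print(rankdata([5, 2, 3]))           # [3. 1. 2.]
print(rankdata([1, 2, 3, 3, 4]))     # [1.  2.  3.5 3.5 5. ]
print(rankdata([1, 2, 3, 3, 3, 4]))  # [1. 2. 4. 4. 4. 6.]
```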

Metrics for evaluating the similarity between two rankings

To compare two identically sized ranked sequences \(s1\) and \(s2\) of length \(n\), two primary metrics are used: Spearman’s rho\cite{diaconis1977spearman} (\(\rho\)) and Kendall’s tau (\(\tau\)), where \[\rho(s1,s2) = {1 - \frac {6 \sum_i^n (s1_i - s2_i)^2}{n(n^2 - 1)}}\] and \[\tau(s1,s2) = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{\frac{1}{2} n (n-1)}\]

Essentially, \(\rho\) accounts for how far each discordant element sits from its correct rank, while \(\tau\) reflects the number of pairwise inversions needed to transform one sequence into the other. Both return values from -1.0 to 1.0, with 1.0 indicating a perfect positive correlation, -1.0 a perfect negative correlation, and 0.0 no correlation.
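Both metrics are available in SciPy, which makes the formulas easy to verify on a small example (the sequences here are illustrative):

```python
from scipy.stats import spearmanr, kendalltau

s1 = [1, 2, 3, 4, 5]
s2 = [1, 2, 3, 5, 4]  # one adjacent pair swapped

# rho = 1 - 6*(1 + 1) / (5 * (25 - 1)) = 0.9
rho, rho_p = spearmanr(s1, s2)
# tau = (9 concordant - 1 discordant) / 10 pairs = 0.8
tau, tau_p = kendalltau(s1, s2)
print(rho, tau)  # 0.9 0.8
```

A single adjacent swap costs one inversion out of \(\binom{5}{2}=10\) pairs under \(\tau\), but only a squared rank distance of 2 under \(\rho\), illustrating how the two metrics weight errors differently.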

Applying ranking metrics to the McGill datasets

Let \(rank(D,v)\) return the \(ranking\) of all the pairwise comparisons in the dataset \(D\) for a given set of variables \(v\). The ground-truth McGill dataset, \(D({McGill}_{g})\), and the extracted McGill dataset, \(D({McGill}_{e})\), can be compared by calculating \[\rho(rank(D({McGill}_{g}),v), rank(D({McGill}_{e}),v))\] or \[\tau(rank(D({McGill}_{g}),v), rank(D({McGill}_{e}),v))\] for a given \(v\).

The results of this comparison indicate whether there is a monotonic relationship between the two sets of data; that is, how the rankings of all the pairwise comparisons of all the songs in the datasets compare, and how strong the correlation is. Higher \(\rho\) and \(\tau\) values mean that the Smith-Waterman evaluation compensates well for errors in chord extraction compared to a ground-truth dataset.
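The comparison reduces to ranking each dataset's pairwise similarity scores over the same song pairs and correlating the two rankings. The sketch below assumes the scores have already been computed (here as hypothetical dictionaries keyed by song pair); the `rank_correlation` helper is not from the paper's codebase:

```python
from itertools import combinations
from scipy.stats import rankdata, spearmanr

def rank_correlation(scores_g, scores_e, pairs):
    """Rank each dataset's pairwise scores over the same song pairs,
    then return Spearman's rho between the two rankings."""
    g = rankdata([scores_g[p] for p in pairs])
    e = rankdata([scores_e[p] for p in pairs])
    return spearmanr(g, e)

# hypothetical Smith-Waterman scores for the 6 pairs of 4 songs
pairs = list(combinations(["a", "b", "c", "d"], 2))
scores_g = dict(zip(pairs, [9, 3, 5, 7, 1, 4]))  # "ground truth"
scores_e = dict(zip(pairs, [8, 2, 6, 7, 1, 3]))  # "extracted"

rho, p = rank_correlation(scores_g, scores_e, pairs)
print(rho)  # 1.0 -- the raw scores differ, but they rank identically
```

Note that the raw score values disagree throughout, yet \(\rho = 1\): the monotonic comparison deliberately ignores absolute score magnitudes and measures only whether the two datasets order the song pairs the same way.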

Applying the Basin-Hopping algorithm

The basin-hopping algorithm\cite{wales1997global} is a global optimization technique that stochastically searches through values of \(v\), the variables input to the Smith-Waterman algorithm, guided by a temperature (which governs acceptance of random steps) and the step size. Basin-hopping gets its name because it searches a local minimum, or basin, and then “hops” out to search elsewhere. It has proven effective at picking out satisfactory choices for many unknown variables in complex computations, making it well suited to optimizing the ranking correlation coefficients.

I ran a Python program that performs basin-hopping over 1,000 iterations across the ranges of the variables, calculating \[\rho(rank(D({McGill}_{g}),v), rank(D({McGill}_{e}),v))\] for each candidate and attempting to find the set of variables with the maximal rank correlation. Each iteration took roughly 51 seconds, for a total runtime of about 13.5 hours. Though this computation is expensive, it would be at least 100x slower without the optimized SIMD Smith-Waterman implementation in C.
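The optimization itself fits naturally into `scipy.optimize.basinhopping`, which minimizes, so the objective is the negated rank correlation. Since the real objective costs about 51 seconds per call, the sketch below substitutes a smooth surrogate function (an assumption, not the paper's objective) whose minimum of \(-1\) sits at the reported optimum of \({gap}_{open}=0\), \({gap}_{ext}=95\):

```python
from scipy.optimize import basinhopping

def neg_rho_surrogate(v):
    """Stand-in for -rho(rank(D_g, v), rank(D_e, v)): a smooth bowl
    with its global minimum of -1.0 at gap_open=0, gap_ext=95."""
    gap_open, gap_ext = v
    return gap_open ** 2 + ((gap_ext - 95) / 50) ** 2 - 1.0

# temperature T and stepsize control the Metropolis acceptance and
# the size of the random hops between local minimizations
result = basinhopping(neg_rho_surrogate, x0=[3.0, 70.0],
                      niter=50, T=1.0, stepsize=5.0)
print(result.x, result.fun)  # x near [0, 95], fun near -1.0
```

Swapping the surrogate for the real \(-\rho\) evaluation (and restricting `niter` and the step ranges to the variable bounds) recovers the structure of the 1,000-iteration run described above.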

Results

The set of variables that optimized the rank correlation \(\rho\) was \[\{norm=\textit{raw score}, C_d=Harte, {gap}_{open}=0, {gap}_{ext}=95, m_x=5, m_s=0\}\] with a \(\rho\) score of 0.761642, which corresponds to a p-value of \(1.0232365505\times 10^{-26317}\), the probability of observing such a correlation if none existed. This result indicates a strong monotonic relationship. The \(\tau\) score of this set of variables is \(0.589671\) with a p-value of \(1.5104 \times 10^{-23577}\). These results are charted in figures \ref{fig:0-0-0-95-5-0} and \ref{fig:testgraph}.


  1. http://micans.org/mcl/