
Classifying chords from audio files

Identifying chords from audio files is a difficult task: it compounds the inexactness of pitch recognition with that of musical data collection, which would seem to make the procedure even more error-prone. In practice, however, it often relies on advanced algorithms that perform surprisingly well. Chapter \ref{computationchordextract} surveys existing techniques along with their advantages and disadvantages; this section gives an overview of the basic techniques.

Machine learning algorithms are commonly used to classify chords from chroma features. A chroma feature represents the measured intensity of each of the twelve pitch classes, obtained by folding the energy found at frequencies in different octaves into a single bin per pitch class. Machine learning involves training a model on known data and then observing how well it performs on new, or test, data. In music informatics, chord progressions annotated or recognized by humans are referred to as ground-truth sets~\cite{BurgoyneEtAl_2011_AnExpeGrouSet}, so chord identification algorithms are both trained on and tested against ground-truth data.
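The folding of octaves into pitch-class bins can be illustrated with a minimal sketch. It assumes twelve-tone equal temperament, a 440 Hz tuning reference, and an input of already extracted (frequency, magnitude) spectral peaks; the function names are hypothetical and the code is not taken from any of the systems surveyed here.

```python
import math

A4_HZ = 440.0  # assumed tuning reference (A above middle C)


def pitch_class(freq_hz):
    """Map a frequency to a pitch class 0..11 (0 = C, 9 = A).

    Assumes twelve-tone equal temperament; MIDI note 69 is A4.
    """
    midi = 69 + 12 * math.log2(freq_hz / A4_HZ)
    return int(round(midi)) % 12


def chroma_vector(peaks):
    """Fold (frequency_hz, magnitude) pairs into a 12-bin chroma vector.

    Peaks an octave apart land in the same bin. The result is
    normalized to sum to 1 when any energy is present.
    """
    chroma = [0.0] * 12
    for freq_hz, magnitude in peaks:
        if freq_hz > 0:
            chroma[pitch_class(freq_hz)] += magnitude
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma


if __name__ == "__main__":
    # Two A's an octave apart plus a middle C (hypothetical magnitudes).
    print(chroma_vector([(220.0, 1.0), (440.0, 1.0), (261.63, 2.0)]))
```

Real chroma extractors typically work on a full short-time spectrum and handle tuning deviations; this sketch only shows the octave-folding step.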

Two statistical models commonly trained for this task are hidden Markov models (HMMs) and dynamic Bayesian networks (DBNs). Both can be drawn as graphs whose nodes are connected by edges that correspond to the probability that a certain transition will occur. These probabilities are learned from training data and can then be applied to unknown, or test, data: chords are classified by finding the Viterbi path, the most likely sequence of states, through the data.
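As an illustration of the Viterbi path, the following is a minimal sketch of Viterbi decoding for a discrete HMM. The two chord states, the observation symbols, and all probabilities are made up for the example; in practice they would be learned from ground-truth data, and the observations would be chroma vectors rather than single symbols.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations.

    Works in log space to avoid underflow; all probabilities used
    here are assumed to be strictly positive.
    """
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t.
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: the observation is the stronger of two pitch classes.
STATES = ["C:maj", "A:min"]
START = {"C:maj": 0.5, "A:min": 0.5}
TRANS = {"C:maj": {"C:maj": 0.9, "A:min": 0.1},
         "A:min": {"C:maj": 0.1, "A:min": 0.9}}
EMIT = {"C:maj": {"G": 0.8, "A": 0.2},
        "A:min": {"G": 0.2, "A": 0.8}}

if __name__ == "__main__":
    print(viterbi(["G", "G", "A", "G", "G"], STATES, START, TRANS, EMIT))
```

With these toy numbers the single "A" frame is not enough to outweigh the cost of switching chords twice, so the decoded path stays on one chord. This smoothing effect is one reason sequence models are preferred over frame-by-frame classification.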

Chord classification techniques usually work with a small, finite alphabet of chord qualities. Some algorithms distinguish only between major and minor chords, and some studies suggest that this restriction is a reasonable one.
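To make the idea of a small chord alphabet concrete, the sketch below builds the 24 major and minor triads and labels a chroma vector with the best-matching one. Binary template matching is used here only because it is short; it is a stand-in for illustration, not the method of any particular study, and the names and label format are assumptions.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# The chord alphabet: two qualities, given as semitone offsets from the root.
QUALITIES = {"maj": (0, 4, 7), "min": (0, 3, 7)}


def chord_templates():
    """Build one binary 12-bin template per root and quality (24 in total)."""
    templates = {}
    for root in range(12):
        for quality, intervals in QUALITIES.items():
            template = [0.0] * 12
            for interval in intervals:
                template[(root + interval) % 12] = 1.0
            templates[f"{NOTE_NAMES[root]}:{quality}"] = template
    return templates


def classify(chroma):
    """Return the label whose template has the largest dot product with chroma."""
    templates = chord_templates()
    return max(
        templates,
        key=lambda name: sum(c * t for c, t in zip(chroma, templates[name])),
    )


if __name__ == "__main__":
    chroma = [0.0] * 12
    for pc in (0, 4, 7):  # energy on C, E and G
        chroma[pc] = 1.0
    print(classify(chroma))
```

Enlarging the alphabet, for example with seventh chords, would only require adding entries to the qualities table, at the cost of more easily confused templates.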