Dylan Freedman edited CImplementation.tex  about 9 years ago

Commit id: 3c3e394200c26d064d8093bdccdec33d281823bd


\section{Dataset Collection}

\subsubsection{Scraping}

\textit{Scraping} refers to writing scripts that automatically locate and download files. What follows is a brief overview of how I collected, or scraped, the datasets used in my evaluations.

\subsection{File conversion}

The collected files came in a variety of audio formats, such as \textit{mp3}, \textit{aac}, and \textit{ogg}, and video formats such as \textit{mp4}. Chordino requires \textit{wav} input, so all files were converted to \textit{wav} before chord extraction using the external tool \textit{ffmpeg}. Ffmpeg bundles a suite of \textit{codecs}, software components that encode and decode audio and video streams, and can convert between a wide range of file types. I wrote a Python wrapper that converts each file and analyzes its chordal content with Chordino.

\subsection{YouTube Extraction}

The Beatles Dataset, consisting of over 180 songs spanning The Beatles' entire discography, and the Billboard 2014 dataset, comprising over 200 top-charting Billboard songs in the United States, were collected from YouTube\footnote{https://www.youtube.com/}. Using the Python package \textit{youtube\_dl}\footnote{https://pypi.python.org/pypi/youtube\_dl}, I bulk downloaded videos from user-created playlists corresponding to the data I wanted to extract. After downloading these files in \textit{mp4} format, I converted them to \textit{wav} files and extracted their chord progressions.

\subsection{Rhapsody}

\subsection{National Anthems}

The National Anthems Dataset was collected from the Wikipedia page ``List of national anthems''\footnote{http://en.wikipedia.org/wiki/List\_of\_national\_anthems}, which collects open-source recordings of national anthems for every country recognized by the United Nations, as well as a few other states and territories not officially recognized. Using Python to parse this web page, I downloaded all the audio files, which were in the \textit{ogg} format, an open-source alternative to \textit{mp3}.
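
The original scraping script is not reproduced here, so the following is only a minimal sketch of how the parsing and conversion steps described above might look. It assumes the recordings are linked through \texttt{href} or \texttt{src} attributes ending in \texttt{.ogg}, which may not match the actual page markup. The names \texttt{OggLinkParser}, \texttt{extract\_ogg\_links}, and \texttt{ffmpeg\_wav\_command} are hypothetical and introduced for illustration. The code uses only the Python standard library and does not download or convert anything itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class OggLinkParser(HTMLParser):
    """Collect link and media-source targets that end in .ogg (assumed markup)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Audio files may appear as <a href=...> or <source src=...>.
        for name, value in attrs:
            if name in ("href", "src") and value and value.lower().endswith(".ogg"):
                # Resolve relative and scheme-relative URLs against the page URL.
                url = urljoin(self.base_url, value)
                if url not in self.links:
                    self.links.append(url)


def extract_ogg_links(html, base_url):
    """Return the absolute URLs of all .ogg files referenced in the HTML text."""
    parser = OggLinkParser(base_url)
    parser.feed(html)
    return parser.links


def ffmpeg_wav_command(src, dst):
    """Build an ffmpeg command line that converts src to dst.

    -y overwrites an existing output file and -i names the input file;
    ffmpeg infers the wav output format from the .wav extension of dst.
    """
    return ["ffmpeg", "-y", "-i", src, dst]
```

Each returned URL could then be fetched with \texttt{urllib.request.urlretrieve}, and the resulting command list passed to \texttt{subprocess.run(cmd, check=True)} on a machine where ffmpeg is installed.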