Concept

Analyze how news coverage of a particular topic varies by city and over time. This project would use the Natural Language Toolkit (NLTK) in Python to compare news sources (online newspapers) for similarities and differences in word choice, tone, and vocabulary when describing a single current-events topic, such as bitcoin, across two time periods, for example two sets of months. The idea is to compare these texts and determine whether different cities take different perspectives on the topic. Bitcoin is just one example of a recent topic whose popularity can be tracked over time, along with how authors and news sources in different cities present it. Given its rise in popularity in recent years, with bitcoin prices climbing steadily to what some might call unimaginable highs (a possible bubble), the project aims to identify how different cities might view the same event from different perspectives.

Intended data sources

Data sources might include news websites that are available publicly (or through NYU’s library databases), for example the Los Angeles Times, the New York Post, the Chicago Tribune, and the Washington Post. These newspapers were selected because, unlike the Wall Street Journal or Barron’s, they are not specialized in the financial sector and, unlike the New York Times, they are not considered national newspapers.

Intended methodology

After collecting a corpus for each city, Python libraries such as NLTK and matplotlib would be used to count the frequency of specific words and display the counts in a graph by city. Comparing the relative frequencies of selected keywords in a matplotlib plot would allow the reader or researcher to identify where coverage of a particular subject (bitcoin, for example) may be more or less favorable. Another approach might be to use the NetworkX library in Python to build a graph of topics within articles, grouped by city, to find clusters of similar topics, and then to compare these graphs across cities to see how coverage differs based on the nodes in each graph.
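The keyword-frequency step could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's final method: the two-city corpus, the keyword list, and the function names are all hypothetical placeholders, and a simple regular-expression tokenizer with collections.Counter stands in for NLTK's tokenizers and frequency distributions so that the sketch runs without any downloads.

```python
import re
from collections import Counter

# Hypothetical toy corpus: city -> list of article texts.
# These sentences are placeholders, not real newspaper data.
corpus = {
    "Los Angeles": [
        "Bitcoin prices surge as investors pile in.",
        "Is bitcoin a bubble? Some analysts warn of risk.",
    ],
    "Chicago": [
        "Local traders say bitcoin is a bubble.",
        "The bubble talk grows as bitcoin climbs.",
    ],
}

# Hypothetical keywords chosen only for illustration.
KEYWORDS = ["bitcoin", "bubble", "risk", "surge"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(articles, keywords):
    """Return each keyword's share of all tokens in a city's articles."""
    tokens = [tok for article in articles for tok in tokenize(article)]
    counts = Counter(tokens)
    total = len(tokens)
    # Guard against an empty corpus to avoid division by zero.
    return {kw: (counts[kw] / total if total else 0.0) for kw in keywords}


# Relative frequency of each keyword, by city.
freqs_by_city = {
    city: relative_frequencies(articles, KEYWORDS)
    for city, articles in corpus.items()
}

for city, freqs in freqs_by_city.items():
    print(city, freqs)
```

Dividing by the total token count keeps cities with more or longer articles from dominating the comparison. The resulting dictionary could then be passed to a matplotlib bar chart, with one group of bars per keyword and one bar per city.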

Deliverables 

Deliverables will comprise a notebook with Python code, as well as visualizations embedded in a paper analyzing coverage patterns by city across time.