PUI2016 Extra Credit Project

Time Series Analysis of Beijing Air Pollution
<Chunqing Xu, cx495, cx495>
Abstract:
This project aims to identify the main factors behind air pollution in Beijing. Time series analysis is applied to hourly PM2.5 data for Beijing and other main cities of China. From the time series plots and event detection, we infer that factory emissions, vehicle emissions and coal burning are three main factors of air pollution in Beijing.
Introduction:
The question to be answered is: what are the main factors of air pollution in Beijing? Air pollution is a severe problem in Beijing and across China, and it matters because air quality is strongly related to the health of residents. The government has introduced several policies to reduce air pollution; pollution has decreased, but the problem is far from solved. Identifying the main factors may both reveal why air pollution is so hard to eliminate and help the government design more effective policies.
Data:
The datasets used for this project are hourly PM2.5 values for Beijing and some other main cities of China. They are provided by the U.S. Department of State Air Quality Monitoring Program through its official website (http://www.stateair.net/web/historical/1/1.html). The datasets contain some errors: some PM2.5 values are negative. PM2.5 refers to particulate matter with an aerodynamic diameter of 2.5 μm or less, and the reported value is a concentration, so negative PM2.5 values are invalid. The dataset contains only PM2.5 information; since PM2.5 is only one of several pollutants, the analysis of air pollution would be more persuasive if data on other pollutants were included. Rows with a negative PM2.5 Value are dropped and only the ‘Time’ and ‘PM2.5 Value’ columns are kept. The values in the ‘Time’ column are also converted into proper datetime values using a datetime conversion function.
Table 1: Original dataset       
Table 2: Processed dataset        
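The cleaning steps described in this section can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the ‘Site’ column is a hypothetical extra column standing in for whatever the original file contains besides ‘Time’ and ‘PM2.5 Value’, and pd.to_datetime is assumed as the conversion function.

```python
import pandas as pd


def clean_pm25(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep 'Time' and 'PM2.5 Value', parse timestamps, drop negative readings."""
    # Keep only the two columns used in the analysis.
    df = raw[["Time", "PM2.5 Value"]].copy()
    # Assumption: pd.to_datetime is the conversion function; the report does not name it.
    df["Time"] = pd.to_datetime(df["Time"])
    # A concentration cannot be negative, so negative readings are treated as errors.
    df = df[df["PM2.5 Value"] >= 0]
    # Index by time so later rolling statistics follow chronological order.
    return df.set_index("Time").sort_index()


# Made-up sample rows for illustration only (not real measurements).
raw = pd.DataFrame({
    "Site": ["Beijing"] * 4,  # hypothetical extra column
    "Time": ["2015-01-01 00:00", "2015-01-01 01:00",
             "2015-01-01 02:00", "2015-01-01 03:00"],
    "PM2.5 Value": [22, -999, 9, 13],
})

clean = clean_pm25(raw)
print(clean)
```

Indexing by the parsed ‘Time’ column is a design choice made here so that the series is ready for the time series steps in the Methodology section.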
Methodology:
Time series analysis is an appropriate method for studying air pollution, since air quality indices are recorded continuously over time. Rolling means and rolling standard deviations are computed to reveal the trend of PM2.5 Value in Beijing. Event detection is then applied to a sudden decline in the time series plot of PM2.5 Value. A search for events around that time indicates that the decline coincides with the military parade held in Beijing in September 2015, when hundreds of factories were shut down and 2.5 million cars were banned. This suggests that factory emissions and vehicle emissions are main factors of air pollution in Beijing. Comparing northern and southern cities further suggests that coal burning is another main factor.
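The rolling statistics and the detection of a sudden decline can be sketched as below. This is only an illustration under stated assumptions: the hourly series is synthetic with an artificial dip, the 72-hour window is an arbitrary choice, and the threshold rule (rolling mean more than one standard deviation below the overall mean) is a simple stand-in, since the report does not specify the window size or the detection rule actually used.

```python
import numpy as np
import pandas as pd

# Synthetic hourly PM2.5 series (made-up values, not the real measurements).
rng = np.random.default_rng(0)
hours = 61 * 24  # 1 August to 30 September
index = pd.Timestamp("2015-08-01") + pd.to_timedelta(np.arange(hours), unit="h")
pm25 = pd.Series(80 + 10 * rng.standard_normal(hours), index=index, name="PM2.5 Value")

# Artificial dip, placed by hand to imitate a period of reduced emissions.
dip = (pm25.index >= pd.Timestamp("2015-08-20")) & (pm25.index < pd.Timestamp("2015-09-04"))
pm25[dip] = 20 + 5 * rng.standard_normal(int(dip.sum()))
pm25 = pm25.clip(lower=0)

# Rolling mean and rolling standard deviation over a trailing 72-hour window (assumed size).
window = 72
rolling_mean = pm25.rolling(window).mean()
rolling_std = pm25.rolling(window).std()

# Simple threshold rule (an assumption): flag hours whose rolling mean is more
# than one overall standard deviation below the overall mean.
threshold = pm25.mean() - pm25.std()
flagged = rolling_mean < threshold  # NaN values at the start compare as False

event_hours = rolling_mean.index[flagged]
event_start, event_end = event_hours.min(), event_hours.max()
print("Low-pollution period detected from", event_start, "to", event_end)
```

Because the window is trailing, the flagged period starts and ends somewhat later than the underlying dip; the detected dates are then a starting point for searching for a real-world event, as described above.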
Conclusions:
From the time series plots of PM2.5 values for 5 main cities of China, it can be concluded that factory emissions, vehicle emissions, and coal burning are three main factors of air pollution in Beijing. This is consistent with commonly known sources of air pollution, so the finding is reasonable.
Figure 1: PM2.5 Value of Beijing during 2015
Figure 2: Rolling Means and Rolling Standard Deviations of PM2.5 Value of Beijing during 2015
Figure 3: PM2.5 Value of 5 Main Cities of China during 2015
Future work:
To confirm these findings, data on factory emissions, vehicle emissions and coal burning should be added to the analysis, such as datasets on energy consumption and information on industry.
Also, since the government has introduced policies to improve air quality, these policies should be evaluated as well.
Bibliography:
1. Materials from PUI class
2. Time series analysis with pandas
