Time Series Analysis of Beijing Air Pollution
Chunqing Xu, cx495, cx495

Problem Description: Air pollution has long been a major issue in Beijing and across China. It harms residents, and although the government has tried to address this severe problem with a range of measures, it persists. In this project, the factors driving air pollution in Beijing will be investigated through time series analysis. Identifying the main factors not only helps explain why air pollution in China is so hard to reduce, but can also help the government design better air pollution policies.

Data: Beijing PM2.5 hourly data (2015)

Resource: U.S. Department of State Air Quality Monitoring Program
http://www.stateair.net/web/historical/1/1.html

This dataset, provided by the U.S. Department of State Air Quality Monitoring Program, contains hourly PM2.5 values for Beijing. It has more than 8,000 observations for the year, which makes it well suited to time series analysis. Unneeded columns will be dropped, and the time information will be converted to a proper datetime type for analysis.

Analysis: Time series analysis, including rolling means and rolling standard deviations, will be applied to this dataset. Hourly PM2.5 values will be explored over time, and the amount of time that meets the air quality standard will be calculated. In addition, event detection may be applied if the time series plot shows an obvious sudden change.

References:
1. Materials from the PUI class
2. Time series analysis with pandas, http://earthpy.org/pandas-basics.html

Deliverables:
1. A conclusion: what are the main factors behind Beijing's air pollution?
2. Visualizations: time series plots.
3. Suggestions for relevant government sectors and organizations to help reduce air pollution in Beijing.
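The data preparation and analysis steps described above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the project's actual code. All function names are hypothetical. The column names (`Year`, `Month`, `Day`, `Hour`, `Value`) and the `-999` missing-value code are assumptions about the StateAir CSV export. The 24-row rolling window assumes one row per hour. The proposal does not name a specific air quality standard, so the 75 µg/m³ limit is an assumed choice: it is China's Grade II 24-hour PM2.5 limit (GB 3095-2012). Because that limit applies to daily averages, the sketch reports both an hourly and a daily share. The spike rule is one simple event-detection heuristic. It flags readings more than `k` standard deviations above the mean of the preceding 24 hours. The demo rows are synthetic, not 2015 readings.

```python
import pandas as pd


def load_pm25(df: pd.DataFrame) -> pd.Series:
    """Build an hourly PM2.5 series from a StateAir-style table.

    Assumes columns 'Year', 'Month', 'Day', 'Hour', 'Value' and that
    missing readings are coded as -999 (assumption about the export format).
    """
    # Assemble the separate date-part columns into timestamps.
    stamps = pd.to_datetime(df[["Year", "Month", "Day", "Hour"]].rename(columns=str.lower))
    # Keep only the PM2.5 value; negative sentinels become NaN.
    values = df["Value"].astype(float).where(df["Value"] >= 0)
    return pd.Series(values.to_numpy(), index=pd.DatetimeIndex(stamps), name="pm25").sort_index()


def rolling_stats(pm25: pd.Series, window: int = 24) -> pd.DataFrame:
    """Rolling mean and standard deviation over `window` consecutive rows
    (a 24-hour window when there is exactly one row per hour)."""
    roll = pm25.rolling(window=window, min_periods=window // 2)
    return pd.DataFrame({"pm25": pm25, "roll_mean": roll.mean(), "roll_std": roll.std()})


def share_meeting_standard(pm25: pd.Series, limit: float = 75.0) -> float:
    """Fraction of valid hourly readings at or below `limit` (ug/m3)."""
    valid = pm25.dropna()
    return float((valid <= limit).mean())


def share_days_meeting_standard(pm25: pd.Series, limit: float = 75.0) -> float:
    """Fraction of days whose mean PM2.5 is at or below `limit` (ug/m3)."""
    daily = pm25.resample("D").mean().dropna()
    return float((daily <= limit).mean())


def flag_spikes(pm25: pd.Series, window: int = 24, k: float = 3.0) -> pd.Series:
    """Flag readings more than k rolling std above the mean of the
    preceding `window` rows (the current reading is excluded via shift)."""
    prev = pm25.shift(1).rolling(window=window, min_periods=window // 2)
    return pm25 > prev.mean() + k * prev.std()


# Synthetic demo data only: 72 hourly rows alternating 40/60 ug/m3,
# with one missing reading and one sudden spike.
n = 72
raw = pd.DataFrame({
    "Year": 2015,
    "Month": 1,
    "Day": [1 + h // 24 for h in range(n)],
    "Hour": [h % 24 for h in range(n)],
    "Value": [40.0 if h % 2 == 0 else 60.0 for h in range(n)],
})
raw.loc[10, "Value"] = -999.0  # missing-value sentinel
raw.loc[60, "Value"] = 400.0   # sudden spike

pm25 = load_pm25(raw)
stats = rolling_stats(pm25)
print(share_meeting_standard(pm25))       # share of valid hours at or below the limit
print(share_days_meeting_standard(pm25))  # share of days at or below the limit
print(pm25[flag_spikes(pm25)])            # readings flagged as sudden changes
```

On the real file, `load_pm25(pd.read_csv(path))` would replace the synthetic `raw` table, with `path` pointing to the downloaded CSV. If the export leaves gaps instead of `-999` rows, reindex to a complete hourly grid first so that 24 rows still correspond to 24 hours.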
Abstract
The Citi Bike program in New York City launched in 2013 and has since seen growing usage throughout the city. In this experiment, we explore the age distribution of male and female riders to examine how the Citi Bike system could be better designed to serve user needs and attract more customers. Data cleaning and manipulation were implemented in Python. A null hypothesis significance test was conducted using a one-tailed z-test. Using a rigorous, reproducible experimental design, the results show that middle-aged men are less likely to ride a bike than middle-aged women.
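The abstract does not state exactly which quantity entered the z-test. A minimal sketch assumes the comparison is the share of middle-aged riders among male riders versus among female riders. That setup is a pooled two-proportion z-test with a lower-tail alternative, H0: p_men >= p_women against H1: p_men < p_women. The function name and this framing are assumptions for illustration. The standard normal CDF is computed with `math.erf` to avoid a SciPy dependency.

```python
import math


def one_tailed_two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test of H0: p1 >= p2 against H1: p1 < p2.

    x1/n1: middle-aged male riders / all male riders (assumed framing)
    x2/n2: middle-aged female riders / all female riders
    Returns (z, p_value), where p_value is the lower-tail P(Z <= z).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z
    return z, p_value
```

Called as `one_tailed_two_proportion_z(x_men, n_men, x_women, n_women)` with counts taken from the cleaned trip data, H0 would be rejected at the chosen significance level (for example α = 0.05) when the returned p-value falls below it. The lower tail is used because the alternative hypothesis predicts a smaller proportion for men.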