Analysis of the Impact of Weather and Temperature on Citi Bike Riders

PUI2017 Extra Credit Project proposal

  <Hao Xi, hx517, hx517>

My notebook is here.

https://github.com/hx517/PUI2017_hx517/blob/master/EC/PUI2017_EC_project_hx517.ipynb

Problem Description:

Citi Bike is a bike sharing program launched in New York City in May 2013 to provide residents and visitors with a convenient and eco-friendly way to travel. Once people have purchased a membership, they use their member key to unlock a bike at any station and return it at a station near their destination. A common objection to bike sharing programs is the weather: it rains frequently throughout the year, and the heat of summer and the cold of winter also seem unsuitable for cycling, which could severely reduce the utilization of the program. I want to measure the size of this effect. VanderPlas (2015) showed in his study of Seattle's bike share that people ride more on warm days. Inspired by his article, I hope to analyze these effects in more detail for NYC. Besides the number of rides, I will consider riders' age, riders' speed, and membership status in this analysis. As a next step, I will use a pivot table to find the latitude and longitude of the stations that are most affected by the weather.

Data:

1. Citi Bike data.
Since each Citi Bike station has GPS location information, data can be recorded for every rental. The Citi Bike website provides data on each ride, including the location and time of the start and end of the rental and the duration of the ride, which lets me calculate a rough average speed for each ride.
The features I would like to use are:
a. Number of rides;
b. Rough speed (calculated from the start and end locations of each ride and the time taken).
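
As a rough illustration of feature b, the speed could be estimated from the great-circle (haversine) distance between the start and end stations divided by the trip duration. This is only a sketch under assumptions: the function and argument names are hypothetical, and the straight-line distance underestimates the actual route, so the result is a lower bound on the true speed. Rides that start and end at the same station have no usable distance and are skipped.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rough_speed_kmh(start_lat, start_lon, end_lat, end_lon, duration_s):
    """Straight-line distance divided by trip duration, in km/h.

    Returns None when the speed is undefined: a non-positive duration,
    or a ride that starts and ends at the same point.
    """
    if duration_s <= 0:
        return None
    distance_km = haversine_km(start_lat, start_lon, end_lat, end_lon)
    if distance_km == 0:
        return None
    return distance_km / (duration_s / 3600.0)
```

The same function could be applied row by row to the trip records, using the start and end station coordinates and the trip duration in seconds.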
 
2. Weather data.
The NOAA official website provides historical weather records, including average temperature, precipitation and other elements.
I would like to use the features below to identify which of them are useful in the prediction model:
a. Weather condition;
b. Humidity;
c. Average temperature;
d. Visibility.

Analysis:

Overall, I want to measure the impact of weather and temperature on riding.
The weather data I use in my analysis contains precipitation and average temperature. The riding side includes the speed of the ride, the age of the rider, and whether or not the rider is a member. I would like to reject the null hypothesis that there is no linear correlation between precipitation/temperature and the speed of the ride/the age of the rider. I would also like to reject the null hypothesis that the impact of precipitation/temperature is the same regardless of membership status.
Then I would like to use a pivot table to find the latitude and longitude of the stations that are most affected by the weather.
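
One way to test the correlation hypotheses above is sketched below. It computes the Pearson correlation coefficient and estimates a two-sided p-value with a permutation test, which is a stand-in chosen here to keep the example dependency-free and is not necessarily the test used in the notebook. The function names and the sample numbers are made up for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided p-value for H0: no linear correlation between xs and ys.

    Shuffles ys repeatedly and counts how often the shuffled |r| is at
    least as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical daily values: average temperature and mean ride speed (km/h).
temperature = [30, 35, 41, 48, 55, 62, 70, 76, 81, 88]
mean_speed = [8.1, 8.4, 8.3, 8.9, 9.2, 9.0, 9.6, 9.8, 9.7, 10.1]

r = pearson_r(temperature, mean_speed)
p = permutation_p_value(temperature, mean_speed)
reject_null = p < 0.05  # significance level is an assumption
```

To compare members with non-members, the same test could be run separately on each group and the two coefficients compared.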

References:

[1] J. VanderPlas. 2015. Analyzing Pronto CycleShare Data with Python and Pandas.
[2] NOAA official website.
[3] Citi Bike data.
[4] Instructions in PUI2017_fb55/HW3.

Deliverable:

1. Two prediction models that show which weather factors most influence the number of rides and the speed of rides.
2. A heat map of the Citi Bike stations that are most affected by bad weather.
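
The station-level comparison behind deliverable 2 could be sketched as follows. The code builds the same kind of table a pivot table would give (stations as rows, dry and rainy days as columns, mean daily rides as values) and ranks stations by the share of dry-day ridership they lose on rainy days. The input layout, the dry/rainy split, and all names are assumptions for illustration.

```python
from collections import defaultdict


def station_weather_pivot(daily_counts):
    """Mean daily rides per station, split by dry and rainy days.

    daily_counts: iterable of (station_id, is_rainy, rides) tuples,
    one tuple per station per day (hypothetical layout).
    Returns {station_id: {"dry": mean or None, "rainy": mean or None}}.
    """
    # Each cell holds [sum of rides, number of days].
    totals = defaultdict(lambda: {"dry": [0, 0], "rainy": [0, 0]})
    for station_id, is_rainy, rides in daily_counts:
        cell = totals[station_id]["rainy" if is_rainy else "dry"]
        cell[0] += rides
        cell[1] += 1
    return {
        station_id: {k: (s / n if n else None) for k, (s, n) in cells.items()}
        for station_id, cells in totals.items()
    }


def relative_drop(pivot):
    """Fraction of dry-day ridership lost on rainy days, largest first."""
    drops = []
    for station_id, row in pivot.items():
        dry, rainy = row["dry"], row["rainy"]
        # Skip stations without both kinds of days or with no dry-day rides.
        if dry and rainy is not None:
            drops.append((station_id, 1 - rainy / dry))
    return sorted(drops, key=lambda item: item[1], reverse=True)
```

Joining the ranked station IDs back to the station latitude and longitude would give the points to plot on the heat map.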