PUI2016 Citibike Project Summary
In this project we examined whether older riders (over 40 years old) use Citibikes for shorter trips, on average, than younger riders (under 40 years old). Using trip duration and rider age for February 2015, we ran a Z-test for proportions on the two age groups, with trips grouped by duration, yielding a test statistic of 26.09. With a statistic this large we reject the null hypothesis and conclude that older riders are more likely to take short trips than younger riders.
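For readers unfamiliar with the test, the following is a minimal sketch of a pooled two-sample Z-test for proportions, using only the Python standard library. It is not necessarily the exact procedure we ran: the function names are hypothetical, the pooled standard error is an assumption about the test variant, and the counts in the usage lines are made-up illustrative numbers, not the project data.

```python
import math


def two_proportion_z(short_a, n_a, short_b, n_b):
    """Pooled z statistic for H0: both groups have the same short-trip share.

    short_a, short_b: number of short trips in each group.
    n_a, n_b: total number of trips in each group.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    p_a = short_a / n_a
    p_b = short_b / n_b
    # Under H0 the two groups share one proportion, estimated by pooling.
    pooled = (short_a + short_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("pooled proportion is 0 or 1; z is undefined")
    return (p_a - p_b) / std_err


def one_sided_p_value(z):
    """P(Z >= z) for a standard normal Z (upper-tail p-value)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Illustrative counts only (NOT the February 2015 results).
z = two_proportion_z(short_a=60, n_a=100, short_b=40, n_b=100)
p = one_sided_p_value(z)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

A one-sided p-value is used here because the hypothesis is directional (older riders take a larger share of short trips); a two-sided test would double it.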
We used the zip file for February 2015 from the Citibike website. The data can be downloaded here: https://s3.amazonaws.com/tripdata/201502-citibike-tripdata.zip
The corresponding .csv file contained entries for the start and stop station location, trip duration, customer type, birth year and gender of each rider during the month. We computed age by subtracting each subscriber's birth year from the then-current year, 2015, and dropped all columns except trip duration and age. We split the pandas dataframe into riders over and under 40 to create two samples. Then we divided trip duration into two categories: short trips (less than 10 minutes) and long trips (more than 10 minutes) (see Figure 1). Finally, we normalized the distribution (see Figure 2).
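The preparation steps above can be sketched in pandas roughly as follows. This is an illustration under stated assumptions rather than our exact notebook code: the column names `tripduration` and `birth year`, a trip duration recorded in seconds, the treatment of riders aged exactly 40 and trips of exactly 10 minutes, and the helper names `prepare_trips` and `short_trip_table` are all assumptions. Normalizing within each age group is one plausible reading of the normalization step.

```python
import pandas as pd

REFERENCE_YEAR = 2015          # year used to turn birth year into age
AGE_CUTOFF = 40                # assumption: age 40 exactly counts as "younger"
SHORT_TRIP_SECONDS = 10 * 60   # assumption: tripduration is in seconds


def prepare_trips(df):
    """Keep trip duration and age, then label each trip by age group and length."""
    trips = df[["tripduration", "birth year"]].copy()
    # Riders without a usable birth year (e.g. non-subscribers) are dropped.
    trips["birth year"] = pd.to_numeric(trips["birth year"], errors="coerce")
    trips = trips.dropna(subset=["birth year"])
    trips["age"] = REFERENCE_YEAR - trips["birth year"]
    trips["age_group"] = (trips["age"] > AGE_CUTOFF).map(
        {True: "older", False: "younger"}
    )
    trips["trip_type"] = (trips["tripduration"] < SHORT_TRIP_SECONDS).map(
        {True: "short", False: "long"}
    )
    return trips[["tripduration", "age", "age_group", "trip_type"]]


def short_trip_table(trips):
    """Return raw counts and per-age-group shares of short and long trips."""
    counts = pd.crosstab(trips["age_group"], trips["trip_type"])
    shares = pd.crosstab(trips["age_group"], trips["trip_type"], normalize="index")
    return counts, shares


# Tiny made-up frame for illustration; the real input would be the CSV
# extracted from the zip file, loaded with pd.read_csv.
example = pd.DataFrame(
    {
        "tripduration": [300, 900, 450, 1200, 200],
        "birth year": [1960, 1970, 1990, 1985, None],
    }
)
counts, shares = short_trip_table(prepare_trips(example))
print(counts)
print(shares)
```

The counts table supplies the inputs to a test of proportions, while the shares table corresponds to the normalized view of the same data.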