CitiBike Mini-Project Report (CUSP2016_PUI)

Zhaohong Niu (zn352)


In this mini-project on CitiBike data, I wanted to know whether younger and older people differ in the amount of time they spend riding CitiBikes. The idea was that the average CitiBike trip duration is longer for younger riders than for older riders.
_Null Hypothesis and Alternative Hypothesis_

Using one month of CitiBike’s open data as a sample, I compared average trip duration across 5-year age groups before finalizing the age cutoff between younger and older people. The analysis showed that the null hypothesis could not be rejected: the t statistic is smaller than the critical value from the t-distribution table for df > 20, and the p-value is greater than alpha = 0.05. There is no significant difference in CitiBike trip duration between younger and older people.
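
The comparison against the t-distribution table can be sketched as follows. This is only an illustration, not the code used for the report: the report does not say which t-test variant was run, so Welch's unequal-variance form is an assumption, the function name welch_t is hypothetical, and the two samples are made-up durations in seconds rather than CitiBike data. The critical value 1.725 is the one-tailed table entry for alpha = 0.05 at df = 20, used here as a conservative cutoff for df > 20.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test on samples a and b."""
    # Squared standard error contributed by each sample
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up trip durations in seconds, for illustration only
younger = [620, 540, 710, 480, 655, 590, 730, 510, 600, 675, 560, 640]
older = [600, 570, 690, 500, 630, 610, 700, 530, 580, 660, 550, 625]

t_stat, df = welch_t(younger, older)

# One-tailed critical value from a t-table (alpha = 0.05, df = 20)
T_CRITICAL = 1.725

# Reject the null only if younger riders' mean is significantly longer
reject_null = t_stat > T_CRITICAL
print(f"t = {t_stat:.3f}, df = {df:.1f}, reject null: {reject_null}")
```

With real data, younger and older would hold the trip durations on either side of the chosen age cutoff.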

Data Source and Data Wrangling

The data is CitiBike usage data for January 2015, from CitiBike’s open database.
It records trip duration (in seconds) and birth year, along with other information about each user.

Data Wrangling Process:
(1)   Create an age column from year of birth to make the data easier to analyze
(2)   Drop NaN values, if any
(3)   Remove rows with age over 90 to limit outliers
(4)   Remove rows with trip duration equal to zero, which are treated as outliers

After these steps, the data is all set and ready for further analysis.
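
The four wrangling steps could look roughly like this in pandas. It is a minimal sketch under stated assumptions: the column names "tripduration" and "birth year", the function name clean_trips, and the small sample table are illustrative and not taken from the report. Age is computed relative to 2015 because the data covers January 2015.

```python
import pandas as pd


def clean_trips(df: pd.DataFrame, data_year: int = 2015) -> pd.DataFrame:
    """Apply the four wrangling steps to a table of trips."""
    out = df.copy()
    # (1) Convert year of birth to age (column names are assumptions)
    out["age"] = data_year - out["birth year"]
    # (2) Drop rows with missing values in the columns used for analysis
    out = out.dropna(subset=["age", "tripduration"])
    # (3) Remove rows with age over 90
    out = out[out["age"] <= 90]
    # (4) Remove rows with trip duration equal to zero
    out = out[out["tripduration"] != 0]
    return out.reset_index(drop=True)


# Tiny made-up sample, for illustration only
sample = pd.DataFrame({
    "tripduration": [600, 0, 1200, 450, 900],
    "birth year": [1985.0, 1990.0, None, 1900.0, 1960.0],
})

cleaned = clean_trips(sample)
print(cleaned)
```

In this sample, the zero-duration trip, the row with a missing birth year, and the rider born in 1900 are removed, leaving two rows.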


First, I plotted the number of CitiBike riders by age group to check the sample size. Then I plotted average CitiBike trip duration by age. The plot shows only slight differences across ages: