 Exploring the Relationship between the Hour of Day and CitiBike Ridership

•  Kevin Han
•  Yue Cai
•  Xiaomeng Dong
•  Cheng Hou

Abstract

In this analysis, we explore whether there is a difference between the number of CitiBike rides during the rush hours of New York City and the number of rides during non-rush hours. We define the rush hours of New York City to be the hours between 7 and 9 A.M. and between 4 and 9 P.M. on business days. We state our hypotheses and test them using a two-sided, two-sample t-test. The test indicates that there is indeed a statistically significant difference.

Data

We get our data from the CitiBike website. We select a random sample from each month's data in 2015. We extract the starttime variable from the raw data and convert it into a datetime object in Python. From starttime we create a new variable, HourOfDay, that holds the hour of the day at which each ride started. We remove rides that take place on weekends and holidays. We then group the dataset by HourOfDay and create a new variable, Count, that holds the total number of rides in each hour. A detailed IPython notebook showing the code we used to wrangle the data can be found here.

Figure: Total number of rides in our randomly selected sample for each hour of the day in 2015.
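The wrangling steps above can be sketched in pandas as follows. This is a minimal sketch, not the code from the authors' notebook. It assumes each monthly file has already been loaded into a DataFrame with a starttime column that pd.to_datetime can parse, uses a 1% simple random sample per month, and treats U.S. federal holidays (pandas' USFederalHolidayCalendar) as the holiday list. The helper names sample_month and hourly_counts are hypothetical.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def sample_month(trips: pd.DataFrame, frac: float = 0.01, seed: int = 0) -> pd.DataFrame:
    """Draw a simple random sample (default 1%) from one month of trips."""
    return trips.sample(frac=frac, random_state=seed)


def hourly_counts(trips: pd.DataFrame) -> pd.DataFrame:
    """Count business-day rides by hour of day, using a 'starttime' column."""
    # Parse the raw start times into datetime values (keeps the hour).
    start = pd.to_datetime(trips["starttime"])

    # Assumption: "holidays" means U.S. federal holidays in the data's date range.
    holidays = USFederalHolidayCalendar().holidays(
        start=start.min().normalize(), end=start.max().normalize()
    )

    # Keep Monday-Friday rides (dayofweek 0-4) that do not fall on a holiday.
    is_weekday = start.dt.dayofweek < 5
    is_holiday = start.dt.normalize().isin(holidays)
    business = start[is_weekday & ~is_holiday]

    # Derive HourOfDay and count rides per hour; hours with no rides get 0.
    hours = business.dt.hour
    counts = hours.groupby(hours).size().reindex(range(24), fill_value=0)
    return counts.rename_axis("HourOfDay").reset_index(name="Count")
```

For the full year, one would sample each of the twelve monthly DataFrames, combine the samples with pd.concat, and then call hourly_counts once on the result. Reindexing to all 24 hours keeps hours without any sampled rides in the table with a Count of 0.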

Analysis

We test at a significance level of $$\alpha=0.05$$. Let $$\mu_{R}$$ denote the average number of rides per hour during the rush hours of New York City as defined above, and $$\mu_{N}$$ the average number of rides per hour during non-rush hours. Our null hypothesis is that the two averages are equal, and the alternative hypothesis is that they differ, i.e.,

\begin{equation} H_{0}:\mu_{R}-\mu_{N}=0 \nonumber \end{equation}
\begin{equation} H_{a}:\mu_{R}-\mu_{N}\neq 0 \nonumber \end{equation}
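For reference, with its default setting equal_var=True, scipy.stats.ttest_ind computes the pooled-variance Student's t statistic below, where $$\bar{x}_{R}$$, $$s_{R}^{2}$$ and $$n_{R}$$ are the sample mean, sample variance and number of hourly counts in the rush-hour group, and $$\bar{x}_{N}$$, $$s_{N}^{2}$$ and $$n_{N}$$ are the same quantities for the non-rush group. Assuming approximately normal hourly counts with equal variances in both groups, the statistic follows a t distribution with $$n_{R}+n_{N}-2$$ degrees of freedom under $$H_{0}$$, and the two-sided p-value is $$2P(T\geq|t|)$$.

\begin{equation} t=\frac{\bar{x}_{R}-\bar{x}_{N}}{s_{p}\sqrt{\frac{1}{n_{R}}+\frac{1}{n_{N}}}},\qquad s_{p}^{2}=\frac{(n_{R}-1)s_{R}^{2}+(n_{N}-1)s_{N}^{2}}{n_{R}+n_{N}-2} \nonumber \end{equation}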

To test this null hypothesis, we perform a two-sided t-test using the ttest_ind function from the scipy.stats package, passing it two numpy arrays that hold the total number of rides in each rush hour and in each non-rush hour, respectively. We arrive at the following conclusion.

Because the p-value is smaller than the significance level, we reject the null hypothesis in favor of the alternative hypothesis: the average number of rides during the rush hours of New York City as defined above is not equal to the average number of rides during non-rush hours.
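The test described in this section can be sketched as follows. This is a hedged illustration rather than the authors' code. The set of rush-hour HourOfDay values (7, 8 and 16 through 20, i.e. treating 7 to 9 A.M. and 4 to 9 P.M. as half-open intervals) is our assumption, since the text does not say whether the end hours are included, and the function name rush_hour_test is hypothetical. It takes the 24 hourly Count values, splits them into rush and non-rush numpy arrays, and passes them to ttest_ind.

```python
import numpy as np
from scipy.stats import ttest_ind

# Assumption: rush hours are [7, 9) and [16, 21), i.e. HourOfDay 7, 8 and 16-20.
# The source does not state whether the end hours (9 and 21) are included.
RUSH_HOURS = frozenset({7, 8, 16, 17, 18, 19, 20})


def rush_hour_test(hourly_counts, alpha=0.05):
    """Two-sided two-sample t-test of rush vs. non-rush hourly ride counts.

    hourly_counts: 24 ride totals, where position h holds the Count for HourOfDay h.
    Returns (t statistic, p-value, whether H0 is rejected at level alpha).
    """
    counts = np.asarray(hourly_counts, dtype=float)
    if counts.shape != (24,):
        raise ValueError("expected one count per hour of the day (24 values)")

    # Split the hourly totals into the two groups being compared.
    is_rush = np.isin(np.arange(24), sorted(RUSH_HOURS))
    rush, non_rush = counts[is_rush], counts[~is_rush]

    # ttest_ind is two-sided and assumes equal variances (Student's t) by default.
    result = ttest_ind(rush, non_rush)
    return float(result.statistic), float(result.pvalue), bool(result.pvalue < alpha)
```

Because ttest_ind assumes equal variances in the two groups by default, passing equal_var=False (Welch's t-test) may be the safer choice if rush-hour counts vary much more than non-rush counts.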

Conclusion

We randomly sampled 1% of the CitiBike rides in each of the twelve months of 2015. From the results of our hypothesis test, we conclude that there is a statistically significant difference between the average number of rides during the rush hours of New York City and the average number of rides during non-rush hours.