In this analysis, we explore whether there is a difference between the number of CitiBike rides during the rush hours of New York City and during non-rush hours. We define the rush hours of New York City as 7 to 9 A.M. and 4 to 9 P.M. on business days. We state our hypothesis and test it using a two-sided t-test; the test indicates that there is indeed a difference.
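The rush-hour definition can be expressed as a small predicate. This is a minimal sketch, not the authors' code: it assumes both windows are half-open, so 7 to 9 A.M. covers rides starting from 7:00 up to 8:59 and 4 to 9 P.M. covers 16:00 up to 20:59, because the text does not say whether the end hours are included. The name `is_rush_hour` and the `holidays` parameter are hypothetical.

```python
from datetime import datetime

# Assumed half-open windows [start, end) on a 24-hour clock; the source does
# not state whether 9 A.M. and 9 P.M. themselves count as rush hour.
RUSH_WINDOWS = ((7, 9), (16, 21))


def is_rush_hour(ts: datetime, holidays: frozenset = frozenset()) -> bool:
    """Return True if a ride starting at `ts` falls in a rush-hour window
    on a business day (Monday to Friday and not listed in `holidays`)."""
    if ts.weekday() >= 5 or ts.date() in holidays:
        return False  # weekends and holidays are never rush hour
    return any(start <= ts.hour < end for start, end in RUSH_WINDOWS)


if __name__ == "__main__":
    print(is_rush_hour(datetime(2015, 3, 4, 8, 15)))  # Wednesday morning: True
    print(is_rush_hour(datetime(2015, 3, 7, 8, 15)))  # Saturday: False
```

Keeping the windows in one constant makes it easy to rerun the analysis if the end hours turn out to be inclusive.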

We obtain our data from the CitiBike website. We select a random sample from each month's data in 2015. We extract the \(\color[rgb]{1,0,0}starttime\) variable from the raw data and convert it into a \(\color[rgb]{1,0,0}date\) object in Python. From \(\color[rgb]{1,0,0}starttime\), we create a \(\color[rgb]{1,0,0}HourOfDay\) variable by extracting the hour. We remove rides that take place on weekends and holidays. We then aggregate the dataset by \(\color[rgb]{1,0,0}HourOfDay\) and create a new variable, \(\color[rgb]{1,0,0}Count\), that holds the total number of rides in each hour. A detailed IPython notebook showing the code we used to wrangle the data can be found here.
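The wrangling steps can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the notebook's code: each ride is assumed to be a dict with a `starttime` string in a `month/day/year hour:minute` layout, with or without seconds (2015 CitiBike files are not all formatted alike), the 1% per-month sampling fraction is taken from the conclusion, and the helper names `sample_month`, `parse_starttime`, and `hourly_counts` as well as the holiday set are hypothetical.

```python
import random
from collections import Counter
from datetime import date, datetime

# Assumed timestamp layouts; adjust to the files actually downloaded.
STARTTIME_FORMATS = ("%m/%d/%Y %H:%M", "%m/%d/%Y %H:%M:%S")


def sample_month(rows, fraction=0.01, seed=None):
    """Draw a simple random sample of `fraction` of one month's rows."""
    rng = random.Random(seed)
    k = min(len(rows), max(1, round(len(rows) * fraction)))
    return rng.sample(rows, k)


def parse_starttime(text):
    """Convert a raw `starttime` string into a datetime object."""
    for fmt in STARTTIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next assumed layout
    raise ValueError(f"unrecognized starttime: {text!r}")


def hourly_counts(rows, holidays=frozenset()):
    """Drop weekend and holiday rides, then count rides per HourOfDay."""
    counts = Counter()
    for row in rows:
        ts = parse_starttime(row["starttime"])
        if ts.weekday() >= 5 or ts.date() in holidays:
            continue  # keep business days only
        counts[ts.hour] += 1  # HourOfDay -> Count
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    rides = [
        {"starttime": "3/4/2015 8:15"},
        {"starttime": "3/4/2015 8:40"},
        {"starttime": "3/4/2015 13:05:00"},
        {"starttime": "3/7/2015 8:15"},  # Saturday, dropped
        {"starttime": "7/3/2015 8:00"},  # listed as a holiday below, dropped
    ]
    print(hourly_counts(rides, holidays={date(2015, 7, 3)}))  # {8: 2, 13: 1}
```

In practice `sample_month` would be applied to each monthly file before the rows are concatenated and passed to `hourly_counts`.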

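We state our null hypothesis as follows: the average number of rides per hour during the rush hours of New York City, as defined above, is equal to the average number of rides per hour during non-rush hours. The alternative hypothesis is that the two averages differ. We test at a significance level of \(\alpha=0.05\). Writing \(\mu_{R}\) and \(\mu_{N}\) for the mean hourly ride counts in rush and non-rush hours, respectively, i.e.,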

\begin{equation} H_{0}:\mu_{R}-\mu_{N}=0\nonumber \end{equation}
\begin{equation} H_{a}:\mu_{R}-\mu_{N}\neq 0\nonumber \end{equation}

Under this null hypothesis, we perform a two-sided t-test using the \(\color[rgb]{0,0,1}ttest\_ind\) function from the \(\color[rgb]{0,0,1}scipy.stats\) module, passing two \(\color[rgb]{0,0,1}numpy\) arrays that hold the hourly ride totals for rush hours and non-rush hours, respectively. We arrive at the following conclusion.
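To make explicit what `ttest_ind` computes with its default `equal_var=True`, the sketch below implements the pooled-variance Student t statistic using only the standard library. It is an illustration, not the authors' code: the set `RUSH_HOURS` assumes half-open windows (hours 7 to 8 and 16 to 20), and the helper names `split_counts` and `pooled_t_statistic` are hypothetical. Each sample needs at least two hourly counts for the variance to be defined.

```python
import math
from statistics import mean, variance

# Assumed rush hours on a 24-hour clock, treating both windows as half-open.
RUSH_HOURS = {7, 8, 16, 17, 18, 19, 20}


def split_counts(counts_by_hour):
    """Split a {HourOfDay: Count} mapping into rush and non-rush samples."""
    rush = [c for h, c in counts_by_hour.items() if h in RUSH_HOURS]
    non_rush = [c for h, c in counts_by_hour.items() if h not in RUSH_HOURS]
    return rush, non_rush


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance (the statistic
    scipy.stats.ttest_ind reports when equal_var=True) and its degrees of freedom."""
    n_a, n_b = len(a), len(b)
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2
```

The two-sided p-value additionally needs the t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)`, which is what `ttest_ind` returns alongside the statistic.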

Because the p-value is smaller than the significance level, we reject the null hypothesis in favor of the alternative hypothesis: the average number of rides during the rush hours of New York City, as defined above, is not equal to the average number of rides during non-rush hours.

We randomly sampled \(1\%\) of the CitiBike rides from each of the twelve months of 2015. From the results of our hypothesis test, we conclude that there is a statistically significant difference between the average number of rides during the rush hours of New York City and the average number of rides during non-rush hours.
