Citi Bike Ridership Mini-Research:

Young People Are More Likely to Use Citi Bikes on Weekends

Abstract

This study tests whether young people ride Citi Bikes on weekends more often than middle-aged people do. The analysis uses a hypothesis test (Z-test) to compare the ratio of weekend to weekday Citi Bike use among young riders with the same ratio among middle-aged riders. The result shows that, at the 5% significance level, the ratio of young people biking on weekends over week days (7 days) is greater than the corresponding ratio for middle-aged people.

Keywords: Citi Bike, Hypothesis Test, Z-test, Age

In a fast-paced modern city like New York, Citi Bike has become not only one of the most popular alternatives for commuting, but also a crucial component of the city’s gradually forming network of transportation and social activities. Because it provides fairly comprehensive data sets, Citi Bike is a good subject for researchers studying citizens’ behavior through ridership patterns. This study aims to find out whether young people ride bikes on weekends more often than middle-aged people do, under the assumption that riders’ use of Citi Bikes fully reflects their personal preferences, that is, that they bike for general use rather than mainly for commuting.

All data used to perform the statistical test come from https://s3.amazonaws.com/tripdata, where the trip records are organized by month. The data wrangling process is designed to be reproducible and includes the following stages:

1. Check whether the data for the requested month already exist in a designated directory and download them only if they do not, so that existing data can be reused. We choose the February 2015 Citi Bike data for our research.

2. Read the data into a Pandas DataFrame; select and modify the attributes as needed (e.g., create a binary “age group” attribute by calculating ages from “birth year”). Label each row with \(18\leqslant age<40\) as young and each row with \(40\leqslant age<60\) as middle-aged.

3. Plot histograms to visualize the normalized fractions of average trip counts on each day of the week, for young and middle-aged bikers together as well as for each group individually.

4. Estimate the errors on the average daily trip counts on weekdays and weekends for both biker groups.
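The first three stages can be sketched in Python as follows. This is a minimal illustration and not the code used for the study: the function names are hypothetical, the column names `starttime` and `birth year` are assumed to match the monthly trip files, and the age is approximated as the data year minus the birth year.

```python
import os
import urllib.request

import pandas as pd


def fetch_if_missing(url, dest_path):
    """Download url to dest_path only if nothing exists at dest_path yet."""
    if not os.path.exists(dest_path):
        os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, dest_path)
    return dest_path


def label_trips(df, data_year=2015):
    """Add age_group, weekday and is_weekend columns to a trip table.

    Assumes columns named 'birth year' and 'starttime' (an assumption about
    the file layout). Riders outside 18 <= age < 60, or with a missing
    birth year, are dropped.
    """
    out = df.copy()
    age = data_year - out["birth year"]
    # right=False gives the half-open bins [18, 40) and [40, 60)
    out["age_group"] = pd.cut(
        age, bins=[18, 40, 60], right=False, labels=["young", "middle-aged"]
    )
    out = out.dropna(subset=["age_group"]).copy()
    start = pd.to_datetime(out["starttime"])
    out["weekday"] = start.dt.dayofweek      # Monday = 0, ..., Sunday = 6
    out["is_weekend"] = out["weekday"] >= 5  # Saturday or Sunday
    return out


def daily_fractions(labeled):
    """Per age group, the fraction of that group's trips on each weekday."""
    counts = pd.crosstab(labeled["age_group"], labeled["weekday"])
    return counts.div(counts.sum(axis=1), axis=0)
```

Each row of the table returned by `daily_fractions` sums to one, so the two groups can be compared in a bar plot (for example with `daily_fractions(labeled).T.plot(kind="bar")`) even though their total trip counts differ.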

Since the sample size is large (greater than 30) and the population standard deviation is known, we choose a Z-test for the hypothesis test (Difference Between Z-...)(Z-test vs T-test). For the question we focus on, we set

\begin{equation}
H_{0}:\frac{\#\text{ of young on weekends}}{\#\text{ of young on week days}}\leqslant\frac{\#\text{ of middle-aged on weekends}}{\#\text{ of middle-aged on week days}}\nonumber
\end{equation}

\begin{equation}
H_{a}:\frac{\#\text{ of young on weekends}}{\#\text{ of young on week days}}>\frac{\#\text{ of middle-aged on weekends}}{\#\text{ of middle-aged on week days}}\nonumber
\end{equation}

According to the formula and table (Hypothesis Test: Diff...), we obtain \(z=24.4665\), and the corresponding p-value is lower than 0.05. Thus we reject the null hypothesis \(H_{0}\) in favor of \(H_{a}\), which states that the ratio of young people biking on weekends over week days (7 days) is greater than the corresponding ratio for middle-aged people.
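For illustration, a one-sided Z-test for a difference between two proportions can be sketched as below. This sketch assumes the pooled two-proportion form of the test, with each group's proportion taken as its weekend trips divided by all of its trips; the function name and the counts in the usage example are made up and are not the study's data.

```python
import math


def two_proportion_z(weekend_a, total_a, weekend_b, total_b):
    """One-sided pooled Z-test of H_a: p_a > p_b.

    weekend_x is the number of weekend trips for group x and total_x is the
    number of all trips for group x. Returns (z, upper-tail p-value).
    """
    p_a = weekend_a / total_a
    p_b = weekend_b / total_b
    # Pooled proportion under H_0 (both groups share one weekend share)
    pooled = (weekend_a + weekend_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # P(Z >= z) for a standard normal variable
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Usage with invented counts (NOT the February 2015 numbers)
z, p = two_proportion_z(weekend_a=3000, total_a=12000,
                        weekend_b=1500, total_b=8000)
print(f"z = {z:.4f}, reject H0 at the 5% level: {p < 0.05}")
```

The null hypothesis is rejected at the 5% level when the returned p-value is below 0.05, which for this one-sided test corresponds to a z-score above roughly 1.645.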

Based on the result of the Z-test, we conclude that, at the 0.05 significance level, young riders are more likely than middle-aged riders to ride Citi Bikes on weekends. The reasons behind this result may vary; for example, young people may have more social activities on weekends, leading them to ride Citi Bikes more often. Further research could use data from different months to avoid seasonal effects.

Difference Between Z-test and T-test.

Z-test vs T-test.

Hypothesis Test: Difference Between Proportions.
