
Jingtian Zhou

and 1 more

Abstract

This study examines whether average CitiBike trip duration differed between genders in February 2015. The null hypothesis is that the average trip duration of men is the same as or longer than that of women, tested at a significance level of 0.05. This article describes how the sample data were analyzed and how the Z test was conducted. We fail to reject the null hypothesis: at the 0.05 significance level, the data provide no evidence that men's average trip duration is shorter than women's.

Introduction

The development of bike-sharing systems has not only brought millions of citizens great convenience but also significantly improved our awareness of living a sustainable lifestyle. In May 2013, New York City launched the largest bike-sharing system in North America, which is currently equipped with more than 750 stations and 12,000 bikes. With so many bikes in use, a massive amount of data is produced every single day, and analyzing those data can reveal many interesting facts. For example, although men have physical advantages compared with women, does that necessarily mean that men tend to take longer trips than women? The analysis below addresses that question.

Data

I first downloaded the February 2015 data from the CitiBike website with curl and unzipped the file into my ADRF. I then preprocessed the data by dropping all columns that are not related to the null hypothesis. After grouping the data by gender and calling the describe function, I found a third, unclear gender category and many large outliers. To better analyze the data and test the null hypothesis, I dropped all records of the third gender category and all large outliers with a trip duration larger than 2500.
In addition, I found that the number of trips by women is five times that of men, so I reduced the sample of women riders by keeping only one of every five trips by women. The histogram and boxplot of the trip duration distribution for each gender are shown below.
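The preprocessing steps described above can be sketched in pandas as follows. This is a minimal illustration and not the original notebook code: the column names `tripduration` and `gender`, the gender coding (0 = unclear, 1 = men, 2 = women), the helper name `preprocess`, and the tiny synthetic table are all assumptions made for the example.

```python
import pandas as pd


def preprocess(trips: pd.DataFrame, max_duration: int = 2500, step: int = 5) -> pd.DataFrame:
    """Clean trip records following the steps in the text (hypothetical helper)."""
    # Keep only the columns related to the null hypothesis (assumed names).
    df = trips[["tripduration", "gender"]]
    # Drop the third, unclear gender category (assumed to be coded 0).
    df = df[df["gender"] != 0]
    # Drop large outliers: trips with a duration larger than max_duration.
    df = df[df["tripduration"] <= max_duration]
    # Split by gender (assumed coding: 1 = men, 2 = women).
    men = df[df["gender"] == 1]
    women = df[df["gender"] == 2]
    # Keep one of every `step` trips by women to balance the sample sizes.
    women = women.iloc[::step]
    return pd.concat([men, women], ignore_index=True)


# Synthetic rows for illustration only; these are not CitiBike data.
sample = pd.DataFrame({
    "tripduration": [300, 4000, 600, 700, 800, 900, 1000, 1100, 1200, 500],
    "gender":       [1,   1,    0,   2,   2,   2,   2,    2,    2,    1],
    "bikeid":       list(range(10)),
})

cleaned = preprocess(sample)
# Per-gender summary statistics, as with describe after grouping by gender.
summary = cleaned.groupby("gender")["tripduration"].describe()
print(summary)
```

Taking every fifth row with `iloc[::step]` is a systematic subsample. If the trips are stored in time order, a random draw such as `women.sample(frac=0.2)` would be an alternative way to avoid any periodic pattern in the retained trips.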