Studying The Relationship Between Biking Trip Duration And Biker Age: A Case Study Using NYC Citi-Bike Data

Abstract

This mini-project examined the relationship between biking trip duration and biker age, asking whether a biker’s age affects trip duration. To address the question quantitatively with an inferential statistics approach, the project conducted a two-tailed hypothesis test, using Welch’s t-test for the means of two independent samples, on NYC’s Citi-Bike data set. The test found a significant difference between the average trip durations of older and younger bikers.

Data

This study used NYC’s open Citi-Bike data [1]. Specifically, trip history data for all of 2015 (12 months) was used to minimize the effect of seasonality on the analysis. The original data sets contain only each biker’s birth year, so biker age was assumed to be the data set year (2015) minus the birth year. Trip duration was converted from seconds to minutes. The data was then divided into two groups: bikers aged 45 or older, and bikers younger than 45.
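The preparation steps described above can be sketched in a few lines of pandas. This is a minimal illustration, not the study's actual code: the column names `tripduration` and `birth year` are assumed to match the trip history files, dropping rows with a missing birth year is an assumption, and the tiny table at the end is made-up demonstration data.

```python
import pandas as pd

DATA_YEAR = 2015   # year of the trip history data
AGE_SPLIT = 45     # threshold separating the two age groups


def prepare_trips(trips):
    """Add age and duration-in-minutes columns, then split by age group."""
    # Assumption: rows without a birth year are dropped, since age is undefined.
    df = trips.dropna(subset=["birth year"]).copy()
    # Age is approximated as data set year minus birth year.
    df["age"] = DATA_YEAR - df["birth year"]
    # Trip duration is recorded in seconds; convert to minutes.
    df["duration_min"] = df["tripduration"] / 60.0
    older = df[df["age"] >= AGE_SPLIT]
    younger = df[df["age"] < AGE_SPLIT]
    return older, younger


# Made-up rows for demonstration only (not real Citi-Bike records).
sample = pd.DataFrame(
    {
        "tripduration": [600, 1200, 300, 900],
        "birth year": [1960, 1990, 1970, None],
    }
)
older, younger = prepare_trips(sample)
```

Note that a biker born in 1970 is assigned age 45 and therefore falls in the older group, matching the "45 or older" rule.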

Analysis

The analysis has two parts: exploratory analysis and statistical testing. The exploratory analysis is fairly straightforward. The distribution of trip durations by biker age was visualized with a histogram, as shown in Figure 1. The trip duration distribution was then normalized to address the question qualitatively; the normalized distribution is shown in Figure 2. The mean trip durations of the two age groups were also visualized, with errors assumed to follow a Poisson distribution (Figure 3).
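As a rough sketch of the normalization step, the snippet below turns raw histogram counts into fractions of trips per bin, so that two groups of different size can be compared on the same scale. It attaches Poisson errors to the counts, taking the uncertainty of a bin with N trips as sqrt(N); this is one common reading of a Poisson error assumption and may differ from how the errors in the figures were computed. The function name, the 5-minute bins, and the synthetic exponential samples are all illustrative assumptions.

```python
import numpy as np


def normalized_histogram(durations_min, bins):
    """Return per-bin trip fractions, their Poisson errors, and bin edges."""
    counts, edges = np.histogram(durations_min, bins=bins)
    total = counts.sum()
    if total == 0:
        raise ValueError("no trips fall inside the given bins")
    frac = counts / total               # fractions sum to 1 over the bins
    frac_err = np.sqrt(counts) / total  # Poisson error sqrt(N), same scaling
    return frac, frac_err, edges


# Synthetic durations for demonstration only (not real Citi-Bike data).
rng = np.random.default_rng(0)
older_demo = rng.exponential(scale=14.0, size=2000)
younger_demo = rng.exponential(scale=12.0, size=8000)

bins = np.arange(0, 61, 5)  # assumed 5-minute bins from 0 to 60 minutes
older_frac, older_err, _ = normalized_histogram(older_demo, bins)
younger_frac, younger_err, _ = normalized_histogram(younger_demo, bins)
```

Because each group is divided by its own total, the larger younger group no longer dominates the plot, which is what makes a qualitative comparison of the two shapes possible.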

Figure 1: Trip Duration Distribution with Citi Biker Age

Figure 2: Normalized Trip Duration Distribution with Citi Biker Age