We asked the question, "Is there a relationship between the age of Citi Bike riders and the duration of their trips?" Our alternative hypothesis is that the two are negatively correlated; in other words, as age goes up, trip duration goes down. The null hypothesis is that age has either no correlation or a positive correlation with trip duration. We used a significance level of 0.05.
H0: as age increases, trip duration either does not change or increases
We pulled Citi Bike data from two months: January 2016 and July 2015. We removed all columns except for year of birth and trip duration. We created a variable for age by subtracting the year of birth from the year of the data (2016 and 2015, respectively). At the suggestion of a reviewer (and in order to run the Pearson test, which requires two variables of equal length), we dropped NaN values from both variables and reduced the larger one, by random elimination, to match the size of the smaller. The same reviewer suggested we remove outliers. We did not do this, but it would be a good idea, particularly for the age variable (for example, removing values above 80 or 100).
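The preparation steps above could be sketched roughly as follows. This is a minimal illustration, not the code we actually ran: the function name `prepare_trips`, the column names `birth year` and `tripduration`, and the tiny example frame are all assumptions made for the sketch. It also drops a trip whenever either value is missing, which keeps each age paired with its own trip duration; that is a slightly different approach from the random elimination described above.

```python
import numpy as np
import pandas as pd


def prepare_trips(trips: pd.DataFrame, data_year: int) -> pd.DataFrame:
    """Return paired 'age' and 'tripduration' columns with no missing values."""
    # Keep only the two columns of interest (column names are assumed).
    subset = trips[["birth year", "tripduration"]].copy()
    # Age = year of the data minus year of birth.
    subset["age"] = data_year - subset["birth year"]
    # Drop any trip missing either value so the two variables stay paired.
    cleaned = subset.dropna(subset=["age", "tripduration"])
    return cleaned[["age", "tripduration"]].reset_index(drop=True)


# Tiny made-up frame for illustration only (not real Citi Bike data).
example = pd.DataFrame(
    {
        "birth year": [1980.0, np.nan, 1990.0, 1965.0],
        "tripduration": [600.0, 450.0, np.nan, 1200.0],
        "gender": [1, 2, 0, 1],
    }
)
prepared = prepare_trips(example, data_year=2016)
print(prepared)
```

An outlier filter on age (for example, keeping only rows where `age` is at most 80) could be added as one more step inside the function.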
First we created a scatter plot of the data. For correlation analysis, we chose a Pearson test, since we have just two variables. One of our reviewers suggested we run a multiple regression with other variables, but we chose to stick with just the question of age and trip duration. Another reviewer suggested we use OLS regression, but we felt a correlation test was appropriate for two variables. OLS might have been a good choice if we had additional variables.
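For readers who want to see what a Pearson test looks like in practice, here is a small sketch using `scipy.stats.pearsonr`. The data are synthetic (randomly generated with a fixed seed) and stand in for the real age and trip duration columns; the variable names and the numbers used to generate the data are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (not Citi Bike data): duration rises slightly with age.
rng = np.random.default_rng(0)
age = rng.integers(16, 80, size=1000).astype(float)
duration = 600.0 + 5.0 * age + rng.normal(0.0, 300.0, size=1000)

# pearsonr returns the correlation coefficient and, by default, a two-sided p-value.
r, p_two_sided = stats.pearsonr(age, duration)
print(f"r = {r:.3f}, two-sided p = {p_two_sided:.3g}")
```

Note that the default p-value is two-sided: it tests whether the correlation differs from zero in either direction, not specifically whether it is negative.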
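The January 2016 data showed a correlation coefficient of 0.20 and a reported p-value of 0.0. This tells us that age and trip duration are weakly positively correlated, and a p-value below 0.05 indicates that this correlation is statistically significant. Because the correlation is positive rather than negative, the result does not support our alternative hypothesis. So far the null hypothesis stands.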
We ran the test again on the second data set, July 2015, and found similar results: a correlation coefficient of 0.144 and a reported p-value of 0.0. In this sample the relationship between the two variables is slightly weaker, but the conclusion is the same: age and trip duration are positively correlated. The null hypothesis cannot be rejected.
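The reasoning behind this conclusion can be made explicit with a short sketch. It assumes the reported p-values are two-sided (the usual default, though we did not record this), and it relies on the null distribution of the Pearson statistic being symmetric about zero. The helper names `one_sided_p_negative` and `reject_null` are hypothetical, introduced only for this illustration.

```python
def one_sided_p_negative(r: float, p_two_sided: float) -> float:
    """One-sided p-value for the alternative 'correlation < 0'.

    Assumes p_two_sided comes from a test whose null distribution is
    symmetric about zero.
    """
    half = p_two_sided / 2.0
    # A negative r is evidence for the alternative; a positive r is evidence against it.
    return half if r < 0 else 1.0 - half


def reject_null(r: float, p_two_sided: float, alpha: float = 0.05) -> bool:
    """True if the data support a negative correlation at significance level alpha."""
    return one_sided_p_negative(r, p_two_sided) < alpha


# Coefficients and p-values as reported in the text above.
for label, r, p in [("January 2016", 0.20, 0.0), ("July 2015", 0.144, 0.0)]:
    print(label, "reject H0:", reject_null(r, p))
```

With a positive coefficient, the one-sided p-value is close to 1 no matter how small the two-sided p-value is, which is why a highly significant positive correlation still leaves the null hypothesis in place.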