From the chi-square contingency table above, we get a chi-square statistic of 2440.53, which is far above the critical value of 3.84 (1 degree of freedom, significance level 0.05). This means p << 0.05, so we can reject the null hypothesis that 'the ratio of riders who are over 30 years old biking on weekends over weekdays is the same as or higher than the ratio of riders who are under 30 years old biking on weekends over weekdays'. In other words, riders aged 30 or younger are more likely to ride CitiBike on weekends than riders over 30.
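To make the comparison against 3.84 concrete, here is a minimal sketch of a Pearson chi-square test on a 2x2 table, using only the standard library. It assumes no continuity correction, which may differ from how the statistic above was computed. The function name `chi_square_2x2` and the counts in `example_table` are hypothetical and chosen for illustration; they are not the project's data.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and p-value for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of age group and day type
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

CRITICAL_VALUE = 3.84  # chi-square critical value for df = 1, alpha = 0.05

# Hypothetical counts, NOT the project's data.
# Rows = age group, columns = (weekday rides, weekend rides)
example_table = [
    [700, 300],  # riders aged 30 or younger
    [800, 200],  # riders older than 30
]

stat, p_value = chi_square_2x2(example_table)
reject_null = stat > CRITICAL_VALUE
print(f"chi2 = {stat:.2f}, p = {p_value:.2g}, reject null: {reject_null}")
```

The test itself only says that the weekend share differs between the two age groups. Which group rides more on weekends has to be read from the observed ratios in the table.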
Some interpretation:
In the dataset, riders with birth year information are generally subscribers, so there is a good chance that they are city residents. Older riders (over 30) are more likely to have families and stable jobs. On weekends they probably spend more time at home with family or drive somewhere together, and places for family activities are usually too far away to reach by bike. Younger riders, meanwhile, might do more social activities in town at places close by, or use bikes to get around college campuses.
The weaknesses of this project and directions for further study are:
- Data limitation. Using only one month of data might not be enough to demonstrate a trend. We could improve the experiment by using more data, for example a winter month, since this one is from summer.
- Visitors. New York is such an international city that many young weekend riders may be visitors from other cities or even other countries. A further study could look into how many of them are subscribers.
- Ridership by area. A further study could compare the two age groups across areas, using the location information to group rides by borough or even by zip code area.