I tested the null hypothesis that riders aged 35 and older make up an equal or greater proportion of all riders on trips starting between midnight and 5 A.M. than on trips starting between 5 A.M. and midnight. The formulation of this null hypothesis and the methodology benefited from Hao Xi's GitHub review of HW3, which caught a typo in my original hypothesis and added clarity and rigor to my methodology \cite{hw3}.
I used Federica Bianco's function to compute the chi-squared test statistic for the above table, which is 1471.503. This statistic is compared against the critical value of 3.84 for the 95% confidence level with one degree of freedom. Because the statistic far exceeds that value, the null hypothesis is rejected, suggesting that younger riders make up a greater proportion of late-night riders.
To validate these results against another month and season, I performed the same test on the February 2016 Citibike data, which yielded a chi-squared statistic of 502.98. The null hypothesis is again rejected at the 95% confidence level.
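The comparison used for both months can be sketched as follows. This is a minimal illustration of a Pearson chi-squared statistic on a 2x2 contingency table, not the function actually used in the notebook. The function name `chi2_2x2`, the table layout (rows for time window, columns for age group), and the counts are all assumptions made for illustration, and no continuity correction is applied.

```python
def chi2_2x2(table):
    """Pearson chi-squared statistic for a 2x2 table [[a, b], [c, d]].

    No continuity correction is applied (an assumption of this sketch).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = (a, b, c, d)
    # Expected count in each cell under independence: row total * column total / n
    expected = (
        row_totals[0] * col_totals[0] / n,
        row_totals[0] * col_totals[1] / n,
        row_totals[1] * col_totals[0] / n,
        row_totals[1] * col_totals[1] / n,
    )
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical value of the chi-squared distribution, 1 degree of freedom, alpha = 0.05
CRITICAL_95_DF1 = 3.841

# Hypothetical counts for illustration only (NOT the Citibike data).
# Assumed layout: rows = (midnight-5 A.M., 5 A.M.-midnight), columns = (under 35, 35+).
example_table = [[10, 20], [30, 40]]
statistic = chi2_2x2(example_table)
reject_null = statistic > CRITICAL_95_DF1
```

Because the chi-squared statistic does not carry a sign, the direction of the difference still has to be read from the observed proportions in the table.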
Please see the Jupyter Notebook at the end of this paper for the code used, additional figures, and the derivation of the test statistic.
Conclusion
The chi-squared test of proportions on Citibike data for July 2017 and February 2016 produced rejections of the null hypothesis at the 95% confidence level. This suggests that younger riders make up a statistically significantly greater proportion of riders in late-night hours than they do during the rest of the day. Some limitations of the approach include the arbitrary division of "older" vs. "younger" riders at age 35 and of daytime vs. nighttime at a fixed midnight to 5 A.M. window. A fuller analysis may use a Python package like Astral to divide day and night more precisely based on sunrise and sunset times. Additionally, as seen in Figure 3, the most interesting pattern in the data is that older riders emerge in strength in the early-morning commute hours between 5 A.M. and 8 A.M. I would like to explore that data further and to see whether the pattern holds when the age brackets are divided into decade bins rather than binary groups.
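As a rough sketch of the decade-bin follow-up, the binary age split could be replaced with labels such as "20-29" and "30-39", and the share of each decade computed within each time window. The helper names `decade_bin` and `decade_shares`, the `(age, start_hour)` input format, and the sample trips below are hypothetical and are not taken from the notebook or the Citibike data.

```python
from collections import Counter


def decade_bin(age):
    """Label the decade an integer age falls in, e.g. 37 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def decade_shares(trips):
    """Share of each age decade within each time window.

    trips: iterable of (age, start_hour) pairs, with start_hour in 0..23.
    Late night is taken as midnight up to (not including) 5 A.M.
    """
    counts = {"late_night": Counter(), "rest_of_day": Counter()}
    for age, hour in trips:
        window = "late_night" if 0 <= hour < 5 else "rest_of_day"
        counts[window][decade_bin(age)] += 1

    shares = {}
    for window, by_decade in counts.items():
        total = sum(by_decade.values())
        # Leave a window empty rather than divide by zero when it has no trips
        shares[window] = (
            {label: n / total for label, n in sorted(by_decade.items())}
            if total
            else {}
        )
    return shares


# Hypothetical trips for illustration only: (rider age, trip start hour)
sample_trips = [(24, 1), (27, 14), (36, 8), (38, 2), (52, 7), (55, 18)]
sample_shares = decade_shares(sample_trips)
```

With more than two age groups the contingency table has more than one degree of freedom, so the 3.84 critical value used for the binary test would no longer apply.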
Jupyter Notebook