The second step is to choose a test for the hypotheses. During peer review (thanks to peers cd2682 and ab8131), it was suggested that a z-test should be used. However, given the hypotheses and the dataset, a t-test may be more suitable. The two-sample t-test is commonly used to test the null hypothesis that two independent samples have identical average (expected) values. The two samples here are two groups drawn from the total population, whose mean and standard deviation are unknown. A z-test assumes the population standard deviation is known, whereas a t-test estimates it from the samples and accounts for the extra uncertainty through the t distribution. Moreover, the objective of this test is not to check whether a sample belongs to a known population but to compare the means of two given samples. Finally, the sample size is only around 30, which is not a large sample. For these reasons, the t-test is applied.
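To make the difference concrete, the sketch below computes Student's two-sample t statistic by hand, with the unknown population variance estimated from the samples themselves; a z-test would instead divide by a known population standard deviation. It uses only the Python standard library. The function name `pooled_t_statistic` and the input values are hypothetical illustrations, not taken from the analysis.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom.

    The population standard deviation is unknown, so the common variance is
    estimated from the two samples (sample variances use an n - 1 denominator).
    """
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    # Pooled estimate of the common population variance
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / dof
    # Standard error of the difference between the two sample means
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / std_err
    return t, dof


# Hypothetical toy values, for illustration only
t, dof = pooled_t_statistic([5, 7, 6, 8], [4, 6, 5, 5])
print(f"t = {t:.3f}, degrees of freedom = {dof}")
```

The p-value then comes from the t distribution with `dof` degrees of freedom (e.g. `scipy.stats.t.sf`), which has heavier tails than the normal distribution behind the z-test; the difference matters most at small sample sizes.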
Conclusions
The ttest_ind function from scipy.stats is used to calculate the t-score and p-value. The results are shown in Fig.5. Since the p-value of 0.58 is far larger than the significance level of 0.05, we cannot reject the null hypothesis that the number of riders born in or after 1980 is the same as or lower than the number of riders born before 1980. In other words, the data do not support the idea that young people tend to ride more than other age groups.
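For reference, a `ttest_ind` call for this one-sided hypothesis might look like the sketch below. It is an assumption that a one-sided test was run: by default `ttest_ind` returns a two-sided p-value, and the `alternative="greater"` argument (available since SciPy 1.6) makes it one-sided. The helper name `one_sided_ttest` and the input values are hypothetical and do not reproduce the result in Fig.5.

```python
from scipy.stats import ttest_ind


def one_sided_ttest(young, old, alpha=0.05):
    """Test H0: mean(young) <= mean(old) against H1: mean(young) > mean(old).

    Hypothetical helper; requires SciPy >= 1.6 for the `alternative` argument.
    """
    # equal_var=True selects Student's t-test with a pooled variance
    result = ttest_ind(young, old, equal_var=True, alternative="greater")
    reject_h0 = result.pvalue < alpha
    return result.statistic, result.pvalue, reject_h0


# Hypothetical per-group values, NOT the study's data
t_stat, p_value, reject = one_sided_ttest([5, 7, 6, 8], [4, 6, 5, 5])
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.3f}, reject H0: {reject}")
```

If only the default two-sided p-value is available, the one-sided p-value is half of it when the t statistic has the hypothesized sign (positive here). Passing `equal_var=False` would switch to Welch's t-test, which drops the equal-variance assumption.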