This project investigated CitiBike rider and membership usage data to determine whether more 24-Hour passes were sold on weekends or on weekdays during the second quarter (April to June) of 2016. The hypothesis was evaluated with a t-test, which yielded a t statistic of 5.5 and a p-value of 6.4 × 10^-6. Accordingly, the null hypothesis was rejected in favor of the alternative hypothesis, leading to the conclusion that, on average, more 24-Hour passes were purchased on weekends (Saturday and Sunday) than on weekdays between April and June 2016.
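The t statistic behind this result can be illustrated with a short, standard-library sketch. The report does not state which t-test variant was used, so this example assumes an independent two-sample test in Welch's unequal-variance form. The function name `welch_t` and the sample counts are hypothetical and are not the CitiBike data.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic for the difference in means (a - b).

    Assumption: Welch's unequal-variance form is used for illustration;
    the original analysis does not specify its t-test variant.
    """
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Sample variances with the n - 1 denominator
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    # Standard error of the difference in means
    se = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / se


# Hypothetical daily 24-Hour pass counts, NOT the CitiBike data
weekend = [1200, 1350, 1100, 1500]
weekday = [600, 650, 700, 580, 620, 690]
t = welch_t(weekend, weekday)
print(round(t, 2))
```

A large positive t means the weekend mean exceeds the weekday mean by many standard errors. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with its p-value from the t distribution.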
The full analysis is available on GitHub: https://github.com/jvani/PUI2016_jmv423/blob/master/HW6_jmv423/Assignment2_jmv423.ipynb
For this study, CitiBike Rider and Membership Usage data for April to June 2016 were obtained from the CitiBike System Data portal. For each day in that period, the data file reports how many membership passes were purchased, the type of pass purchased, and the number of trips completed. This is all the information needed to compare 24-Hour pass purchases on weekends and on weekdays.
To enable a day-of-week analysis, the Date column was converted to the pandas datetime type, and a new column, days, was added containing each date's weekday name as a string:
import pandas as pd
# Parse the Date column, then label each row with its full weekday name (e.g. 'Saturday')
df['Date'] = pd.to_datetime(df['Date'])
df['days'] = df['Date'].dt.day_name()
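Assuming each row of the dataset corresponds to one day (an assumption based on the data description above), the sizes of the two subpopulations follow directly from the calendar. This standard-library sketch counts the weekend and weekday dates in April to June 2016. It classifies days with `date.weekday()` rather than weekday names so the result does not depend on the system locale.

```python
import datetime as dt

# Every calendar day of Q2 2016, April 1 through June 30 inclusive
start, end = dt.date(2016, 4, 1), dt.date(2016, 6, 30)
days = [start + dt.timedelta(days=i) for i in range((end - start).days + 1)]

# date.weekday() numbers Monday as 0 and Sunday as 6, so 5 and 6 are Saturday and Sunday
n_weekend = sum(1 for d in days if d.weekday() >= 5)
n_weekday = len(days) - n_weekend

print(len(days), n_weekend, n_weekday)  # → 91 26 65
```

Because the quarter spans exactly 13 weeks (Friday, April 1 through Thursday, June 30), a complete download would give 26 weekend days and 65 weekdays, provided no dates are missing.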
Figure 1 shows the mean number of 24-Hour passes purchased on each day of the week. The DataFrame was then split into weekend and weekday subpopulations.
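The per-day means and the weekend/weekday split described above can be sketched without pandas as follows. The rows are hypothetical (weekday name, 24-Hour passes sold) pairs, not the CitiBike data. In a pandas workflow, the equivalent steps would be a `groupby` on the days column followed by a boolean filter built with `isin(['Saturday', 'Sunday'])`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (weekday name, 24-Hour passes sold) rows; NOT real CitiBike data
rows = [
    ('Saturday', 1200), ('Sunday', 1350), ('Monday', 600),
    ('Tuesday', 650), ('Saturday', 1100), ('Sunday', 1500),
    ('Wednesday', 700), ('Thursday', 580), ('Friday', 620), ('Monday', 690),
]

# Mean passes per day of the week (the kind of summary shown in Figure 1)
by_day = defaultdict(list)
for day, passes in rows:
    by_day[day].append(passes)
daily_means = {day: mean(vals) for day, vals in by_day.items()}

# Split the daily counts into weekend and weekday subpopulations
WEEKEND = {'Saturday', 'Sunday'}
weekend = [passes for day, passes in rows if day in WEEKEND]
weekday = [passes for day, passes in rows if day not in WEEKEND]

print(daily_means['Saturday'], len(weekend), len(weekday))
```

Keeping the daily counts as two separate lists is what a two-sample t-test expects as its inputs.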