Each section has to be included, and each has to be written in plain English, in full sentences. Each section should be between 50 and 100 words.

 

Introduction

Citi Bike, introduced to New York City in 2013, is a privately owned bike-share system that is expanding throughout the city's boroughs. Since its inception, use of the system has grown significantly.
The marketing question worth exploring is whether gender differences affect the customer base. Citi Bike wants to understand what limits user uptake. If some groups are not willing to use the service, there is potential for market expansion. For example, if disparities among users stem from perceptions of rider safety, proximity of access, or income-based preferences, each of these presents a market expansion opportunity for Citi Bike.

Data - This should include figures (with captions!!) that help visualize and understand the data.

We considered one month of data, July 2017, and dropped extraneous variables so that only the two relevant to our research remained: birth year and gender. We subtracted each birth year from 2017 to determine the age of each rider. The data contained a few unrealistic ages that were likely incorrect or misreported (one rider was listed as being 140 years old), so we removed all outlying observations with an age greater than 79.
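The preparation steps described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: it assumes each trip is a dictionary with the keys "birth year" and "gender" (the actual column names in the July 2017 file may differ), and it assumes that rows with a missing birth year are dropped, which the text does not state.

```python
from typing import Dict, List, Optional

REFERENCE_YEAR = 2017  # year of the data, as stated in the text
MAX_AGE = 79           # ages above this are treated as misreported


def rider_age(birth_year: Optional[float]) -> Optional[int]:
    """Return the rider's age in 2017, or None if it is unusable."""
    if birth_year is None:
        # Assumption: rows without a birth year are dropped.
        return None
    age = REFERENCE_YEAR - int(birth_year)
    if age > MAX_AGE:
        return None
    return age


def prepare(trips: List[Dict]) -> List[Dict]:
    """Keep only gender and age for trips with a usable age."""
    cleaned = []
    for trip in trips:
        # "birth year" and "gender" are assumed key names.
        age = rider_age(trip.get("birth year"))
        if age is not None:
            cleaned.append({"gender": trip["gender"], "age": age})
    return cleaned
```

Calling prepare on a list of trips returns only the gender and age of each rider aged 79 or younger, so a rider born in 1877 (age 140) is removed.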

Methodology - Either this section or the next one should contain figures as well to show the results.

To test for significance, we used the two-tailed independent t-test. Our independent variable (gender) is categorical, and our dependent variable (age) is continuous. Based on this information, we determined that we could use either the t-test or an ANOVA. Because we are comparing only two groups and the ANOVA is more complex, we decided that the simpler t-test would be more appropriate.
We were also encouraged to consider the z-test and the chi-squared test. We cannot use the chi-squared test because age is not categorical and we are not testing a proportion, and we cannot use the z-test because we do not have the population parameters. Thus, the independent t-test was our best fit. At first we considered a one-tailed test to determine whether female riders had a significantly lower average age than male riders, but we decided that we did not have a strong enough reason to specify the direction of the difference in advance, so we used the two-tailed test.
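For illustration, the statistic behind an independent two-sample t-test can be computed as in the sketch below. It assumes the pooled-variance (equal variance) form, since the text does not say which variant was used, and the function names are hypothetical. In practice a library routine such as scipy.stats.ttest_ind would normally be used. The p-value helper relies on the fact that, with a very large number of degrees of freedom, the t distribution is close to the standard normal; it is an approximation and not an exact t-distribution p-value.

```python
from math import sqrt
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for two independent samples.

    Assumes equal variances in both groups (pooled-variance form).
    """
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / standard_error
    return t, df


def approx_two_tailed_p(t: float) -> float:
    """Approximate two-tailed p-value, valid only for large df."""
    return 2 * (1 - NormalDist().cdf(abs(t)))
```

Here a and b would be the lists of ages for female and male riders. The null hypothesis of equal mean ages is rejected when the two-tailed p-value falls below the chosen significance level.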

Conclusion

Based on the outcome of the t-test, we were able to reject the null hypothesis and conclude that there is a significant difference between the average ages of female and male users. To strengthen the analysis, we could increase the number of months in our dataset. To add further value, we could test whether there is a significant difference, by gender, between the ages of users who ride Citi Bike recreationally and those who use it to commute to work. This could help Citi Bike better understand its user base and decide where future stations should be built.