Abstract

This project examines the general state of employees' mental health and whether working at a technology company, or other factors, affects it. After processing the data and fitting logistic regression and decision tree models, I find that technology companies have a higher rate of employees with mental health problems than companies in general. Furthermore, employees' gender and age, company size, and company type (tech or not) all influence whether an employee has mental health problems. An employee's medical history of mental health conditions is another strong predictor.

Introduction

The first question I want to answer is whether people working in technology companies are more likely to have mental health issues than those working in non-technology companies. I also want to look at how this varies across countries. Second, I want to identify factors that may account for employees developing mental health issues in the workplace, so that companies can help prevent them by identifying high-risk individuals.

Data

The data come from an ongoing survey of workers and are available on Kaggle. I focus on the 2016 survey results but also include part of the 2017 results. I use the 2016 data to analyze factors that could explain the occurrence of mental health issues, and the 2017 data to see whether the incidence has increased. The survey captures general information about each participant, such as age, gender, company type, job type, and company size. It also records the attitudes and policies of participants' companies toward mental health issues.

Because the dataset comes from a survey of workers, the answers may not be fully reliable: people tend, consciously or unconsciously, to give answers that are socially acceptable. Furthermore, the survey contains many long questions with non-categorical answers, so organizing and digging into each question takes a lot of time and energy. I therefore use only a small part of it; fully exploiting the entire dataset could support more insightful conclusions.

Methodology

For each year (2016 and 2017), I calculate the total number of responses collected in each country, as well as the number of participants in each country who consider themselves to have mental health problems. I do the same calculation for responses from participants working in technology companies. I keep the countries that have at least 10 responses and show the results in a bar plot for each year (Figure 1 and Figure 2).
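The per-country counts described in the methodology could be computed along the lines of the minimal sketch below. It is not the code used for the project: the field names `country`, `has_condition`, and `tech_company` are hypothetical stand-ins for the survey columns, and the toy rows are made up purely to exercise the function.

```python
from collections import defaultdict


def rates_by_country(responses, min_responses=10):
    """Return {country: (n_responses, n_with_condition, rate)}.

    Only countries with at least `min_responses` responses are kept.
    The keys "country" and "has_condition" are hypothetical field names.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in responses:
        totals[row["country"]] += 1
        if row["has_condition"]:
            positives[row["country"]] += 1
    return {
        country: (n, positives[country], positives[country] / n)
        for country, n in totals.items()
        if n >= min_responses
    }


# Toy rows (made up, not survey data): country A has 12 responses, B has 3.
sample = (
    [{"country": "A", "has_condition": True, "tech_company": True}] * 6
    + [{"country": "A", "has_condition": False, "tech_company": False}] * 6
    + [{"country": "B", "has_condition": True, "tech_company": True}] * 3
)

# All respondents, using the 10-response threshold from the text.
all_rates = rates_by_country(sample)

# Same calculation restricted to technology-company respondents
# (a lower threshold is used here only because the toy data is tiny).
tech_rates = rates_by_country(
    [row for row in sample if row["tech_company"]], min_responses=5
)
```

Running the same function once on all rows and once on the tech-only subset gives the two series that a per-year bar plot would compare.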
Abstract

The idea of this Citi Bike mini project is to test whether customers were less likely than subscribers to ride Citi Bike on weekdays in March 2015. The null hypothesis is that the proportion of customers riding Citi Bike on weekdays is the same as or higher than the proportion of subscribers riding Citi Bike on weekdays in March 2015. The significance level is 0.05. I used a z-test to test the null hypothesis and obtained an extremely small p-value, so I reject the null hypothesis and conclude that customers were less likely than subscribers to ride Citi Bike on weekdays in March 2015.

Introduction

Citi Bike is a public bike sharing system operated by Motivate and named after its lead sponsor, Citigroup. There are two user types: customers and subscribers. Subscribers are those who have bought an annual membership and can take unlimited 45-minute rides throughout the year. Customers are those who pay every time they ride. Since Citi Bike riders fall into different user types, whether the number of rides differs significantly between user types on weekdays and weekends is an interesting question to look at.

Data

The dataset used for the statistical test is from https://s3.amazonaws.com/tripdata. More specifically, I look at the data for March 2015. I grouped the number of rides by user type and day of the week to get the proportion of rides by customers and subscribers on weekdays and weekends. I also calculated the errors for the counts. To visualize the data better, I created a bar plot showing the normalized distribution of bikers (Fig. 1) and one showing the fraction of bikers for each user type (Fig. 2).
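The one-sided test described in the abstract can be illustrated with a pooled two-proportion z-test. This is a sketch under assumptions: the write-up only says a z-test was used, so the pooled form is one common choice and not necessarily the exact calculation performed, and the ride counts below are placeholders, not the March 2015 figures.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Lower-tailed pooled two-proportion z-test.

    H0: p1 >= p2, H1: p1 < p2, where p1 = x1 / n1 and p2 = x2 / n2.
    Returns (z, p_value), with p_value = P(Z <= z) for standard normal Z.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF evaluated at z, via the error function.
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value


# Placeholder counts (NOT the real data): weekday rides out of all rides.
customer_weekday, customer_total = 300, 1000
subscriber_weekday, subscriber_total = 800, 1000

z, p_value = two_proportion_z_test(
    customer_weekday, customer_total, subscriber_weekday, subscriber_total
)

alpha = 0.05  # significance level stated in the abstract
reject_null = p_value < alpha
```

Here group 1 is customers and group 2 is subscribers, so a strongly negative `z` with `p_value` below 0.05 corresponds to rejecting the null hypothesis that customers' weekday proportion is the same as or higher than subscribers'.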