Citibike Extra Charge by User Types

Abstract

The aim of this analysis is to understand the business and operating
model of the Citibike system by comparing the two user types (plans) that
Citibike offers. The idea is that Subscribers are less likely than
Customers to pay extra-time fees. Using three months of trip data, it was
confirmed that users of type Customer are more likely to pay extra-time
usage fees.

Data collection

In this analysis I used the 2016 open data for February, April and June.
It is important to note that there are more trips in summer than in
winter, and that the number of Customer rides also increases during the
warmer months. The data are available here:
https://s3.amazonaws.com/tripdata/index.html

The data wrangling process is available at:

https://github.com/fernandomelchor/PUI2016_lmf445/blob/master/HW6_lmf445/HW6_2_lmf445.ipynb

The first part shows how the data were downloaded and the CSV files
loaded into pandas DataFrames. After that, the unused columns and the NaN
values were dropped. An additional column was added to flag whether each
trip paid extra fees, given the limit for its user type. For the code and
more details, refer to the link above.
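The wrangling steps above can be sketched roughly as follows. This is not the notebook's code: the column names `tripduration` (in seconds) and `usertype` are assumptions about the 2016 CSV header, the sample rows are made up, and `load_trips` and `LIMIT_SECONDS` are hypothetical names. The free-time limits come from the note in this section (1800 s for Customers, 2700 s for Subscribers).

```python
import io

import pandas as pd

# Free-time limit per user type, in seconds (30 min and 45 min).
LIMIT_SECONDS = {"Customer": 1800, "Subscriber": 2700}

# Made-up rows in an assumed 2016 trip-data layout (not real data).
SAMPLE_CSV = """tripduration,starttime,usertype,birth year,gender
600,2016-02-01 00:00:08,Subscriber,1980,1
2900,2016-02-01 00:01:10,Subscriber,,1
1900,2016-02-01 00:02:33,Customer,,0
1200,2016-02-01 00:03:05,Customer,,0
800,2016-02-01 00:04:00,,1990,1
"""


def load_trips(source) -> pd.DataFrame:
    """Load trips, keep the needed columns, drop NaNs, flag extra-fee rides."""
    df = pd.read_csv(source)
    # Drop unused columns first, so NaNs in e.g. "birth year" do not remove rides.
    df = df[["tripduration", "usertype"]].dropna()
    # True when the ride exceeds the free-time limit for its user type.
    return df.assign(extra_fee=df["tripduration"] > df["usertype"].map(LIMIT_SECONDS))


trips = load_trips(io.StringIO(SAMPLE_CSV))
print(trips)
```

Dropping the unused columns before calling `dropna()` matters here: Customer rows often lack a birth year, so dropping NaNs across all columns first could discard many Customer rides.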

Background

Citibike is the bike-share system in NYC. "Bike sharing is an innovative
mode of transportation that allows users to make trips using publicly
available bikes. It consists of a fleet of specially designed, sturdy
and durable bikes that are locked into a network of docking stations
throughout the service area. The bikes can be unlocked from one station
and returned to any other station in the system, making bike share ideal
for short, one-way trips." (from the Citibike site)

The Citibike system is a source of interesting data that can describe
urban dynamics. The system stores data on every trip taken, along with
some limited data about the users.

Citibike offers three plans: a Day Pass for $12, a 3-Day Pass for $24
and an Annual Membership for $155. Customers are the users who ride with
a Day Pass or a 3-Day Pass; Subscribers are the users who hold an Annual
Membership.

Subscribers receive more benefits than Customers, including unlimited
45-minute rides. After 45 minutes, an extra fee is charged as follows:
$2.50 for the first additional 30 minutes, $6.50 for the next additional
30 minutes, and $9 for each additional 30 minutes after that.
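The Subscriber fee schedule can be expressed as a small function. This is a sketch under two assumptions not stated in the text: each started 30-minute block is charged in full, and the listed prices are per block rather than cumulative totals. `subscriber_extra_fee` is a hypothetical name.

```python
import math

SUBSCRIBER_FREE_SECONDS = 2700     # 45 minutes included per ride
SUBSCRIBER_BLOCK_SECONDS = 1800    # overage billed in 30-minute blocks
FIRST_BLOCK_PRICES = [2.50, 6.50]  # first and second overage blocks, in dollars
LATER_BLOCK_PRICE = 9.00           # each overage block after the second


def subscriber_extra_fee(duration_s: float) -> float:
    """Extra-time fee in dollars for one Subscriber ride.

    Assumption: every started 30-minute block is charged in full.
    """
    if duration_s <= SUBSCRIBER_FREE_SECONDS:
        return 0.0
    blocks = math.ceil((duration_s - SUBSCRIBER_FREE_SECONDS) / SUBSCRIBER_BLOCK_SECONDS)
    # Price each overage block according to its position in the schedule.
    return sum(
        FIRST_BLOCK_PRICES[i] if i < len(FIRST_BLOCK_PRICES) else LATER_BLOCK_PRICE
        for i in range(blocks)
    )


print(subscriber_extra_fee(6000))  # a 100-minute ride
```

Under this reading, a 100-minute ride (6000 s) exceeds the limit by 55 minutes, which spans two overage blocks, for a fee of $2.50 + $6.50 = $9.00.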

For Customers, rides are unlimited up to 30 minutes. After 30 minutes, an
extra fee of $4 is charged for each additional 15 minutes.
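The Customer rule can be sketched the same way, again assuming that each started 15-minute block is charged in full. `customer_extra_fee` is a hypothetical name.

```python
import math

CUSTOMER_FREE_SECONDS = 1800   # 30 minutes included per ride
CUSTOMER_BLOCK_SECONDS = 900   # overage billed in 15-minute blocks
CUSTOMER_BLOCK_PRICE = 4.00    # dollars per overage block


def customer_extra_fee(duration_s: float) -> float:
    """Extra-time fee in dollars for one Customer ride.

    Assumption: every started 15-minute block is charged in full.
    """
    if duration_s <= CUSTOMER_FREE_SECONDS:
        return 0.0
    blocks = math.ceil((duration_s - CUSTOMER_FREE_SECONDS) / CUSTOMER_BLOCK_SECONDS)
    return blocks * CUSTOMER_BLOCK_PRICE


print(customer_extra_fee(3000))  # a 50-minute ride
```

Under this reading, a 50-minute ride (3000 s) costs a Customer $8 in extra fees, while a Subscriber pays $2.50 for the same ride, since it is only 5 minutes over the 45-minute limit.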

Customers are often tourists or people who use the bikes for purposes
other than a typical commute.

In my analysis I state the following IDEA:

Subscribers are less likely than Customers to pay extra time fees per
ride

Subscriber = people enrolled with a 1-year contract to use Citibike

Customer = people who buy a Day Pass or a 3-Day Pass to use Citibike,
normally associated with tourists.

NULL HYPOTHESIS:

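The ratio of Subscriber rides paying extra fees to total Subscriber rides is the same as or higher than the ratio of Customer rides paying extra fees to total Customer rides:
$H_0: p_S \geq p_C$ versus $H_a: p_S < p_C$, where $p_S$ and $p_C$ are the proportions of Subscriber and Customer rides, respectively, that pay extra fees.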

Note: the usage time allowed without extra charges differs between Subscribers and Customers:
Max time for Customers = 30min = 1800 sec
Max time for Subscribers = 45min = 2700 sec

I will use a significance level $\alpha=0.05$
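One standard way to test this one-sided hypothesis on two proportions is a two-proportion z-test with a pooled standard error. The sketch below uses that test; it is an assumption, since the text does not say which test the notebook applies. The function name is hypothetical, and the inputs are the counts of rides that paid extra fees ($x$) and total rides ($n$) for each user type.

```python
import math

ALPHA = 0.05


def one_sided_two_proportion_z(x_s: int, n_s: int, x_c: int, n_c: int):
    """Test H0: p_S >= p_C against Ha: p_S < p_C.

    x_s, n_s: Subscriber rides paying extra fees, total Subscriber rides.
    x_c, n_c: Customer rides paying extra fees, total Customer rides.
    Returns (z, one-sided p-value).
    """
    p_s = x_s / n_s
    p_c = x_c / n_c
    # Pooled proportion under the boundary of H0 (p_S = p_C).
    pooled = (x_s + x_c) / (n_s + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_s + 1 / n_c))
    z = (p_s - p_c) / se
    # Lower-tail p-value: standard normal CDF evaluated at z.
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))
    return z, p_value
```

H0 is rejected when the returned p-value is below `ALPHA`. With made-up counts of 50 out of 1000 Subscriber rides and 100 out of 1000 Customer rides paying extra fees, z is about -4.2 and the null hypothesis would be rejected. If statsmodels is available, `statsmodels.stats.proportion.proportions_ztest` with `alternative="smaller"` performs the same pooled test.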