\documentclass[10pt]{article}
\usepackage{fullpage}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{titlesec}
\usepackage[section]{placeins}
\usepackage{xcolor}
\usepackage{breakcites}
\usepackage{lineno}
\usepackage{hyphenat}
\PassOptionsToPackage{hyphens}{url}
\usepackage[colorlinks = true,
linkcolor = blue,
urlcolor = blue,
citecolor = blue,
anchorcolor = blue]{hyperref}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
\errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother
\usepackage[round]{natbib}
\let\cite\citep
\renewenvironment{abstract}
{{\bfseries\noindent{\abstractname}\par\nobreak}\footnotesize}
{\bigskip}
\titlespacing{\section}{0pt}{*3}{*1}
\titlespacing{\subsection}{0pt}{*2}{*0.5}
\titlespacing{\subsubsection}{0pt}{*1.5}{0pt}
\usepackage{authblk}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\providecommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}%
\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\begin{document}
\title{HW8 Assignment 2}
\author[1]{QY}%
\author[2]{yg833}%
\author[2]{sjf374}%
\affil[1]{NYU Center for Urban Science \& Progress}%
\affil[2]{Affiliation not available}%
\vspace{-1em}
\date{\today}
\begingroup
\let\center\flushleft
\let\endcenter\endflushleft
\maketitle
\endgroup
\sloppy
Submitted by: Yanmei Guan @yg833, Samantha Jeanne Falk @sjf374, Qinyu
Goh @qg412
\par\null
\textbf{ABSTRACT:~}
For this Citibike mini project, our team tested whether riders are more
likely to ride Citibike on Saturdays than on Sundays. The idea was based
on the rationale that more places of interest are closed on Sundays than
on Saturdays. To test it, we looked at Citibike data from 2016 and
selected one month of data from each season: February for winter, May
for spring, August for summer, and November for fall. Seasonality
matters because bike riding is an outdoor activity, and cooler
temperatures in some seasons affect ridership. By sampling four months
across the year, we hoped to get a fuller picture.
\par\null
We initially visualized the counts of rides by weekday using a scatter
plot, means with error bars, and a box plot showing median ridership,
and it looked like there may be some differences. In particular, the
mean number of rides for Saturday was 32884.13 with a standard deviation
of 12260.51, and the mean number of rides for Sunday was 28834.11 with a
standard deviation of 11372.29. We then ran a two-sample t-test on the
counts from Saturdays and Sundays across the four selected months of
2016, and it returned a t-statistic of 0.985 and a p-value of 0.332. As
the p-value is greater than 0.05, we fail to reject the null hypothesis:
the data do not provide sufficient evidence that the mean number of bike
trips on Saturdays is greater than the mean number of bike trips on
Sundays in the 4 months of 2016, at a significance level of 0.05.
\par\null
\textbf{INTRODUCTION:}
\par\null
Citibike is New York City's (NYC) very own bicycle sharing program. It
is a docked bicycle sharing system: users purchase either a day pass or
an annual membership to unlock a bicycle at one station, ride it, and
return it at another station. Since its inception in 2013, Citibike has
quickly grown to become a staple mode of transportation in NYC, even
beating taxis on travel time in certain instances~\citep{bliss2017}.
\par\null
Given the popularity of Citibike in NYC, the team set out to explore
Citibike's readily available public trip data to see whether interesting
usage trends and patterns in user behavior could be distilled.
\subsubsection*{Idea/Question}\label{ideaquestion}
The main idea that the group decided to explore was whether there is a
difference in Citibike usage between the two weekend days. Prior to
this, each member of the team had formulated slightly different
questions and hypotheses. We merged these into one consistent question
and hypothesis, taking into consideration the feedback received from the
3 reviewers (Dr. Bianco, Amber and Mark).
\par\null
As a result, the finalized question is whether Citibike users are more
likely to ride on Saturdays than on Sundays. One potential underlying
rationale is that more places of interest are closed on Sundays than on
Saturdays. Sundays are also typically days of worship, when many people
go to church. Hence, it would be interesting to see whether this
correlates in any way with Citibike ridership.
\par\null
\subsubsection*{Hypothesis}\label{hypothesis}
The hypothesis was modified following feedback from the reviewers. The
team agrees with Dr. Bianco's point that it is important to choose a
time period that is long enough that the data is not skewed by one-off
events. The time period for analysis was thus expanded from a single
month (June 2016) to four months: February, May, August and November
2016. Each month represents a season of the year.
\par\null
While Dr. Bianco suggested that the median might be a better metric than
the mean because it is more robust to outliers, the team decided to keep
the hypothesis centred on the mean because most of the statistical tests
we know thus far are better suited to comparing means.
\par\null
\emph{Null hypothesis}: The mean number of bike trips on Saturdays is
the same as or less than the mean number of bike trips on Sundays in the
4 selected months representing each season of the year.
\par\null
\emph{Alternative hypothesis}: The mean number of bike trips on
Saturdays is greater than the mean number of bike trips on Sundays in
the 4 selected months representing each season of the year.
\par\null
\[H_0:\ Sun_{avg} - Sat_{avg} \ge 0\]
\[H_1:\ Sun_{avg} - Sat_{avg} < 0\]
at a significance level of~\(\alpha=0.05\).
\par\null
\textbf{DATA}
\par\null
Data for each of the selected months in 2016 was downloaded from the
Amazon S3 cloud server. The trips were then aggregated into daily totals
by day of the week so that they could be visualized and compared.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Screen-Shot-2018-11-06-at-8-12-41-PM/download-1}
\caption{{Scatterplot showing each day's total trip count for all 4 months, color
coded by day of the week (0 is Monday and 6 is Sunday). Saturdays
(orange) and Sundays (red) generally have lower total trip counts than
the weekdays.
{\label{778676}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/download/download}
\caption{{Error bar graph showing the mean and standard deviation of daily trips
for each day of the week, Monday to Sunday, across the 4 selected months
(0 is Monday and 6 is Sunday). Overall, Sunday has the lowest mean and
the smallest standard deviation. Saturday has the next lowest mean.
Wednesday is a peak ridership day.
{\label{122118}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/median-total-bikes/median-total-bikes}
\caption{{Box plot of bike rides per weekday (0 is Monday and 6 is Sunday). The
box plot shows the median for each day of the week; Saturday has a
higher median than Sunday. The maximum for Saturday is also higher than
the maximum for Sunday.
{\label{100843}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Screen-Shot-2018-11-06-at-8-17-58-PM/Screen-Shot-2018-11-06-at-8-17-58-PM}
\caption{{Bar graph showing the overall average ridership for Saturday (5) and
Sunday (6). Saturdays have a higher mean than Sundays.
{\label{877174}}%
}}
\end{center}
\end{figure}
\par\null
\textbf{METHODOLOGY}
\par\null
Since we are comparing two means computed from samples of the
population, we decided to use the t-test. There were several reasons for
this choice, one of which is that all of our reviewers (Dr. Bianco,
Amber and Mark) suggested it. The purpose of the t-test is to compare
the means of two distributions while taking their standard deviations
and sample sizes into account. Though Dr. Bianco suggested a one-sample
test comparing the Saturday counts to the mean of Sunday, we
additionally ran a two-sample t-test comparing Saturdays to Sundays,
since both are samples and neither represents the entire population.
\par\null
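As a rough illustration of the two-sample comparison described above,
the unequal-variance (Welch) form of the t-statistic can be written out
with the sample means and standard deviations reported in the abstract.
This is a sketch only: the text does not state which variant of the
two-sample test was run, and the sample sizes \(n_{Sat}\) and
\(n_{Sun}\) (the number of Saturdays and Sundays in the four months) are
not reported, so they are left symbolic here. The symbols \(s_{Sat}\)
and \(s_{Sun}\) are introduced for the two sample standard deviations.
\[
t = \frac{Sat_{avg} - Sun_{avg}}{\sqrt{\dfrac{s_{Sat}^{2}}{n_{Sat}} + \dfrac{s_{Sun}^{2}}{n_{Sun}}}}
  = \frac{32884.13 - 28834.11}{\sqrt{\dfrac{12260.51^{2}}{n_{Sat}} + \dfrac{11372.29^{2}}{n_{Sun}}}}
  = \frac{4050.02}{\sqrt{\dfrac{12260.51^{2}}{n_{Sat}} + \dfrac{11372.29^{2}}{n_{Sun}}}}
\]
The difference of 4050.02 trips is only about a third of either standard
deviation, so a small t-statistic is plausible when each sample contains
only a handful of days per month. If the reported p-value of 0.332 is
two-sided, which is the default in many statistics packages, the
one-sided p-value matching \(H_1\) would be half of it, about 0.166,
because the observed difference lies in the direction of \(H_1\). That
is still greater than \(\alpha=0.05\).
\par\null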
Other options we discussed were the chi-square test and the z-test.
These were not appropriate because we are not measuring proportions and
we don't have the computing power to pull in the entire population.
\par\null\par\null
\textbf{CONCLUSIONS}
\par\null
Although the observed difference between the two means is in the
direction of our alternative hypothesis, with Saturday's mean larger
than Sunday's, we fail to reject the null hypothesis because the
returned p-value of 0.332 is greater than 0.05. We therefore cannot
conclude that the mean number of bike trips on Saturdays is greater than
the mean number of bike trips on Sundays in the 4 months of 2016, at a
significance level of 0.05.
\par\null
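Running the test on multiple years would supply more data and could in
turn yield a different result. A larger sample reduces the standard
error of each mean, which gives the test more power to detect a
difference if one exists, and by the central limit theorem the sample
means are also closer to normally distributed.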
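\par\null
To make the sample-size point concrete, consider a back-of-the-envelope
calculation. It assumes, purely for illustration, equal numbers \(n\) of
Saturdays and Sundays, an unequal-variance two-sample t-statistic, and
that the means and standard deviations reported in the abstract would
stay the same in a larger sample. None of these assumptions is verified
here.
\[
t \approx \frac{32884.13 - 28834.11}{\sqrt{\left(12260.51^{2} + 11372.29^{2}\right)/n}}
  \approx \frac{4050.02}{16722.7}\sqrt{n}
  \approx 0.242\sqrt{n}
\]
Under these assumptions the t-statistic grows with \(\sqrt{n}\). Using
the normal approximation, it would pass the one-sided 5\% critical value
of about 1.645 only once \(n\) reaches roughly 47 of each day, which is
close to a full year of weekends.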
\selectlanguage{english}
\FloatBarrier
\bibliographystyle{plainnat}
\bibliography{bibliography/converted_to_latex.bib%
}
\end{document}