\documentclass[10pt]{article}
\usepackage{fullpage}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{titlesec}
\usepackage[section]{placeins}
\usepackage{xcolor}
\usepackage{breakcites}
\usepackage{lineno}
\usepackage{hyphenat}
\PassOptionsToPackage{hyphens}{url}
\usepackage[colorlinks = true,
linkcolor = blue,
urlcolor = blue,
citecolor = blue,
anchorcolor = blue]{hyperref}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
\errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother
\usepackage[round]{natbib}
\let\cite\citep
\renewenvironment{abstract}
{{\bfseries\noindent{\abstractname}\par\nobreak}\footnotesize}
{\bigskip}
\titlespacing{\section}{0pt}{*3}{*1}
\titlespacing{\subsection}{0pt}{*2}{*0.5}
\titlespacing{\subsubsection}{0pt}{*1.5}{0pt}
\usepackage{authblk}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\providecommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}%
\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}
\usepackage[utf8]{inputenc}
\usepackage[greek,english]{babel}
\begin{document}
\title{HW7\_Assignment1\_citibike\_mini\_project}
\author[1]{Yixuan Tang}%
\author[2]{Cheng Ma}%
\affil[1]{New York University (NYU)}%
\affil[2]{Affiliation not available}%
\vspace{-1em}
\date{\today}
\begingroup
\let\center\flushleft
\let\endcenter\endflushleft
\maketitle
\endgroup
\sloppy
\textbf{Abstract}

This project grew out of curiosity about how Citi Bike use differs
between generations. We divide riders into two groups, pre-90s and
post-90s, to test whether the younger generation is less likely to use
Citi Bike for commuting, and to examine how the groups differ in trip
duration, trip time, and route pattern. We reject the null hypothesis
that the proportion of pre-90s riders biking on weekends is the same as
or higher than the proportion of post-90s riders biking on weekends
($\alpha = 0.05$). Post-90s riders therefore take a larger share of
their trips on weekends, and it is reasonable to conclude that they are
less likely than pre-90s riders to use Citi Bike for commuting.

\textbf{Introduction}~

Citi Bike is the largest bike share program in the US. It gives citizens
a healthy, interesting, and affordable way to get around town. In this
project we look at rider generation because we are curious whether
younger people are less likely to commute by bike. Public transportation
and bicycles are both cheap ways to travel that appeal to workers and
students alike. Students, however, may have another option, the school
bus, which could make students (mostly post-90s) less likely to use
Citi Bike for commuting. If we can learn how usage patterns differ
between generations, this project can inform where Citi Bike stations
are placed, for instance by adding stations near office buildings or
schools to meet the higher demand from specific groups.

\textbf{Data}~

We used trip data published by Citi Bike and selected the July and
December 2016 datasets, because we assume that biking patterns differ
slightly between summer and winter and that examining two seasons leads
to a more reliable conclusion. As the figures show, the ride counts in
July are much higher than in December. During data processing we convert
the ``starttime'' column, which is stored as strings, into a ``date''
column using the function ``pd.to\_datetime'', so that we can count
users on each day of the week.
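
The processing step described above can be sketched as follows. This is
a minimal illustration rather than the exact code used for the report:
the column name \texttt{birth year}, the 1990 cutoff that separates
pre-90s from post-90s riders, and the helper names
\texttt{label\_trips} and \texttt{weekday\_counts} are assumptions made
for the example. Only the ``starttime'' column and
\texttt{pd.to\_datetime} come from the text.

\begin{verbatim}
import pandas as pd


def label_trips(trips, cutoff_year=1990):
    """Return a copy of the trips with date, weekday and generation."""
    # Drop riders with no recorded birth year (assumed column name).
    out = trips.dropna(subset=["birth year"]).copy()
    # Parse the string timestamps, as described in the text.
    out["date"] = pd.to_datetime(out["starttime"])
    # Monday is 0 and Sunday is 6, so 5 and 6 mark the weekend.
    out["weekday"] = out["date"].dt.dayofweek
    out["weekend"] = out["weekday"] >= 5
    # Assumed split: born before the cutoff year means "pre-90s".
    is_pre = out["birth year"] < cutoff_year
    out["generation"] = is_pre.map(
        {True: "pre-90s", False: "post-90s"}
    )
    return out


def weekday_counts(labeled):
    """Count trips per generation (rows) and weekday (columns)."""
    grouped = labeled.groupby(["generation", "weekday"]).size()
    return grouped.unstack(fill_value=0)
\end{verbatim}

Calling \texttt{weekday\_counts(label\_trips(trips))} on one month of
trips gives a small table with one row per generation and one column per
day of the week, which is the kind of summary the count figures are
based on.
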

My peer Heci suggested digging further into weekday rush hours. After
careful consideration, I did not analyze peak hours, because working
hours are essentially the same for workers (mostly pre-90s) and students
(mostly post-90s). In future projects I would take rush hours into
consideration if necessary.
\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Figure1/Figure1}
\caption{{\textbf{Distribution of Citi Bike bikers (pre-90s and post-90s) in July
2016, absolute counts, with statistical errors}
{\label{241331}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Figure2/Figure2}
\caption{{\textbf{Distribution of Citi Bike bikers (pre-90s and post-90s) in
December 2016, absolute counts, with statistical errors}
{\label{153833}}%
}}
\end{center}
\end{figure}
\textbf{Methodology}~

Following the suggestion of my peer Heci, we used a z-test. The z-test
is suitable for testing $H_0$ in this project because the pre-90s and
post-90s samples are both drawn from the same population, there is a
single variable (bike usage count) with two categories (pre-90s and
post-90s), and each sample is far larger than 30.
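
The report does not print the formula, so the following is a sketch of
the standard two-proportion z statistic under assumed notation: $x_g$ is
the number of weekend trips and $n_g$ the total number of trips for
generation $g$, and $\hat{p}$ is the pooled weekend proportion.

\begin{align*}
\hat{p}_{g} &= \frac{x_{g}}{n_{g}}, \qquad g \in \{\mathrm{pre}, \mathrm{post}\},\\
\hat{p} &= \frac{x_{\mathrm{pre}} + x_{\mathrm{post}}}{n_{\mathrm{pre}} + n_{\mathrm{post}}},\\
z &= \frac{\hat{p}_{\mathrm{post}} - \hat{p}_{\mathrm{pre}}}
{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_{\mathrm{pre}}} + \frac{1}{n_{\mathrm{post}}}\right)}}
\end{align*}

With the null hypothesis $p_{\mathrm{pre}} \geq p_{\mathrm{post}}$, a
large positive $z$ is evidence against $H_0$, and the one-sided p-value
is the upper-tail probability of the standard normal distribution at
$z$.
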
\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/fig3/fig3}
\caption{{\textbf{Z statistic for July 2016}
{\label{337487}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/fig4/fig4}
\caption{{\textbf{Z statistic for December 2016. The z statistic is larger in
the summer (July)}
{\label{183737}}%
}}
\end{center}
\end{figure}
\subsection*{Conclusions}\label{conclusions}
The results of the analysis support the idea that post-90s riders are
less likely than pre-90s riders to choose biking for commuting. The
conclusions for July and December are the same, so our result is robust
to seasonality.

To test significance, we calculated z statistics of 30.98 for July and
28.24 for December. The corresponding p-values are below 0.0002, which
is smaller than the chosen $\alpha = 0.05$. We can therefore reject the
null hypothesis, and the conclusion is statistically significant (by a
lot!).
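
As a hedged illustration of how a z statistic and its one-sided p-value
can be obtained from trip counts, the sketch below uses only the Python
standard library. The function name \texttt{two\_proportion\_z} and its
arguments are assumptions made for this example, and the pooled
two-proportion form is the textbook version, which may differ in detail
from the calculation behind the reported values.

\begin{verbatim}
import math


def two_proportion_z(x_pre, n_pre, x_post, n_post):
    """Return (z, one-sided p-value) for H0: p_pre >= p_post.

    x_* are weekend trip counts and n_* are total trip counts
    for each generation (hypothetical argument names).
    """
    p_pre = x_pre / n_pre
    p_post = x_post / n_post
    # Pooled proportion under the null hypothesis.
    pooled = (x_pre + x_post) / (n_pre + n_post)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_pre + 1 / n_post))
    z = (p_post - p_pre) / se
    # Upper-tail probability P(Z > z) of the standard normal.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
\end{verbatim}

The null hypothesis is rejected at $\alpha = 0.05$ when the returned
p-value is below 0.05. Any counts passed to the function for a quick
check, such as \texttt{two\_proportion\_z(20, 100, 30, 100)}, are
made-up inputs and not values from the Citi Bike data.
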

Strength: We reached a more reliable conclusion by analyzing two
significantly different months of the same year.

Weakness: We did not treat gender as a key characteristic, even though
different genders make different choices (as shown in FBB's
instruction). If the majority of post-90s users are female, our work is
simply a duplication of FBB's and loses its meaning.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/1510111198197/1510111198197}
\caption{{\textbf{Distribution of Citi Bike bikers (pre-90s and post-90s) in July
2016, normalized, with statistical errors}
{\label{661754}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/1510111529934/1510111529934}
\caption{{\textbf{Distribution of Citi Bike bikers (pre-90s and post-90s) in July
2016, normalized, with statistical errors}
{\label{552600}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/1510111552558/1510111552558}
\caption{{\textbf{Distribution of Citi Bike bikers (pre-90s and post-90s) in
December 2016, normalized, with statistical errors}
{\label{688278}}%
}}
\end{center}
\end{figure}
\selectlanguage{english}
\FloatBarrier
\end{document}