\documentclass[10pt]{article}
\usepackage{fullpage}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{titlesec}
\usepackage[section]{placeins}
\usepackage{xcolor}
\usepackage{breakcites}
\usepackage{lineno}
\usepackage{hyphenat}
\PassOptionsToPackage{hyphens}{url}
\usepackage[colorlinks = true,
linkcolor = blue,
urlcolor = blue,
citecolor = blue,
anchorcolor = blue]{hyperref}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
\errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother
\usepackage[round]{natbib}
\let\cite\citep
\renewenvironment{abstract}
{{\bfseries\noindent{\abstractname}\par\nobreak}\footnotesize}
{\bigskip}
\titlespacing{\section}{0pt}{*3}{*1}
\titlespacing{\subsection}{0pt}{*2}{*0.5}
\titlespacing{\subsubsection}{0pt}{*1.5}{0pt}
\usepackage{authblk}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\providecommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}%
\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\begin{document}
\title{The Impact of Gender on Bike Trip Duration}
\author[1]{Muci Yu}%
\affil[1]{New York University CUSP}%
\vspace{-1em}
\date{\today}
\begingroup
\let\center\flushleft
\let\endcenter\endflushleft
\maketitle
\endgroup
\selectlanguage{english}
\begin{abstract}
Bike-sharing systems are becoming increasingly prevalent in urban
environments. They provide a low-cost, environmentally friendly
transportation alternative for cities. Understanding how customers use
these systems is therefore an important area of study. In this
study, I compare the trip durations of male and female customers.%
\end{abstract}%
\sloppy
\section*{Introduction}
{\label{874460}}
As the largest bike-sharing program nationwide, Citibike has become an
increasingly popular mode of transportation for New York City residents.
Currently, there are more than 40,000 trips per day. It is therefore
crucial to study customer behavior so that bike operators can
better manage the allocation of bike facilities. Using Citibike data
from January 2015, this study compares the trip durations of male and
female customers.
\par\null
\section*{Data}
{\label{460221}}
The dataset used in this study contains more than 285,000 bike trips from
January 2015. The dataset can be obtained from the official Citibike
website. After obtaining the data, I reduced the dataset to only the
variables of interest, \emph{tripduration} and \emph{gender}. Trips
whose duration is more than three standard deviations away from the mean
are considered outliers and are dropped from the dataset. Figure 1
shows the distribution of trip durations.
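\par\null
The filtering steps described above can be sketched as follows. This is a
minimal illustration on synthetic data rather than the code used for this
study; the helper \texttt{drop\_outliers}, the extra \texttt{bikeid} column
and the gender coding (1 for male, 2 for female) are assumptions made for
the example.
\begin{verbatim}
import numpy as np
import pandas as pd

def drop_outliers(df, column="tripduration", n_std=3.0):
    """Keep rows whose value lies within n_std standard deviations of the mean."""
    mean = df[column].mean()
    std = df[column].std()
    keep = (df[column] - mean).abs() <= n_std * std
    return df.loc[keep]

# Synthetic stand-in for the trip file (not real Citibike data).
rng = np.random.default_rng(0)
trips = pd.DataFrame({
    "tripduration": rng.exponential(scale=600.0, size=1000),
    "gender": rng.integers(1, 3, size=1000),  # assumed coding: 1 = male, 2 = female
    "bikeid": np.arange(1000),                # unrelated column, dropped below
})
trips.loc[0, "tripduration"] = 1e6  # one extreme trip that should be removed

# Keep only the variables of interest, then drop the outliers.
trips = trips[["tripduration", "gender"]]
clean = drop_outliers(trips)

# Split into the two groups that are compared later.
df_m = clean[clean["gender"] == 1]
df_f = clean[clean["gender"] == 2]
\end{verbatim}
Note that the mean and standard deviation are computed once over all trips
before the split, so the same cutoff applies to both groups.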
\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/citibike-fg1/citibike-fg1}
\caption{{This graph shows the distribution of bike trip durations by male and
female users.
{\label{583676}}%
}}
\end{center}
\end{figure}
\section*{Methodology}
{\label{230815}}
The null hypothesis is that the average duration of bike trips taken by
women does not differ from that of trips taken by men, while the
alternative hypothesis is that the average duration of bike trips taken by
men is longer than that of trips taken by women.
\par\null
To test the hypotheses, I use a two-sample t-test, because the goal is to
determine whether the difference between the averages of the~\textbf{two}~groups is
statistically significant or is instead due to random chance.
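\par\null
For reference, the unequal-variance (Welch) form of the two-sample
statistic, which corresponds to the \texttt{equal\_var=False} option used
in this study, can be written as follows. The symbols are introduced here
for illustration only: $\bar{x}_m$, $s_m^2$ and $n_m$ denote the sample
mean, sample variance and number of trips for male customers, and
$\bar{x}_f$, $s_f^2$ and $n_f$ denote the same quantities for female
customers.
\begin{equation*}
t = \frac{\bar{x}_m - \bar{x}_f}{\sqrt{\dfrac{s_m^2}{n_m} + \dfrac{s_f^2}{n_f}}}
\end{equation*}
With the male sample listed first, a positive $t$ corresponds to a larger
sample mean for male trips and a negative $t$ to a smaller one.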
\par\null
I use the \texttt{ttest\_ind} function from \texttt{scipy.stats} to perform
the two-sample t-test.
\begin{verbatim}
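from scipy.stats import ttest_ind

# df_m and df_f hold the male and female trips, respectively.
# equal_var=False requests Welch's t-test (unequal variances).
t, p = ttest_ind(df_m['tripduration'], df_f['tripduration'], equal_var=False)
print("ttest_ind: t = %g p = %g" % (t, p))

# Output:
# ttest_ind: t = -32.6155 p = 1.07254e-231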
\end{verbatim}
Since the p-value is effectively zero (about $1.07 \times 10^{-231}$), I can
reject the null hypothesis at the 0.01, 0.05 and 0.1 significance levels.
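\par\null
Note that \texttt{ttest\_ind} reports a two-sided p-value by default, while
the alternative hypothesis stated earlier is one-sided. The sketch below,
run on synthetic data and not on the Citibike trips, shows how the
\texttt{alternative} argument (available in SciPy 1.6 and later) requests a
one-sided test, and how the sign of the statistic depends on the order of
the two samples. The sample sizes and distribution parameters are arbitrary
choices for illustration.
\begin{verbatim}
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
a = rng.normal(loc=10.0, scale=2.0, size=500)  # first sample, lower mean
b = rng.normal(loc=11.0, scale=2.0, size=500)  # second sample, higher mean

# Two-sided Welch test: is mean(a) different from mean(b)?
t_two, p_two = ttest_ind(a, b, equal_var=False)

# One-sided Welch test: is mean(a) greater than mean(b)?
t_one, p_one = ttest_ind(a, b, equal_var=False, alternative="greater")

print("two-sided: t = %g p = %g" % (t_two, p_two))
print("one-sided: t = %g p = %g" % (t_one, p_one))
\end{verbatim}
Here the first sample has the lower mean, so the statistic is negative and
the one-sided p-value for \texttt{"greater"} is close to 1, even though the
two-sided p-value is very small.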
\par\null\par\null
\section*{Conclusion}
{\label{142350}}
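From the t-test results, we can conclude that the average trip durations of
male and female customers differ significantly. Because the male sample was
passed first and the resulting t statistic is negative, the sample mean for
male customers is lower than that for female customers, so the data do not
support the alternative hypothesis that men take longer trips. This result
has some real-world implications. For instance, bike operators could
consider placing bike stations at a higher density in areas that have a higher
percentage of customers who tend to take shorter trips.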
\section*{Acknowledgements}
{\label{687807}}
Thanks to Yavuz Sunor for reviewing my proposal.
\selectlanguage{english}
\FloatBarrier
\end{document}