Authorea

Shaiwal Sachdev edited textit_textbf_Data_to_Collect__.tex almost 8 years ago

Commit id: b4905555575bef87640ea80659acc227b7f93b85


We use the Uber API price endpoint \textbf{"GET /v1/estimates/price"} to collect the fare data for all these locations. We found that the Uber API does not take the timezone (here, the hour of day in NYC) as a parameter; it takes only four parameters:
\begin{enumerate} \item start\_latitude \item start\_longitude \item end\_latitude \item end\_longitude \end{enumerate}
The Uber API returns results for the current time, that is, the time at which the request was made. Each request returns a JSON response containing the details for all the different Uber services, namely \textbf{uberPOOL, uberX, uberXL, uberFAMILY, UberBLACK, UberSUV}. To handle the large dataset and to issue the API requests at the specified time, the dataset was divided into smaller parts and all the code was run in parallel. A large number of Uber API keys (around 100) were used to collect the data. The target collection time for a given timezone was at most one hour. \\
For the initial comparison we collect data for four timezones: 6, 10, 16 and 20. Because we are 9 hours 30 minutes ahead of NYC, the start times are also given in IST. We obtain the list of locations for each timezone using the following query. \textbf{Query: Only selecting locations \\select location from (select location, count(location) as cnt, avg(tripdist) avgdist, avg(duration) avgtime, avg(totamt) avgfare from nyctaxi where pickup!=dropoff and duration >0 and tripdist > 0 and pathdistkey >0 and timezone=x group by location having cnt>=5 order by cnt desc);} (x = 6, 10, 16, 20) Here 5 is the threshold: we consider only locations with a frequency of at least 5, that is, trips that occur at least 5 times in the dataset. Some of the statistics are \begin{enumerate} \item Timezone 6: 63409 locations (time approx. 13 minutes) (6 am in NYC = 3:30 pm IST) (start code at 3 pm IST)

\item Timezone 20: 251318 locations (time approx. 51 minutes) (8 pm in NYC = 5:30 am IST) (start code at 5 am IST) \end{enumerate} A Python scheduler was used to run the code at the specified time. The results are dumped to a JSON file in which each key is a location (that is, an OD pair) and each value is the JSON response received for that pair.
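
The collection procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code that was actually run: the function names, the round-robin split of OD pairs (one part per API key), the use of a thread pool, and the \texttt{fetch} callable (a hypothetical stand-in for the real HTTP request to the price endpoint) are all assumptions. The IST conversion uses the fixed offset of 9 hours 30 minutes quoted in the text.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Fixed NYC -> IST offset quoted in the text (9 h 30 min); an assumption
# that holds only while that offset is in effect.
IST_OFFSET_MINUTES = 9 * 60 + 30


def nyc_hour_to_ist(hour):
    """Return (hour, minute) in IST for a whole hour in NYC."""
    total = (hour * 60 + IST_OFFSET_MINUTES) % (24 * 60)
    return divmod(total, 60)


def build_price_params(start_lat, start_lon, end_lat, end_lon):
    """The four query parameters listed in the text for the price estimate."""
    return {
        "start_latitude": start_lat,
        "start_longitude": start_lon,
        "end_latitude": end_lat,
        "end_longitude": end_lon,
    }


def split_into_chunks(items, n_chunks):
    """Round-robin split; chunk sizes differ by at most one."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def collect_chunk(chunk, api_key, fetch):
    """Query every OD pair in one chunk with a single API key.

    `fetch(params, api_key)` is a hypothetical callable that performs the
    request and returns the decoded JSON response.
    """
    return {
        location: fetch(build_price_params(*coords), api_key)
        for location, coords in chunk
    }


def collect_all(od_pairs, api_keys, fetch):
    """Run one chunk per API key in parallel and merge the results.

    `od_pairs` is a list of (location, (start_lat, start_lon, end_lat, end_lon)).
    The result maps each location (OD pair) to its JSON response.
    """
    chunks = split_into_chunks(od_pairs, len(api_keys))
    results = {}
    with ThreadPoolExecutor(max_workers=len(api_keys)) as pool:
        parts = pool.map(collect_chunk, chunks, api_keys, [fetch] * len(api_keys))
        for part in parts:
            results.update(part)
    return results


def dump_results(results, path):
    """Write the location -> response mapping to a JSON file."""
    with open(path, "w") as handle:
        json.dump(results, handle)
```

Passing \texttt{fetch} in as an argument keeps the sketch independent of any particular HTTP client and lets the splitting and merging logic be checked offline with a fake function. A call such as \texttt{nyc\_hour\_to\_ist(6)} gives the IST time at which the hour-6 data corresponds to the current time in NYC.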