Shaiwal Sachdev added textit_textbf_Data_to_Collect__.tex  almost 8 years ago

Commit id: e7901fee0e38de6a6bed73c1ef57990030721837

deletions | additions      

         

\textit\textbf{Data to Collect!}  Next task was to collect the fare data for all these OD(Origin Destination) Pairs.  We use the UBER API Price \textbf {"GET /v1/estimates/price"} to collect the fare data for all these locations.  What we found out in the Uber Api that it does take the timezone as parameter , it just takes four parameters  \begin{enumerate}  \item start_latitude  \item start_longtude  \item end_latitude  \item end_longitude  \end{enumerate}  Uber API gives the result according to the current time or time at which query or request was made.  On making the request , a json response we get containing the details for all the different UBER services  namely \textbf{uberPOOL, uberX, uberXL, uberFAMILY,UberBLACK, UberSUV}.For handling the huge dataset and to run the api request at the specified time the dataset was divided into smaller parts and all code run in parallel.Large number of UBER API keys around 100 keys were used and data was collected.Approximate Time for a given timezone data was desired to be less than or equal to one hour.   For initial comparison we will collect data for four timezones 6 , 10 , 16 , 20,   As we are 9 hrs 30 minutes ahead and also we collect the list of locations for different timezones  using the   \textbf{Query: Only selecting locations  \\select location from (select location, count(location) as cnt, avg(tripdist) avgdist, avg(duration) avgtime, avg(totamt) avgfare from nyctaxi where pickup!=dropoff and duration >0 and tripdist > 0 and pathdistkey >0 and timezone=x group by location having cnt>=5 order by cnt desc);}(x = 6,10,16,20)  Here 5 is the threshold we have taken which considers locations which have a minimum of Frequency 5 or atleast 5 times that trip is there in the dataset.   Some of the statistics are   \begin{enumerate}  \item Timezone 6  63409 locations (Time aprox. 13 minutes) (6 am in NYC = 3:30 PM IST) (Start Code at 3 PM IST)  \item 10  210065 locations (Time approx 42 minutes) (10 am in NYC = 7:30 PM IST)(Start Code at 7 PM IST)  \item 16  169463 locations (Time approx. 34 minutes) (4 pm in NYC = 1:30 AM IST )(Start Code at 1 AM IST)  \item 20  251318 locations (Time approx 51 minutes.)(8 pm in NYC = 5:30 am ist)(Start Code at 5 AM IST)  \end{enumerate}  Python scheduler was used to schedule the code to run at the specified time.  The result is dumped into a json in the format like the key is location(that is OD pair) and value is the json response we got.