Data to Collect!
The next task was to collect the fare data for all these OD (origin-destination) pairs. We used the Uber API price endpoint, "GET /v1/estimates/price", to collect the fare data for all these locations. We found that the Uber API does not take the time (timezone) as a parameter; it takes just four parameters:

  1. start_latitude

  2. start_longitude

  3. end_latitude

  4. end_longitude

The Uber API returns results for the current time, that is, the time at which the request is made. Each request returns a JSON response containing the details for all the different Uber services, namely uberPOOL, uberX, uberXL, uberFAMILY, UberBLACK and UberSUV. To handle the huge dataset and to run the API requests at the specified time, the dataset was divided into smaller parts and all the code was run in parallel. A large number of Uber API keys (around 100) were used to collect the data. The goal was to collect the data for a given timezone in one hour or less.
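The request and the split into parts can be sketched roughly as follows. This is a minimal illustration, not the code that was actually run: the base URL, the function names build_price_url and split_into_parts, and the sample coordinates are assumptions made for the example, and the authentication with an API key as well as the HTTP call itself are left out.

```python
from urllib.parse import urlencode

# Assumed base URL; the text above only names the path "GET /v1/estimates/price".
BASE_URL = "https://api.uber.com/v1/estimates/price"


def build_price_url(start_lat, start_lng, end_lat, end_lng):
    """Build the request URL for one OD pair from the four parameters."""
    params = {
        "start_latitude": start_lat,
        "start_longitude": start_lng,
        "end_latitude": end_lat,
        "end_longitude": end_lng,
    }
    return BASE_URL + "?" + urlencode(params)


def split_into_parts(od_pairs, n_parts):
    """Split the OD pairs into n_parts contiguous chunks of near-equal size,
    for example one chunk per API key or per parallel worker."""
    size, extra = divmod(len(od_pairs), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks get one additional element.
        end = start + size + (1 if i < extra else 0)
        parts.append(od_pairs[start:end])
        start = end
    return parts


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    print(build_price_url(40.75, -73.99, 40.64, -73.78))
    print([len(p) for p in split_into_parts(list(range(10)), 3)])
```

Each chunk can then be handed to a separate process with its own API key, so that all chunks are queried within the same time window.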
For the initial comparison we collect data for four timezones (hours of the day in NYC): 6, 10, 16 and 20. As we are 9 hours 30 minutes ahead of NYC, the code has to be started at the corresponding time in IST. We collect the list of locations for each timezone using the following query, which selects only the locations:
select location from (select location, count(location) as cnt, avg(tripdist) avgdist, avg(duration) avgtime, avg(totamt) avgfare from nyctaxi where pickup!=dropoff and duration >0 and tripdist > 0 and pathdistkey >0 and timezone=x group by location having cnt>=5 order by cnt desc);
(x = 6, 10, 16, 20) Here 5 is the threshold we have chosen: only locations (OD pairs) whose trip occurs at least 5 times in the dataset are considered. Some of the statistics are:

  1. Timezone 6: 63409 locations (time approx. 13 minutes) (6 AM in NYC = 3:30 PM IST) (start code at 3 PM IST)

  2. Timezone 10: 210065 locations (time approx. 42 minutes) (10 AM in NYC = 7:30 PM IST) (start code at 7 PM IST)

  3. Timezone 16: 169463 locations (time approx. 34 minutes) (4 PM in NYC = 1:30 AM IST) (start code at 1 AM IST)

  4. Timezone 20: 251318 locations (time approx. 51 minutes) (8 PM in NYC = 5:30 AM IST) (start code at 5 AM IST)
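The NYC to IST conversions in the list above can be checked with a few lines of Python. This is only a sketch: the 9 hours 30 minutes difference implies that NYC was on daylight time (UTC-4) while IST is UTC+5:30, so fixed offsets are assumed here, and the calendar date used inside the function is arbitrary. The name nyc_hour_to_ist is made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets: NYC on daylight time (UTC-4), IST at UTC+5:30.
NYC = timezone(timedelta(hours=-4))
IST = timezone(timedelta(hours=5, minutes=30))


def nyc_hour_to_ist(hour):
    """Return the IST wall-clock time (HH:MM) for a full hour in NYC."""
    # The date is arbitrary; only the time of day matters here.
    nyc_time = datetime(2016, 6, 1, hour, 0, tzinfo=NYC)
    return nyc_time.astimezone(IST).strftime("%H:%M")


if __name__ == "__main__":
    for hour in (6, 10, 16, 20):
        print(hour, "in NYC =", nyc_hour_to_ist(hour), "IST")
```

The start times in the list are 30 minutes earlier than the converted times, so each run begins half an hour before the target hour in NYC.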

A Python scheduler was used to run the code at the specified time. The result is dumped into a JSON file in which each key is a location (that is, an OD pair) and each value is the JSON response we got for it.
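A rough sketch of the scheduling and the dump is given below. The text does not say which scheduler was used, so the standard-library sched module is assumed here; the function names next_start, run_at and dump_results, the key format "A-B", and the placeholder response are all hypothetical.

```python
import json
import sched
import time
from datetime import datetime, timedelta


def next_start(now, hour, minute=0):
    """Return the next local datetime at hour:minute strictly after `now`."""
    start = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if start <= now:
        start += timedelta(days=1)
    return start


def run_at(start, job):
    """Block until `start` (a local datetime), then call job() once."""
    scheduler = sched.scheduler(time.time, time.sleep)
    scheduler.enterabs(start.timestamp(), 1, job)
    scheduler.run()


def dump_results(results, path):
    """Write a dict of {OD pair: JSON response} to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(results, fh)


if __name__ == "__main__":
    # Placeholder response; a real run would store the API's reply instead.
    results = {"A-B": {"prices": []}}
    start = next_start(datetime.now(), 15)  # e.g. start at 3 PM local time
    print("would start at", start)
    dump_results(results, "fares_timezone_6.json")
```

For the timezone 6 run, for example, run_at(next_start(datetime.now(), 15), collect) would start a hypothetical collect function at 3 PM local (IST) time.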