Secure Route
[Xueqi Huang, clairehxq, xh895]

QUESTION
How can we identify the most secure walking route between a given origin and destination?

PROPOSED APPROACH
1. Given the starting and ending points, generate three possible routes with the Google Directions API.
2. Identify the turns in the three routes to determine which blocks should be taken into account. For example, if route 1 makes three turns A, B, and C, the route consists of four parts: starting point to A, A to B, B to C, and C to ending point. The blocks involved can then be identified for each part.
3. Using the StreetScore data from the MIT lab, calculate a block-level version of the street score that gives a security index for each whole block. Give each of the three routes a weighted security score based on the security indices of the blocks it involves. The route with the highest score is taken as the most secure route.

DATA
1. StreetScore data by MIT lab. Built with an image-processing methodology, the StreetScore dataset pairs geographic locations with a corresponding security score. It has three columns: [Longitude], [Latitude], [qscore].
2. Google API - Directions. The Google Directions API gives three possible routes, including the geographic coordinates of each turn. The output is controlled through request parameters. In this case, I set mode = walking and alternatives = true, so that the API returns walking directions and presents three different routes if possible.
3. Google API - Geocoding. The Google Geocoding API converts between address information and geographic coordinates.

ANALYSIS
Geospatial Analysis
The StreetScore points are scattered irregularly across the city, and very few of them fall exactly on a street; many fall on buildings or other infrastructure, so the coordinates of a score point and of the street it describes can differ slightly. Simply adding up the score points that lie on a street is therefore impractical. We have to perform a geospatial analysis on the StreetScore data to obtain 'block score data' that represents the security index of each block.
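The block-scoring and route-weighting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the proposal: each route part between two turns stands in for a "block", a StreetScore point counts toward a part if it lies within a hypothetical 50 m threshold, and parts are weighted by their length. The function names, the threshold, and the length weighting are all assumptions; the proposal leaves the weighting scheme open.

```python
import math

EARTH_RADIUS_M = 6371000.0

def to_xy(lat, lon, lat0):
    """Project degrees to approximate metres (equirectangular, adequate at city scale)."""
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y

def point_segment_distance(p, a, b):
    """Distance in metres from point p to the segment a-b (all are (x, y) tuples)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Clamp the projection so the closest point stays on the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def route_score(turn_points, street_scores, max_dist_m=50.0):
    """Length-weighted mean score of a route.

    turn_points: [(lat, lon), ...] from origin through each turn to destination.
    street_scores: [(longitude, latitude, qscore), ...] in StreetScore column order.
    max_dist_m: assumed cut-off for assigning a score point to a route part.
    Returns None when no score point lies near any part of the route.
    """
    lat0 = turn_points[0][0]
    xy = [to_xy(lat, lon, lat0) for lat, lon in turn_points]
    pts = [(to_xy(lat, lon, lat0), q) for lon, lat, q in street_scores]
    weighted_sum = 0.0
    total_len = 0.0
    for a, b in zip(xy, xy[1:]):
        nearby = [q for p, q in pts if point_segment_distance(p, a, b) <= max_dist_m]
        if not nearby:
            continue  # parts without data are skipped rather than scored as zero
        seg_len = math.hypot(b[0] - a[0], b[1] - a[1])
        weighted_sum += seg_len * (sum(nearby) / len(nearby))
        total_len += seg_len
    return weighted_sum / total_len if total_len else None

# Made-up coordinates and scores, for illustration only.
route = [(40.0, -74.0), (40.0, -73.99), (40.01, -73.99)]
scores = [(-73.995, 40.0001, 20.0), (-73.9899, 40.005, 30.0)]
print(route_score(route, scores))
```

With a score for each candidate route, the most secure route is simply the one with the largest value. Skipping parts that have no nearby data is one possible choice; treating them as missing and flagging the route would be another.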
REFERENCES
The StreetScore dataset by MIT lab.

DELIVERABLE
Sample routing: For a given origin and destination, compute the security scores for each of the three routes given by the Google API and compare their differences.
Iterable program: A program that proposes the most secure and walkable route given a user-entered origin and destination.
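As a rough sketch of the first step such a program would need, the snippet below builds a Directions API request for walking routes with alternatives and extracts the turn points from one returned route. The helper names `build_directions_url` and `turn_points` are hypothetical, the API key is a placeholder, and the sample response is hand-made in the shape the API returns (routes, legs, steps with start and end locations). No network call is made here.

```python
from urllib.parse import urlencode

DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"

def build_directions_url(origin, destination, api_key):
    """Build a request URL for walking directions with alternative routes."""
    params = {
        "origin": origin,
        "destination": destination,
        "mode": "walking",       # walking directions
        "alternatives": "true",  # ask for more than one route when available
        "key": api_key,
    }
    return DIRECTIONS_URL + "?" + urlencode(params)

def turn_points(route):
    """Return [(lat, lng), ...]: the route start followed by the end of every step."""
    points = []
    for leg in route["legs"]:
        for step in leg["steps"]:
            if not points:
                start = step["start_location"]
                points.append((start["lat"], start["lng"]))
            end = step["end_location"]
            points.append((end["lat"], end["lng"]))
    return points

# Hand-made response fragment with invented coordinates, for illustration only.
sample_route = {
    "legs": [{
        "steps": [
            {"start_location": {"lat": 40.0, "lng": -74.0},
             "end_location": {"lat": 40.0, "lng": -73.99}},
            {"start_location": {"lat": 40.0, "lng": -73.99},
             "end_location": {"lat": 40.01, "lng": -73.99}},
        ]
    }]
}

url = build_directions_url("Times Square, New York", "Union Square, New York", "YOUR_API_KEY")
print(url)
print(turn_points(sample_route))
```

Each route in the response's `routes` list can be passed through `turn_points` in the same way, giving one list of turn points per candidate route to score and compare.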

Claire Xueqi Huang

and 2 more

ABSTRACT
The objective of this analysis was to perform a data-driven analysis of CitiBike trip data in New York City using statistical testing in Python. Using CitiBike data from June 2016, the relationship between rider age and trip distance was explored. Specifically, the ratio of long-distance trips to all trips among young riders was compared to that of all riders. The motivating idea was that younger riders typically have more energy and strength, which could translate into the ability to ride farther than riders overall. The analysis did not find a significant difference between young riders and all riders, and therefore the null hypothesis could not be rejected.

DATA
The data for this analysis was obtained from the CUSP Data Facility at New York University. The data was subset to only the fields needed to calculate rider trip distance: Start Station Latitude, Start Station Longitude, End Station Latitude, End Station Longitude, and rider birth year. Next, geopy was used to calculate the trip distance in miles between the stations. The data wrangling process is detailed in the linked ipython notebook.

ANALYSIS
Through preliminary data inspection, the team took an interest in long-distance trips by CitiBike riders. The team first defined null and alternative hypotheses:

Null Hypothesis: The long-distance trip ratio in young bikers is less than or equal to the long-distance trip ratio in all bikers.
$$H_0: \frac{L_y}{A_y} - \frac{L}{A} \leq 0$$

Alternative Hypothesis: The long-distance trip ratio in young bikers is greater than the long-distance trip ratio in all bikers.
$$H_a: \frac{L_y}{A_y} - \frac{L}{A} > 0$$

Significance level: $$\alpha = 0.05$$

Here $L_y$ and $A_y$ are the numbers of long-distance trips and of all trips taken by young riders, and $L$ and $A$ are the corresponding counts for all riders. Young riders were defined as millennials born after 1980, and long-distance trips were defined as trips of more than three miles from start station to end station.

In the exploratory phase of the analysis, the team reviewed the distribution of CitiBike ridership by birth year (Figure 1).
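To make these definitions concrete, here is a minimal sketch of how the two ratios could be computed from the subset fields. It is not the team's notebook code: a haversine great-circle formula stands in for the geopy distance call so that the example has no dependencies, the function names are hypothetical, and the sample trips are invented.

```python
import math

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def long_trip_ratios(trips, birth_year_cutoff=1980, long_miles=3.0):
    """Return (L_y / A_y, L / A) for an iterable of trips.

    Each trip is (start_lat, start_lon, end_lat, end_lon, birth_year).
    Young riders are those born after birth_year_cutoff; a long trip is one
    whose station-to-station distance exceeds long_miles.
    """
    long_all = n_all = long_young = n_young = 0
    for s_lat, s_lon, e_lat, e_lon, birth_year in trips:
        is_long = haversine_miles(s_lat, s_lon, e_lat, e_lon) > long_miles
        n_all += 1
        long_all += is_long
        if birth_year > birth_year_cutoff:
            n_young += 1
            long_young += is_long
    return long_young / n_young, long_all / n_all

# Invented trips, for illustration only.
trips = [
    (40.70, -74.01, 40.77, -73.97, 1990),  # young rider, roughly 5 miles
    (40.70, -74.01, 40.71, -74.01, 1992),  # young rider, under 1 mile
    (40.70, -74.01, 40.71, -74.01, 1970),  # older rider, under 1 mile
    (40.70, -74.01, 40.71, -74.01, 1965),  # older rider, under 1 mile
]
young_ratio, all_ratio = long_trip_ratios(trips)
print(young_ratio, all_ratio)
```

Note that this is straight-line distance between stations, not distance actually ridden, which matches the start-to-end-station definition used in the analysis.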
The team then looked at the data pertaining only to long trips of more than three miles for both age groups (Figure 2). The ratio of long trips for all riders was also explored, as shown in Figures 3 and 4.

After the exploratory phase, the team began statistical testing. Following peer review, a Z-test was chosen to test the hypothesis; all riders were defined as the population, and millennial riders were defined as the sample to be tested. The ratios for each subgroup defined in the hypothesis were calculated and then tested.

RESULTS
Looking back at the data, although young riders account for a larger percentage of total trips (Figure 1), they have a relatively small long-distance trip ratio (Figure 2). Accordingly, when the Z-test was performed, the p-value indicated that the null hypothesis could not be rejected at the 0.05 significance level. The results of this data analysis show that millennial CitiBike riders do not have a significantly higher ratio of long-distance trips, so this analysis found no evidence that younger riders take proportionally more long trips.

Link to ipython notebook on GitHub:
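For readers without access to the notebook, the Z-test described in the analysis can be sketched as a one-sided, one-sample test of a proportion, with the all-rider ratio treated as the known population value and the young riders as the sample. This is an illustrative reconstruction under that assumption, not the team's code. The function name is hypothetical and the counts below are invented; they are not the study's results.

```python
import math

def one_sided_proportion_z_test(long_young, n_young, population_ratio):
    """Test H_a: the young-rider long-trip ratio exceeds the population ratio.

    long_young: number of long-distance trips by young riders (L_y).
    n_young: number of all trips by young riders (A_y).
    population_ratio: long-trip ratio among all riders (L / A), taken as known.
    Returns (z, p_value), where p_value = P(Z >= z) under the null hypothesis.
    """
    sample_ratio = long_young / n_young
    standard_error = math.sqrt(population_ratio * (1 - population_ratio) / n_young)
    z = (sample_ratio - population_ratio) / standard_error
    # Upper-tail probability of the standard normal, via the complementary error function.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Invented counts, for illustration only.
alpha = 0.05
z, p_value = one_sided_proportion_z_test(long_young=40, n_young=1000, population_ratio=0.05)
print(z, p_value, "reject H0" if p_value < alpha else "fail to reject H0")
```

When the sample ratio is below the population ratio, z is negative and the upper-tail p-value is above 0.5, so the null hypothesis cannot be rejected. That is the situation the results describe.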