Vision Zero Crash Data Analysis 

Abstract: This project explores traffic fatalities in New York City (NYC) between 2009 and 2016 among the three major groups involved in traffic accidents: pedestrians, bicyclists, and motor vehicle occupants (MVO). The results show that the overall trend in fatalities is declining, while a trend analysis of each group shows that bicyclist fatalities are increasing. Further analysis identified the zip codes with the highest number of fatalities for each group: for pedestrians, zip code 10002 (Lower East Side, downtown Manhattan); for bicyclists, zip code 10029 (East Harlem, uptown Manhattan); and for MVO, zip code 11203 (East Flatbush, Brooklyn). Original GitHub link for code:
Introduction: In New York City (NYC), nearly 4,000 residents are injured and more than 250 residents are killed in traffic collisions each year [1]. According to the NYC government, traffic collisions injure or kill a City resident every two hours [2]. In response to the high injury rate, the City has taken a number of initiatives to reduce injuries, including expanded enforcement against speeding and failure to yield to pedestrians, harsher penalties for dangerous drivers, and new street designs that improve safety. This project aims to determine the trend in traffic fatalities in NYC from 2009 to 2016, locate high-risk areas across the five boroughs for three major groups (pedestrians, bicyclists, and motor vehicle occupants), and explore how these locations changed over time. To answer these questions, I analyze data provided by the NYC Department of Transportation (DOT).
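The report does not state which method was used to judge whether a trend is declining or increasing. One simple option, assumed here for illustration, is the ordinary least-squares slope of the yearly counts for each group: a negative slope suggests a declining trend and a positive slope an increasing one. The function names and the dictionary-of-lists input are hypothetical, not taken from the project code.

```python
# Minimal sketch (assumed method): classify each group's yearly fatality
# counts as declining, increasing, or flat using a least-squares slope.
# Function names and the input layout are illustrative assumptions.


def linear_slope(years, counts):
    """Return the ordinary least-squares slope of counts regressed on years."""
    n = len(years)
    if n != len(counts) or n < 2:
        raise ValueError("need at least two (year, count) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    return sxy / sxx


def trend_by_group(years, counts_by_group):
    """Map each group name to 'declining', 'increasing', or 'flat'."""
    result = {}
    for group, counts in counts_by_group.items():
        slope = linear_slope(years, counts)
        if slope < 0:
            result[group] = "declining"
        elif slope > 0:
            result[group] = "increasing"
        else:
            result[group] = "flat"
    return result
```

In use, one would pass the years 2009 to 2016 together with one list of yearly totals per column (Fatalities, PedFatalit, BikeFatali, MVOFatalit). A slope alone says nothing about statistical significance; with only eight yearly points, a single unusual year can change its sign.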
Data: The main data source for this project was the DOT, specifically its Vision Zero Data Feeds [3]. The project uses two data sets: fatalities aggregated by year and fatalities aggregated by month. Both data sets contain the total number of fatalities from 2009 to 2016, the breakdown of fatalities among the three groups, the year or month of each record, and its location within the five boroughs of NYC. To prepare the data for analysis, I reformatted and merged the originally separate columns into a new data frame, in which PedFatalit, BikeFatali, and MVOFatalit hold the pedestrian, bicyclist, and MVO fatality counts and Fatalities is their total:
Date          Fatalities   PedFatalit   BikeFatali   MVOFatalit
2009-01-01    733          649          0            84
2009-02-01    830          646          0            184
2009-03-01    253          226          6            21
2009-04-01    679          263          40           376
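A minimal sketch of this merging step with pandas, assuming the raw monthly file stores the year and month in two separate columns (named YR and MN here purely for illustration) next to the three group columns. It assembles a first-of-month Date and recomputes Fatalities as the sum of the three groups, which matches every row in the table above.

```python
import pandas as pd

# Group columns as they appear in the prepared data frame.
GROUP_COLS = ["PedFatalit", "BikeFatali", "MVOFatalit"]


def build_monthly_frame(raw: pd.DataFrame, year_col: str = "YR",
                        month_col: str = "MN") -> pd.DataFrame:
    """Merge separate year/month columns into one Date column and add a total.

    year_col and month_col are hypothetical names for the original fields.
    """
    # pd.to_datetime can assemble dates from year/month/day columns;
    # day is fixed to 1 so each date marks the start of its month.
    dates = pd.to_datetime(
        pd.DataFrame({"year": raw[year_col], "month": raw[month_col], "day": 1})
    )
    out = pd.DataFrame({"Date": dates})
    # Total fatalities per row as the sum of the three groups.
    out["Fatalities"] = raw[GROUP_COLS].sum(axis=1)
    for col in GROUP_COLS:
        out[col] = raw[col]
    return out.sort_values("Date").reset_index(drop=True)
```

If the source already ships its own total column, comparing it with this recomputed sum is a cheap consistency check on the merge.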

I used the same approach to format the geospatial data set. In addition, for the heatmap data sets I converted zeros to NaN and dropped those rows with dropna, so that records with zero fatalities are not plotted as points.
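The zero-to-NaN step can be sketched as follows, assuming the heatmap data is a pandas DataFrame with one count column per group; the helper name is hypothetical. Note that replacing 0 with NaN turns an integer column into a float column in pandas, which is consistent with counts appearing as values such as 1.0.

```python
import numpy as np
import pandas as pd


def points_for_heatmap(df: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """Keep only rows with a nonzero count in value_col (hypothetical helper).

    Zeros are replaced with NaN and then dropped, so locations with no
    fatalities are not drawn as heatmap points. The input is not modified.
    """
    cleaned = df.copy()
    cleaned[value_col] = cleaned[value_col].replace(0, np.nan)
    return cleaned.dropna(subset=[value_col])
```

A boolean filter such as df[df[value_col] != 0] returns the same rows in one step when the column has no missing values; the sketch follows the NaN-then-dropna route described in the text, which additionally drops rows that were already missing.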


For example, the pedestrian fatalities data set prepared for the heatmap:
PedFatalit   Date         geometry
1.0          2009-10-01   POINT (-73.89052628583379 40.81151194094214)
1.0          2009-10-01   POINT (-73.9002758732755 40.86378763472604)
1.0          2009-10-01   POINT (-73.96759867796442 40.58037846976183)
1.0          2009-10-01   POINT (-73.86272633841989 40.74986793616723)
1.0          2009-10-01   POINT (-73.98134068728926 40.74707866223923)
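The geometry column stores each point in WKT form as (longitude latitude), while many web-map heatmap tools, for example folium's HeatMap plugin, expect rows of [lat, lon] or [lat, lon, weight]. The report does not say which heatmap library was used, so the target format here is an assumption. This standard-library sketch parses the WKT strings and swaps the coordinate order; the function names are hypothetical.

```python
import re

# Matches 2-D WKT points such as "POINT (-73.89 40.81)" (longitude first).
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$"
)


def wkt_point_to_lat_lon(wkt: str) -> tuple[float, float]:
    """Parse a WKT 'POINT (lon lat)' string and return (lat, lon)."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a 2-D WKT POINT: {wkt!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def heatmap_points(wkt_points, weights=None):
    """Build [lat, lon] rows, or [lat, lon, weight] rows if weights are given."""
    rows = []
    for i, wkt in enumerate(wkt_points):
        lat, lon = wkt_point_to_lat_lon(wkt)
        if weights is None:
            rows.append([lat, lon])
        else:
            rows.append([lat, lon, float(weights[i])])
    return rows
```

When the data is already a geopandas GeoDataFrame, the same coordinates are available directly as gdf.geometry.y (latitude) and gdf.geometry.x (longitude), which avoids string parsing altogether.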