Data: The main source of data for this project was DOT, specifically the Vision Zero Data Feeds3. The project uses two data sets: fatalities distributed by year and fatalities distributed by month. Both data sets contain the total number of fatalities for the period from 2009 to 2016, the distribution of fatalities among three major groups (pedestrians, bicyclists, and motor vehicle occupants, corresponding to the PedFatalit, BikeFatali, and MVOFatalit columns), the year or month in which the fatalities were registered, and the location within the five boroughs of NYC. To prepare the data for analysis, I formatted and merged the originally separate columns and created a new data frame:
Date | Fatalities | PedFatalit | BikeFatali | MVOFatalit
2009-01-01 | 733 | 649 | 0 | 84
2009-02-01 | 830 | 646 | 0 | 184
2009-03-01 | 253 | 226 | 6 | 21
2009-04-01 | 679 | 263 | 40 | 376
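The formatting and merging step can be sketched with pandas as follows. This is a minimal illustration, not the exact code used in the project: it assumes the raw feed stores year and month in separate columns, and the names `YR` and `MN` and the helper `build_monthly_frame` are hypothetical. The fatality counts are the four rows shown in the table above.

```python
import pandas as pd

# Hypothetical raw layout: year and month arrive as separate columns.
# The names "YR" and "MN" are assumptions, not the feed's documented schema.
raw = pd.DataFrame({
    "YR": [2009, 2009, 2009, 2009],
    "MN": [1, 2, 3, 4],
    "Fatalities": [733, 830, 253, 679],
    "PedFatalit": [649, 646, 226, 263],
    "BikeFatali": [0, 0, 6, 40],
    "MVOFatalit": [84, 184, 21, 376],
})


def build_monthly_frame(raw, year_col="YR", month_col="MN"):
    """Merge separate year/month columns into one Date column (first of the month)."""
    # pd.to_datetime can assemble dates from columns named year, month, day.
    parts = raw[[year_col, month_col]].rename(
        columns={year_col: "year", month_col: "month"}
    ).assign(day=1)
    out = raw.drop(columns=[year_col, month_col]).copy()
    out.insert(0, "Date", pd.to_datetime(parts))
    return out


monthly = build_monthly_frame(raw)
print(monthly)
```

Using the first day of the month as the date keeps each row a proper timestamp, so the frame can later be indexed or resampled by time.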
I used the same approach to format the geospatial data set. In addition, for the data sets used in the heatmaps, I converted every ‘0’ to NaN and then dropped the NaN rows with the dropna function. This ensures that locations with zero fatalities are not counted as points on the heatmap.
For example, the pedestrian fatalities data set prepared for the heatmap:
PedFatalit Date geometry
1.0 2009-10-01 POINT (-73.89052628583379 40.81151194094214)
1.0 2009-10-01 POINT (-73.9002758732755 40.86378763472604)
1.0 2009-10-01 POINT (-73.96759867796442 40.58037846976183)
1.0 2009-10-01 POINT (-73.86272633841989 40.74986793616723)
1.0 2009-10-01 POINT (-73.98134068728926 40.74707866223923)
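The zero-to-NaN conversion followed by dropna can be sketched as below. This is a minimal pandas illustration under stated assumptions: the helper name `drop_zero_rows` is hypothetical, the sample rows are made up for the example rather than taken from the data set, and the geometry is kept as plain text so that the sketch does not depend on a geospatial library.

```python
import numpy as np
import pandas as pd

# Illustrative rows only; the values are not taken from the Vision Zero data.
sample = pd.DataFrame({
    "PedFatalit": [1, 0, 1],
    "Date": pd.to_datetime(["2009-10-01", "2009-10-01", "2009-10-01"]),
    "geometry": ["POINT (-73.9 40.8)", "POINT (-73.8 40.7)", "POINT (-74.0 40.6)"],
})


def drop_zero_rows(df, column):
    """Return a copy of df without the rows where `column` equals 0."""
    cleaned = df.copy()
    # Mark zeros as missing so that dropna can remove them.
    cleaned[column] = cleaned[column].replace(0, np.nan)
    return cleaned.dropna(subset=[column])


ped_heatmap = drop_zero_rows(sample, "PedFatalit")
print(ped_heatmap)
```

Replacing 0 with NaN turns an integer column into a float column, which would be consistent with the 1.0 values shown in the data set above. Restricting dropna to the fatality column through `subset` avoids discarding rows that are missing values in unrelated columns.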