Using these variables should allow for better prediction outcomes. In the correlation heatmap of the collision data above, each square shows the correlation between the variables on the corresponding axes. Correlation ranges from -1 to +1. Values close to +1 indicate a strong positive correlation, in which both variables tend to increase together. Values close to -1 indicate a strong negative (inverse) correlation: as one variable increases, the other decreases. Values close to zero indicate little or no linear trend between the two variables, as with the number of pedestrians killed and the number of cyclists killed. The heatmap also shows a close relationship between the number of persons killed and the number of pedestrians killed, whereas it reflects a negative relationship between the zip code count and the number of persons injured. The larger the absolute value and the darker the color, the stronger the correlation between the two variables.
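Each cell of a correlation heatmap like this one is typically a Pearson correlation coefficient. The sketch below computes it in plain Python to show where the -1 to +1 range comes from. It assumes Pearson correlation (the default of pandas' `DataFrame.corr()`), and the column names and toy values are hypothetical, not taken from the collision dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations (covariance numerator) and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return float("nan")  # undefined when a column is constant
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every pair of columns (one heatmap cell each)."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical toy values for illustration only (not real collision records)
toy = {
    "persons_killed": [0, 1, 0, 2, 1],
    "pedestrians_killed": [0, 1, 0, 1, 1],
    "persons_injured": [3, 1, 2, 0, 4],
}
matrix = correlation_matrix(toy)
for pair in [("persons_killed", "pedestrians_killed"),
             ("persons_killed", "persons_injured")]:
    print(pair, round(matrix[pair], 3))
```

Because the covariance is divided by the product of the two standard deviations, the result is scale-free and always falls between -1 and +1, which is what lets a single color scale cover every cell of the heatmap.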
The variables used for both the spatial analysis and the prediction models were the following:
- Vehicle type code 1
- Vehicle type code 2
- Borough
- Year
- Zip Code
- Zip Code Counts
- Number of persons killed
- Number of persons injured
- Number of pedestrians killed
- Number of pedestrians injured
- Number of cyclists killed
- Number of cyclists injured
- Latitude, Longitude
- contributing_factor_vehicle_1
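Most of the variables above come straight from the collision records, but Zip Code Counts is derived. The sketch below shows one plausible way to build it. It rests on an assumption, not on anything the original analysis states: that the count means the number of collisions recorded in each zip code, attached to every record. The field names (`zip_code`, `zip_code_count`) and the sample records are hypothetical.

```python
from collections import Counter


def add_zip_code_counts(records, zip_field="zip_code", count_field="zip_code_count"):
    """Return copies of the records with the collision count of their zip code added.

    Assumes each record is a dict and that one record corresponds to one collision.
    Records with a missing or empty zip code get a count of None.
    """
    counts = Counter(r[zip_field] for r in records if r.get(zip_field))
    enriched = []
    for r in records:
        row = dict(r)  # copy so the input records are not mutated
        z = r.get(zip_field)
        row[count_field] = counts[z] if z else None
        enriched.append(row)
    return enriched


# Hypothetical sample records with placeholder zip codes (not real data)
sample = [
    {"borough": "BROOKLYN", "zip_code": "zip_A", "number_of_persons_injured": 1},
    {"borough": "BROOKLYN", "zip_code": "zip_A", "number_of_persons_injured": 0},
    {"borough": "QUEENS", "zip_code": "zip_B", "number_of_persons_injured": 2},
    {"borough": "QUEENS", "zip_code": "", "number_of_persons_injured": 0},
]
enriched = add_zip_code_counts(sample)
for row in enriched:
    print(row["zip_code"] or "<missing>", row["zip_code_count"])
```

Leaving missing zip codes as `None` rather than grouping them under one "unknown" bucket keeps them from inflating a single artificial count.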
As part of the research, collisions were also summarized by zip code. As the map below shows, the zip codes with the largest numbers of crashes are, in Brooklyn, those covering East New York, Flatbush, and East Williamsburg; in Queens, South Ozone Park and South Jamaica (near John F. Kennedy International Airport); and in Manhattan, the East Village, West Village, and Flatiron District. These are highly dense areas with a wide range of incomes and a variety of land uses, which will be discussed later in the analysis.
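A per-borough ranking of zip codes by crash count is the kind of summary behind a map like this. The sketch below assumes one record per collision with hypothetical `borough` and `zip_code` fields. The records and placeholder zip codes are invented for illustration.

```python
from collections import Counter, defaultdict


def top_zip_codes_by_borough(records, n=3):
    """Return {borough: [(zip_code, crash_count), ...]} for the n busiest zip codes per borough."""
    per_borough = defaultdict(Counter)
    for r in records:
        borough, zip_code = r.get("borough"), r.get("zip_code")
        if borough and zip_code:  # skip records missing either field
            per_borough[borough][zip_code] += 1
    # most_common orders by count; ties keep first-encountered order
    return {b: counts.most_common(n) for b, counts in per_borough.items()}


# Hypothetical records with placeholder zip codes (not real data)
pairs = [("BROOKLYN", "zip_A")] * 3 + [
    ("BROOKLYN", "zip_B"),
    ("QUEENS", "zip_C"),
    ("QUEENS", "zip_C"),
    ("QUEENS", None),
]
records = [{"borough": b, "zip_code": z} for b, z in pairs]
top = top_zip_codes_by_borough(records, n=2)
for borough, ranking in top.items():
    print(borough, ranking)
```

Counting within each borough rather than citywide keeps one very busy borough from crowding the others out of the ranking.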