Traffic crash prediction and accuracy modeling are essential tools for planning, engineering, decision-making, and developing roadway safety programs. This analysis identified a pattern of high-risk locations within New York City by studying historical collision data and applying several statistical classifiers to the number of people injured and killed and to the contributing factors. Four methods were used: Bayes Network, Naïve Bayes, J48, and KNN. They were applied to the 2019 data and to the 100 intersections with the highest collision risk, derived from the 2014-2019 NYPD collision dataset. On the 2019 data, Naïve Bayes achieved the highest accuracy, 81.59%, followed by the Bayes Network, also reported at 81.59%. J48 reached an accuracy of 80.81%, and KNN performed the lowest with an accuracy of 80.20%.
Additionally, the 100 high-risk intersections were analyzed by contributing factor using the same statistical classifiers. Naïve Bayes again had the best accuracy at 59%, followed by J48 at 57%, the Bayes Network at 55%, and KNN with the lowest accuracy at 45%. The classifiers performed worse on the contributing factors of the 100 intersections than on collision severity in the 2019 dataset.
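To make the accuracy figures above concrete, the following is a minimal sketch of how a categorical Naïve Bayes classifier can be trained and scored on a held-out set. It is not the study's implementation. The feature names, the labels, and the six training records are hypothetical toy values chosen for illustration, and add-one (Laplace) smoothing is an assumption of this sketch rather than a setting reported in the study.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the label with the highest log-posterior score."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one (Laplace) smoothing avoids zero probabilities.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, rows, labels):
    """Fraction of records whose predicted label matches the true label."""
    correct = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return correct / len(labels)


# Hypothetical toy records: (contributing factor, time of day) -> severity.
train_rows = [
    ("distraction", "night"), ("distraction", "day"), ("speeding", "night"),
    ("following_too_closely", "day"), ("following_too_closely", "night"),
    ("backing_unsafely", "day"),
]
train_labels = ["injury", "injury", "injury",
                "no_injury", "no_injury", "no_injury"]
test_rows = [("speeding", "night"), ("backing_unsafely", "day")]
test_labels = ["injury", "no_injury"]

model = train_naive_bayes(train_rows, train_labels)
print(f"accuracy: {accuracy(model, test_rows, test_labels):.2%}")
```

The same `accuracy` function could score any other classifier that exposes a comparable prediction step, which is how a side-by-side comparison of the four methods would be set up under these assumptions.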
Furthermore, the study performed a geospatial analysis of the 100 intersections to identify high-risk locations. The analysis identified the following zip code as a problematic area for diverse types of collisions, as shown in Table 9 below: