Big Brother is Watching You... To Predict Crashes


The age of Big Data is here and many industries have already started embracing it. The transportation industry stands to gain much from large-scale data analysis because of the complexity and pervasiveness of transportation in daily life: such analysis promises smarter roads and a better understanding of our transportation needs and environment. But the inertia that comes with this complexity and pervasiveness is also one of the greatest challenges to big data adoption initiatives. Transitional technologies may, however, provide the answer to kick-start this migration today. This paper presents, in detail, a practical framework for the implementation of an automated, high-resolution, video-based traffic-analysis system, particularly geared towards traffic flow modelling, behavioural studies, and road safety analysis. This system collects large amounts of microscopic traffic flow data from ordinary video cameras and provides the tools for studying basic traffic flow measures as well as more advanced, proactive safety measures. This paper demonstrates the process step by step, illustrated with examples, and applies it to a case study of a large set of roundabout data. In addition to providing a rich set of behavioural data, the analysis suggests a relationship between flow ratio and safety and between lane arrangement and safety, and is inconclusive about the relationship between approach distance and safety.


Affordable computing and flexible, inexpensive sensor technology are transforming the current practice and methods for traffic data collection, monitoring and analysis: big data is changing how we interact with our environment and approach problem-solving tasks in the field of transportation. This should come as no surprise, as the complexity of urban mobility and the pervasiveness of geo-location devices in daily life lend themselves naturally to large data sets. In this context, the use of mobile and/or fixed video sensors for traffic monitoring and data collection is becoming common practice not only for freeways but also for urban streets. This opens up possibilities for more dynamic traffic load balancing and congestion easing of road networks and in turn provides researchers with participatory network usage data collection. This new situation, in which traffic data is being collected intensively, demands more intelligent and advanced methods for traffic data analysis; it is therefore not surprising that computer vision techniques have gained popularity, given their potential to transform the existing video-based traffic monitoring infrastructure into a highly detailed traffic data collection tool to identify and study traffic behaviours.

One such behavioural study application is proactive road safety diagnosis, which has been a long-standing goal in the field of transportation safety. Traditional statistical methods applied to accident data require long observation periods (years of crash data): one must wait for (enough) accidents to occur. Beginning in the 1960s, attempts were made to predict collision rates based on observations without a collision rather than historical accident records (Perkins 1968): these methods are now termed surrogate safety methods. One of the earliest methods proposed was the traffic conflict technique, which entailed the observation of quasi-collision events: situations in which road users were exposed to some recognizable risk (probability) of collision, e.g. a near-miss. However, several problems limited its adoption: the manual data collection method is costly and may not be reliable, and the definition and objective measurement of these events are difficult (Chin 1997).

Today, with technological improvements in computing power, data storage, sensor technologies, and advances in artificial intelligence, these issues are quickly being addressed. This research presents the application of a video-based automated trajectory analysis solution which combines the latest advances in high-resolution traffic data acquisition (Saunier 2010) and machine learning methods to model and predict collision potential from relatively short, but extremely rich traffic data. This data is typically extracted by computer vision from ordinary video recorded by a camera situated 10 m or more above the roadway (Jackson 2013). The resulting trajectory data consists of position and velocity measurements of road users captured 15 to 30 times per second to a relatively high degree of accuracy. This amounts to several million individual instantaneous measurements over the period of one day at a typical site.
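
The order of magnitude quoted above can be checked with simple arithmetic. The sketch below is illustrative only: the daily count of road users and the time each one stays in the camera view are hypothetical values chosen for the example, not figures from the study; only the 15 to 30 measurements per second come from the text.

```python
def daily_measurements(road_users_per_day, seconds_in_view, frames_per_second):
    """Count instantaneous position/velocity measurements collected in one day.

    Each road user contributes one measurement per video frame for as long
    as it stays in the camera's field of view.
    """
    return road_users_per_day * seconds_in_view * frames_per_second


# Hypothetical site: 10,000 road users per day, each visible for about 10 s.
low = daily_measurements(10_000, 10, 15)   # at 15 measurements per second
high = daily_measurements(10_000, 10, 30)  # at 30 measurements per second
print(low, high)
```

Under these assumed values the daily total falls between 1.5 and 3 million measurements, which is consistent with the "several million" stated above; busier sites or longer observation zones scale the total proportionally.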

This high-resolution data permits the measurement of precisely defined instantaneous surrogate safety measures identifying collision probability. One such measure is time-to-collision (TTC), which measures the time remaining at any given instant to some collision point in the future defined by a collision course with another road user. This measure is useful as it provides the remaining time road users have to react to and avoid potential collisions. Higher TTCs are generally considered safer, though the precise link has yet to be validated. However, this measure relies on motion prediction hypotheses to identify collision courses. The traditional approach is constant velocity projection (Amundsen 1977) (Laureshyn 2010), which describes situations in which road users fail to correct their course or even turn; it is the motion prediction method most frequently used, usually without any justification. This approach does not natively provide a collision course probability, and it is not suitable in situations where observed trajectories do not include constant velocity displacements: for example, turning lanes in an intersection and movements in a roundabout.
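
As an illustration of TTC under constant velocity projection, the following minimal sketch treats each road user as a point and declares a collision when the two points come within a fixed distance of each other. The function name, the point representation and the `collision_distance` threshold are assumptions made for this example; they are not taken from the system described in this paper.

```python
import math


def ttc_constant_velocity(p1, v1, p2, v2, collision_distance=2.0):
    """Time (s) until two road users come within collision_distance of each
    other if both keep their current velocity, or None if they never do.

    p1, p2: (x, y) positions in metres; v1, v2: (vx, vy) velocities in m/s.
    collision_distance is a hypothetical threshold standing in for vehicle size.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]        # relative position
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]      # relative velocity
    # Solve |dp + dv * t|^2 = collision_distance^2, a quadratic in t.
    a = dvx * dvx + dvy * dvy
    b = 2.0 * (dx * dvx + dy * dvy)
    c = dx * dx + dy * dy - collision_distance ** 2
    if c <= 0:
        return 0.0            # already within the collision distance
    if a == 0:
        return None           # identical velocities: the gap never changes
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None           # paths never come close enough
    t = (-b - math.sqrt(disc)) / (2.0 * a)       # earliest root
    return t if t >= 0 else None                 # negative: moving apart


# A vehicle at 10 m/s approaching a stopped vehicle 50 m ahead.
print(ttc_constant_velocity((0, 0), (10, 0), (50, 0), (0, 0)))
```

In this example the 48 m gap to the collision distance closes at 10 m/s, giving a TTC of 4.8 s. The sketch also makes the limitation discussed above concrete: the result is a single deterministic value with no associated probability, and it ignores any turning movement.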

More advanced collision course modelling efforts are being developed, including motion patterns which represent naturalistic (expected) driving behaviour learnt from the same data set. This procedure provides several potential collision points and their probability as a function of both the characteristics of the specific site and the road users’ behaviour. The motion patterns, or the distribution of trajectories at a site and their probabilities, may be described discretely over time and space (St-Aubin 2014) or with prototype trajectories (Saunier 2007). The motion and collision predictions are computationally intensive as they explore, for each pair of road users, at each point in time, all future positions in time and space (typically subject to a time horizon). Furthermore, interaction complexity and exposure increase exponentially as the number of simultaneous road users in a scene increases. For example, over the course of one day, a typical intersection can experience between 100 thousand and 100 million of these instantaneous interactions, depending on the intersection complexity.
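
The idea of several potential collision points, each with a probability, can be sketched as follows. This is a simplified illustration and not the method of (St-Aubin 2014) or (Saunier 2007): it assumes that each road user already comes with a small set of hypothetical predicted paths and their probabilities, that the two users choose their paths independently (so probabilities multiply), and that a collision occurs when predicted positions at the same time step fall within a hypothetical `collision_distance`.

```python
import math


def predicted_collisions(hyps_a, hyps_b, dt=0.1, collision_distance=2.0):
    """Enumerate potential collision points for one pair of road users.

    hyps_a, hyps_b: lists of (probability, [(x, y), ...]) where each path
    gives predicted positions at successive time steps of dt seconds.
    Returns a list of (probability, ttc_seconds, (x, y)) with one entry per
    pair of hypotheses that leads to a collision within the time horizon.
    """
    results = []
    for prob_a, path_a in hyps_a:
        for prob_b, path_b in hyps_b:
            # zip() stops at the shorter path, which acts as the time horizon.
            for step, (pa, pb) in enumerate(zip(path_a, path_b)):
                if math.dist(pa, pb) <= collision_distance:
                    midpoint = ((pa[0] + pb[0]) / 2, (pa[1] + pb[1]) / 2)
                    # Independence of the two users' choices is an assumption.
                    results.append((prob_a * prob_b, step * dt, midpoint))
                    break  # keep only the first collision for this pair
    return results


def expected_ttc(collisions):
    """Probability-weighted mean TTC over the potential collision points."""
    total = sum(prob for prob, _, _ in collisions)
    if total == 0:
        return None
    return sum(prob * ttc for prob, ttc, _ in collisions) / total


# User A goes straight (probability 0.7) or turns away (0.3); B is stopped.
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
turning = [(0, 0), (0, 1), (0, 2), (0, 3)]
stopped = [(3, 0)] * 4
collisions = predicted_collisions(
    [(0.7, straight), (0.3, turning)], [(1.0, stopped)], dt=1.0
)
print(collisions, expected_ttc(collisions))
```

The nested loops over hypotheses and time steps, repeated for every pair of road users at every instant, give a sense of why these predictions are computationally intensive.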

This paper presents a complete automated system for proactive road safety analysis that can deal with large amounts of video data. To the authors’ knowledge, the presented system is the most comprehensive to be applied to such big data collected in the field for a real-world traffic engineering study. A large video dataset was collected at more than 20 roundabouts in Québec to study road user behaviour and safety. Camera views record data at more than 40 roundabout weaving zones, each an area within the roundabout delimited by an entry and the next exit. Each camera records 12 to 16 h of video on a given work day, which constitutes a dataset of over 600 hours of video data. Applying the proposed method to this large dataset yields a considerable number of indicators, from individual road user measurements, e.g. speed, to individual interaction measurements, e.g. TTC, to aggregated indicators per road user or interaction, to aggregated indicators per site over time and space.

Analysing such big data is a challenge of a magnitude that has never been undertaken before in driver behaviour and road safety research. It holds the key to understanding the processes that lead road users to collide, and to designing and validating safety indicators that do not require accidents to occur. The approach will be demonstrated on this video dataset to identify roundabout characteristics that influence road safety.

The paper is organized as follows: the next section presents the methodology, with practical examples drawn from the roundabout dataset. The methodology is then applied to about half of the collected data and various system outputs are presented, followed by the conclusion and a discussion of future work.