# Large-Scale Automated Proactive Road Safety Analysis Using Video Data

Abstract

Due to the complexity and pervasiveness of transportation in daily life, the use and combination of larger data sets and data streams promises smarter roads and a better understanding of our transportation needs and environment. For this purpose, intelligent transportation systems (ITS) are steadily being rolled out, providing a wealth of information, and transitional technologies, such as computer vision applied to low-cost surveillance or consumer cameras, are already leading the way.

This paper presents in detail a practical framework for implementing an automated, high-resolution, video-based traffic-analysis system, geared particularly towards researchers conducting behavioural studies and road safety analyses, and towards practitioners validating traffic flow models. The system collects large amounts of microscopic traffic flow data from ordinary traffic using CCTV and consumer-grade video cameras, and provides the tools for conducting basic traffic flow analyses as well as more advanced proactive safety and behaviour studies. This paper demonstrates the process step by step, illustrated with examples, and applies the methodology to a large and detailed case study of roundabouts (nearly 80,000 motor vehicles tracked up to 30 times per second driving through a roundabout).

In addition to providing a rich set of behavioural data about time-to-collision and gap times at nearly 40 roundabout weaving zones, some data validation is performed using the standard Multiple Object Tracking Accuracy (MOTA) metric, with results in the 85-95% range.

# Introduction

Affordable computing and flexible, inexpensive sensor technology are transforming current practice and methods for traffic data collection, monitoring, and analysis: big data is changing how we interact with our environment and how we approach problem-solving tasks in the field of transportation. This should come as no surprise, as urban mobility, being complex and pervasive in daily life, lends itself naturally to large amounts of data. In this context, the use of mobile and/or fixed video sensors for traffic monitoring and data collection is becoming common practice not only on freeways but also on urban streets. Early and notable examples include the NGSIM project, which included a dataset of trajectories extracted from video data of four corridors (freeways and urban arterials) (Kim 2005), and the SAVEME project, which fielded a small but early implementation of video tracking for surrogate safety analysis (Ervin 2000, Gordon 2012). The availability of such large data sets opens up possibilities for more dynamic traffic load balancing and congestion easing on road networks and, in return, provides researchers with participatory network usage data. This new situation, in which traffic data is collected intensively, demands more intelligent and automated methods for traffic data analysis. It is therefore not surprising that computer vision techniques have gained popularity, given their potential for transforming the existing CCTV infrastructure (or inexpensive consumer-grade video sensors (Jackson 2013)) into a highly detailed traffic data collection tool to identify and study traffic behaviours.

One of the most prominent behavioural study applications is proactive road safety diagnosis using surrogate safety methods. This has been a long-standing goal in the field of transportation safety, as traditional statistical methods using accident data require long observation periods (years of crash data): one must wait for enough accidents to occur. Beginning in the 1960s, attempts were made to predict the number of collisions from observations of traffic without collisions rather than from historical accident records (Perkins 1968). The Traffic Conflict Technique (TCT) (Hydén 1984, Parker 1989) was one of the earliest methods proposed; it entailed the observation of qualitatively defined quasi-collision events, i.e. situations in which road users were exposed to some recognizable risk (probability) of collision, such as a near-miss. However, several problems limited the adoption of these techniques: manual data collection is costly and may not be reliable, and objective definitions and measurements of these events were lacking (Hauer 1978, Williams 1981, Kruysse 1991, Chin 1997).

Today, with technological improvements in computing power, data storage, ubiquitous sensor technologies, and advances in artificial intelligence, these issues are rapidly being addressed. This research presents the application of a video-based automated trajectory analysis solution which combines the latest advances in high-resolution traffic data acquisition (Saunier 2010) and machine learning methods to model and predict collision potential (Mohamed 2013, St-Aubin 2014) from relatively short, but extremely rich traffic data. This data is typically obtained from ordinary video data via computer vision from a camera situated at 10 metres or more above the road surface (Jackson 2013), although some of the early footage was taken at 5 metres above the road surface. This trajectory data consists of position and velocity measurements of road users captured 15 to 30 times per second to a relatively high degree of accuracy. This amounts to several million individual instantaneous measurements over a period of one day at a typical site (for each camera).

One of the limitations of past studies involving surrogate safety analysis is their reliance on few sites or small datasets. This paper presents, step by step, a complete automated system for proactive road safety analysis using large amounts of video data. To the authors’ knowledge, the presented system is the most comprehensive to be applied to such a large amount of data collected in the field for a real-world traffic engineering study. A large video data set was collected at nearly 40 roundabout weaving zones across 20 different roundabouts in Québec, for the specific purpose of studying road user behaviour and the corresponding safety using surrogate safety analysis. A roundabout weaving zone is defined as the area within the roundabout delimited by an approach and the next exit. Each camera recorded 12 to 16 hours of video on a typical workday, for a total of over 470 hours of video data. Applying the proposed method to this large dataset yielded a considerable number of indicators, ranging from individual road user measurements (e.g. speed) and individual interaction measurements (e.g. time-to-collision, TTC) to indicators aggregated per road user or interaction, and per site over time and space.

This paper is organized as follows: the next section outlines the methodology and briefly reviews surrogate safety theory, then walks through the methodology step by step with practical examples drawn from the roundabout dataset; the results section presents early video data calibration results, safety indicator aggregation and prediction comparisons, and initial results of the complete roundabout study.

# Methodology

## Overview

Figure \ref{fig:1} outlines the general data collection and analysis methodology. Video data is collected for the study: for a cross-sectional study, at a number of sites with adequate representation of contributing factors and controlled external factors; for a before-after study, at one or more sites before and after a change in contributing factors. Using scene data and camera calibration parameters, feature tracking is performed to extract road user trajectories, which take the form of series of positions of moving objects over time within the scene. This positional data is then processed to obtain derived measures such as speed, heading, and acceleration. Finally, the data is analysed and interpreted in a variety of ways: i) simple summaries such as average speed and volume counts; ii) generalized spatial relationship analysis, such as surrogate safety analysis; or iii) high-level interpretation of behaviour relative to elements of the scene (typically specific to the study), such as gap times and motor vehicle infractions. With a large number of potential contributing factors, it may be beneficial to apply site clustering techniques before correlating behavioural measures with these factors.

\label{fig:1} Data flow diagram showing the overview of the system.

## Video Data Processing

### Trajectories: Positions in Space and Time

Road user trajectories are extracted from video data using a feature-based tracking algorithm described in (Saunier 2006). Trajectories are series of points in Cartesian space representing the centre position of a moving object (road user) at time $$t$$ on a planar surface. Points are evenly spaced in time with a constant $$\Delta t$$ equal to the inverse of the frame rate of the video (typically 15 to 30 frames per second), i.e. one measurement is taken per frame. The object (road user) itself is represented by a group of characteristic features spread over the object and moving in unison. A sample of tracked road user trajectories is presented in image space in Figure \ref{fig:conflict-video}. The computer vision software is covered in greater detail in section \ref{software}.

Three potential sources of error exist: parallax, pixel resolution, and tracking:

• Parallax error is mitigated by maximizing the subtending angle between the camera and the height of tracked objects. In practical terms this requires a high angle of view, ideally a bird’s eye view, and tracking objects with a small height-to-base ratio. Passenger cars are more forgiving in this respect than heavy vehicles or pedestrians.

• Pixel resolution determines measurement precision. Objects further away from the camera experience lower tracking precision than objects near the camera. Error due to pixel resolution is mitigated by placing the camera as close to the study area as possible and using high-resolution cameras, though increases in resolution offer diminishing returns in tracking accuracy.

• Finally, tracking errors may occur because of scene visibility issues or the limits of current computer vision techniques. These erroneous observations have to be rejected or reviewed manually. Attempts have been made in recent literature, and for this work, to measure and optimize tracking accuracy (Ettehadieh 2015) using the Multiple Object Tracking Accuracy (MOTA) metric (Bernardin 2008). MOTA combines multiple sources of tracking error in a single score, including false positive and false negative detections (the companion Multiple Object Tracking Precision, MOTP, may be used for the spatial accuracy of measurements). The tracking optimization is replicated for this study using a genetic algorithm that searches heuristically for optimal tracking parameters, using MOTA computed against manually annotated video data as the measure of fitness. See section \ref{tracking_calibration} for more details and results of this process.
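For reference, the MOTA score of Bernardin and Stiefelhagen (2008) sums per-frame misses, false positives, and identity mismatches and divides by the total number of ground-truth objects: $$MOTA = 1 - \sum_t (m_t + fp_t + mme_t) / \sum_t g_t$$. The Python sketch below assumes these per-frame counts are already available from matching tracker output to the annotations (the matching step itself is omitted); the function name `mota` is illustrative.

```python
def mota(misses, false_positives, mismatches, ground_truth):
    """Multiple Object Tracking Accuracy (Bernardin and Stiefelhagen, 2008).

    Each argument holds one count per frame: missed objects (false negatives),
    false positives, identity mismatches, and annotated ground-truth objects.
    """
    total_gt = sum(ground_truth)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(misses) + sum(false_positives) + sum(mismatches)
    # 1.0 for perfect tracking; negative when errors outnumber ground-truth objects.
    return 1.0 - errors / total_gt


# Three frames with four annotated vehicles each and one error of each kind.
print(mota([1, 0, 0], [0, 1, 0], [0, 0, 1], [4, 4, 4]))  # 0.75
```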

Depending on the steps taken to minimize tracking errors, feature-based tracking functions best over study areas of up to 50-100 m in length with high-to-medium speed, low-to-medium density traffic flows.

### Derived Data: Velocity & Acceleration

Velocity and acceleration vectors are derived by simple differentiation of position and subsequently velocity over time, after undergoing smoothing with a moving average window over several frames. The heading of the velocity vector may be used to determine the orientation of the vehicle.
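A minimal sketch of this step, assuming positions in metres sampled once per frame: a centred moving average (five frames here, an illustrative choice rather than the study's setting) is applied before each forward-difference step, and the heading is taken from the velocity vector. The function names are hypothetical.

```python
import math


def moving_average(values, window=5):
    """Centred moving average; the window narrows symmetrically near the ends."""
    half = window // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        k = min(half, i, n - 1 - i)  # keep the window centred on sample i
        segment = values[i - k:i + k + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def differentiate(values, dt):
    """Forward differences; the result has one element fewer than the input."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def derive_kinematics(xs, ys, fps, window=5):
    """Return speed (m/s), heading (rad), and acceleration components (m/s^2)."""
    dt = 1.0 / fps
    vx = differentiate(moving_average(xs, window), dt)
    vy = differentiate(moving_average(ys, window), dt)
    ax = differentiate(moving_average(vx, window), dt)
    ay = differentiate(moving_average(vy, window), dt)
    speed = [math.hypot(a, b) for a, b in zip(vx, vy)]
    heading = [math.atan2(b, a) for a, b in zip(vx, vy)]  # orientation estimate
    return speed, heading, ax, ay
```

For a vehicle moving 0.5 m per frame along the x axis at 15 frames per second, `derive_kinematics([0.5 * i for i in range(30)], [0.0] * 30, fps=15)` yields speeds of 7.5 m/s, zero heading, and accelerations close to zero.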

### Size of Data

\label{method-size_of_data}

Feature tracking provides a microscopic level of detail. Individual observations measured at a single site over the course of a normal day typically register in the tens of millions. The sample size (number) of individual tracking measurements (positions, velocities, etc.) per hour $$n$$ can be estimated with the equation

$\label{eqn:data-size} n = fQd$

where $$f$$ is the number of frames per second of the video, $$Q$$ is the average hourly flow-rate, and $$d$$ is the average dwell-time of each vehicle in the scene (excluding full stops). Dwell-time is affected by the size of the analysis area in the scene and the average speed. As such, the size of the analysis area needs to be carefully selected.
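As a worked example with purely illustrative values (not measurements from this study): at 15 frames per second, an average flow of 800 veh/h and an average dwell time of 8 s give $$n = 15 \times 800 \times 8 = 96{,}000$$ measurements per hour, or about 1.3 million over a 14-hour recording day. The same arithmetic in Python, with a hypothetical helper name:

```python
def measurements_per_hour(fps, hourly_flow, dwell_time_s):
    """n = f * Q * d: road user positions recorded per hour by one camera."""
    return fps * hourly_flow * dwell_time_s


# Illustrative values only, not measurements from the roundabout study.
per_hour = measurements_per_hour(fps=15, hourly_flow=800, dwell_time_s=8)
per_day = per_hour * 14  # a 14-hour recording day
print(per_hour, per_day)  # 96000 1344000
```

Halving the length of the analysis area roughly halves the dwell time $$d$$, and therefore the data volume, at the same flow and speed.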

\label{fig:conflict-video} Vehicle #304 is shown approaching vehicle #303 which is engaging the roundabout in the wrong direction, demonstrating a frequent violation leading to a traffic conflict.

## Complementary Data

With the exception of speed and aggregated traffic volume counts, vehicle trajectories offer little insight without context. Complementary data about the scene is collected and added in order to perform further analysis. This data includes a wide variety of design geometry and environment attributes characterizing the factors under study.

### Analysis Area

The analysis area is a bounding polygon which confines analysis to a particular region of the scene. This serves to i) reject regions of the image with unsatisfactory feature tracking (particularly at the edges of camera space), and ii) confine analysis to a particular region. An example of the analysis area is demonstrated in Figure \ref{fig:complex-network}.
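Confining analysis to the polygon amounts to a point-in-polygon test on each trajectory sample. A minimal sketch using the standard even-odd (ray casting) rule, assuming samples are `(t, x, y)` tuples in the same coordinate system as the polygon vertices; both function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: count crossings of a ray cast from (x, y) towards +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_to_analysis_area(trajectory, polygon):
    """Keep only the (t, x, y) samples that fall inside the analysis area."""
    return [(t, x, y) for (t, x, y) in trajectory if point_in_polygon(x, y, polygon)]


area = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(clip_to_analysis_area([(0, -1, 5), (1, 5, 5), (2, 11, 5)], area))  # [(1, 5, 5)]
```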

### Alignments

\label{alignments}

Trajectory clustering is the first step in behavioural analysis. It provides an abstract representation of movements along prototypical paths through a scene, called alignments. This is the foundation for relating the spatial position of trajectories to elements of road geometry, in particular the position of moving objects in relation to traffic lanes, bike paths, and sidewalks.

Many approaches to trajectory clustering have been explored. While some methods are supervised (Schreck 2008), many more are unsupervised (e.g. k-means (MacQueen 1967) or hidden Markov models (Rodríguez-Serrano 2012)). Manual trajectory clustering is labour-intensive and may be a source of bias, but it allows for tight control of scene description and analysis oversight. Unsupervised clustering is systematic but naive, as it can only use trajectory data to infer spatial relationships. Manual clustering along a series of splines, called alignments, is chosen here for its simple implementation and tight control over interpretation. A hybrid approach, which automatically refines the spatial positioning of manually defined alignments through traditional unsupervised clustering, is considered for future improvements (Morris 2008, Schreck 2008).

The alignment is represented as a simple series of points with a beginning and an end, in the same direction of travel as the majority of movements along this path. This process introduces a new coordinate system which maps a position of a moving object in Cartesian space to a position in curvilinear space:

$\label{eqn:coordinate-transform} (x,y)\to(l,s,\gamma).$

where a point located at $$(x,y)$$ in Cartesian space is snapped orthogonally to the nearest position on the nearest alignment $$l$$, and is represented by the curvilinear distance $$s$$ along this alignment from its beginning and by the offset $$\gamma$$, orthogonal to the alignment (positive to the right of the direction of travel), measuring the distance between the original point and its snapped position. A second pass may be performed over a moving window, shorter than the time users take to perform real lane changes, to correct the localized lane “jumping” errors which frequently appear near converging or diverging alignments. These coordinates are useful for studying following behaviour, lane changes, and lane deflection.
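A minimal sketch of the transformation $$(x,y)\to(l,s,\gamma)$$, assuming each alignment is a polyline of ground-plane points in metres ordered in the direction of travel, in a right-handed world frame (in image coordinates, where $$y$$ points down, the sign of $$\gamma$$ would flip). It snaps to straight segments rather than splines, and the function names are hypothetical.

```python
import math


def to_curvilinear(x, y, alignment):
    """Snap (x, y) orthogonally onto a polyline alignment.

    Returns (s, gamma): distance along the alignment from its first point and
    signed offset, positive to the right of the direction of travel.
    """
    best = None  # (distance, s, gamma) for the closest segment so far
    s_start = 0.0  # curvilinear distance at the start of the current segment
    for (x1, y1), (x2, y2) in zip(alignment, alignment[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0:
            continue  # skip duplicated points
        # Orthogonal projection parameter, clamped to the segment [0, 1].
        u = ((x - x1) * dx + (y - y1) * dy) / seg_len ** 2
        u = max(0.0, min(1.0, u))
        px, py = x1 + u * dx, y1 + u * dy
        dist = math.hypot(x - px, y - py)
        if best is None or dist < best[0]:
            # A positive cross product means the point lies left of the segment.
            cross = dx * (y - y1) - dy * (x - x1)
            gamma = -dist if cross > 0 else dist
            best = (dist, s_start + u * seg_len, gamma)
        s_start += seg_len
    if best is None:
        raise ValueError("alignment needs at least two distinct points")
    return best[1], best[2]


def snap_to_network(x, y, alignments):
    """Return (l, s, gamma) on the nearest alignment; alignments maps id -> points."""
    candidates = ((l,) + to_curvilinear(x, y, pts) for l, pts in alignments.items())
    return min(candidates, key=lambda c: abs(c[2]))
```

For instance, with two parallel eastbound alignments `{"A": [(0, 0), (10, 0)], "B": [(0, 5), (10, 5)]}`, the point `(3, 4)` snaps to `("B", 3.0, 1.0)`: 3 m along B and 1 m to its right.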

### Network Topology

Once trajectories are clustered, a network topology is constructed in order to be able to intelligently propagate future possible positions of moving objects through the network. In simple networks (i.e. two alignments), these movements are implicitly defined simply by observing lane change ratios, but in more complex networks, such as the network shown in Figure \ref{fig:complex-network}, movements may involve multiple lane changes and therefore may require a more generalized approach. A recursive tree model is employed.

Alignment extremities are linked to other nearby alignments, creating diverging or converging branches; alignments that are momentarily adjacent are linked in the same way. In addition, alignments which run parallel over a distance of more than 15 metres are instead grouped into corridors within which lane changes may occur freely. This creates a series of links and nodes with implicit direction which can be searched to determine all possible future positions of a moving object inside this network. This reduces the processing time of spatial relationship calculations between objects (triage) and provides a more intelligent interpretation of spatial relationships.
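Searching such a network within a driving distance horizon can be sketched as a shortest-path expansion over the directed alignment graph. The sketch below is an assumption-laden simplification: it follows only downstream connections between alignment extremities (corridor lane changes are not modelled), and the input format and function name are hypothetical.

```python
import heapq


def reachable_alignments(start, s, horizon, lengths, successors):
    """Alignments reachable within a driving-distance horizon.

    start: alignment currently occupied; s: curvilinear position on it (m);
    horizon: driving distance to search (m); lengths: id -> length (m);
    successors: id -> ids connected at its downstream end.
    Returns id -> driving distance to the start of that alignment.
    """
    reached = {start: 0.0}
    # Heap entries: (driving distance to the END of an alignment, alignment id).
    heap = [(lengths[start] - s, start)]
    while heap:
        dist_to_end, node = heapq.heappop(heap)
        if dist_to_end > horizon:
            continue  # the downstream end lies beyond the horizon
        for nxt in successors.get(node, []):
            # Only keep strictly shorter routes, which also terminates cycles.
            if nxt not in reached or dist_to_end < reached[nxt]:
                reached[nxt] = dist_to_end
                heapq.heappush(heap, (dist_to_end + lengths[nxt], nxt))
    return reached
```

User pairs whose reachable sets do not intersect can then be set aside before the more expensive motion prediction. Cyclic layouts, such as a roundabout's circulating lanes, terminate because an alignment is only revisited along a strictly shorter route.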

### Geometric data/inventory

Finally, a traditional inventory of contextual factors that may be related to the behaviours under study needs to be constructed and associated with each site. These typically include:

• Geometric characteristics, including lane width, curvature, approach angles, number of lanes, presence of slip lanes, and lane configuration;

• Presence of horizontal and vertical signalization: if proper placement is under evaluation, location may be recorded, or a typology of signage quality may be used instead;

• Pedestrian facilities such as crosswalks, pedestrian refuges, etc.;

• Built environment: school zones, land usage, clearance;

• Upstream/downstream distances to features such as intersections, speed limit changes, railroad crossings, etc.

\label{fig:complex-network} The partial trajectories and scene of a multi-lane roundabout with a complex configuration of lanes (the south and east approaches are not visible). The analysis area is the grey area inside the black polygon, the alignments are the pink lines, while the connectors are in cyan. Some sample trajectories are highlighted in light grey.

## Measurement Definitions

### General versus Specific Analysis Measures

Some measures are generalizable for all traffic studies using alignments, while others are not (Gettman 2003). General traffic measures include speed profiles, counts, lane changes, origin-destination matrices, and basic spatial relationships including conflicts. Other measures may be specific to the study and generally require high-level interpretation (HLI). This interpretation makes use of study-specific geometric information to generate custom measures. As such, this section will not cover these custom measures, but will instead focus on generalizable measures. However, an application of HLI calculations will be briefly presented in section \ref{sample_HLI_analysis}.

### Interactions

An interaction quantifies the spatial relationship between moving objects in a scene, as is depicted in Figure \ref{fig:conflict-video}. At the most fundamental level, an interaction is defined as a pair of moving objects simultaneously present in a scene over a common time interval (also referred to as a user pair). We further define an instantaneous observation (i.e. in a given video frame) within this time interval as an interaction instant (St-Aubin 2015).
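A minimal sketch of this definition, assuming each road user is summarized by its first and last frame in the scene (a hypothetical input format): every pair with overlapping presence is a user pair, and each shared frame is an interaction instant. No triage by physical access is applied here.

```python
from itertools import combinations


def user_pairs(intervals):
    """intervals: road user id -> (first_frame, last_frame), inclusive.

    Returns {(id_a, id_b): range of shared frames}; every frame in the range
    is an interaction instant for that pair.
    """
    pairs = {}
    for a, b in combinations(sorted(intervals), 2):
        start = max(intervals[a][0], intervals[b][0])
        end = min(intervals[a][1], intervals[b][1])
        if start <= end:  # both users are present over a common interval
            pairs[(a, b)] = range(start, end + 1)
    return pairs


pairs = user_pairs({303: (100, 200), 304: (150, 250), 305: (300, 320)})
print(list(pairs), len(pairs[(303, 304)]))  # [(303, 304)] 51
```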

This interaction definition is generic, if not naive, as its quality depends largely on how the scene is constructed. For example, an interaction between two vehicles physically separated from each other (e.g. by a median or a building) may not be comparable in significance to an interaction between two vehicles separated merely by lane markings, which either vehicle could easily cross, intentionally or inadvertently. This may interfere with collision prediction attempts, particularly if scenes are not consistently selected and geometry is not controlled.

One solution is to perform a triage of user pairs based on physical access and proximity. A network topology coupled with a driving distance horizon is proposed. This is not a perfect solution, however, as physical access may not necessarily be a discrete choice. For the example of the median, it is still physically possible, although much less likely, for a vehicle to cross over into oncoming traffic.

### Motion Prediction

\label{motion-prediction}

While vehicle trajectories offer a rich set of observed behavioural data, they contain very few actual collisions. The proactive road safety approach requires that collisions be predicted without observing them directly: to be studied, collisions must be extrapolated from traffic events with a potential for collision. This potential is modelled by predicting future possible positions for each pair of road users at every instant in time. Several motion prediction models are proposed for study, including constant velocity (Laureshyn 2010), normal adaptation (Mohamed 2013), and motion patterns (Saunier 2007, Morris 2008, Sivaraman 2013), particularly the implementation of a discretized motion pattern (St-Aubin 2014). Specific details of the implementation of each method can be found in (St-Aubin 2015).

As illustrated in Figure \ref{fig:prob-collision-space}, motion prediction is performed for each user pair at each instant $$t_0$$ for a number of time steps of size $$\Delta t$$ between $$t_0$$ and some chosen time horizon. Each motion prediction may generate, for two road users, a series or a matrix of collision points whose probabilities sum to at most 1.
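The simplest of these models, constant velocity, can be sketched as follows. This is a simplification, not the implementation of (St-Aubin 2015): road users are treated as points, a collision is declared when they come within a fixed distance (1.8 m here, an arbitrary illustrative value), and the single resulting collision point implicitly has probability 1. The function name and parameters are hypothetical.

```python
import math


def predict_constant_velocity(p1, v1, p2, v2, dt=1 / 15, horizon=5.0,
                              collision_distance=1.8):
    """Extrapolate two road users at constant velocity from t0 = 0.

    p1, p2: positions (m); v1, v2: velocities (m/s). Returns (ttc, point) for
    the first time step at which they are within collision_distance, or None
    if no collision is predicted before the horizon (s).
    """
    steps = int(round(horizon / dt))
    for k in range(steps + 1):
        t = k * dt
        x1, y1 = p1[0] + v1[0] * t, p1[1] + v1[1] * t
        x2, y2 = p2[0] + v2[0] * t, p2[1] + v2[1] * t
        if math.hypot(x2 - x1, y2 - y1) <= collision_distance:
            return t, ((x1 + x2) / 2, (y1 + y2) / 2)  # midpoint as collision point
    return None
```

For two vehicles 30 m apart closing head-on at 15 m/s, `predict_constant_velocity((0, 0), (10, 0), (30, 0), (-5, 0))` predicts a collision after 29 steps of 1/15 s, i.e. a TTC of about 1.93 s; two vehicles travelling side by side at the same velocity yield `None`.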

\label{fig:prob-collision-space} Probability of collision in motion prediction space $$(x,y,t)$$, projected into the future from $$t = t_0$$ in steps of $$\Delta t$$, where TTC is the projected time $$\Delta t$$ at which a significant probability of collision occurs.

### Time-to-collision

Time-to-collision (TTC), first proposed by (Hayward 1971), is one of the most popular surrogate safety measures. It is a method of quantifying proximity to danger: at a given instant $$t_0$$, it measures the time until two road users on a collision course would collide, based on the motion prediction model. In its simplest form, e.g. motion prediction at constant velocity or in a car-following situation, time-to-collision is the ratio of the differential position (spacing) to the differential velocity or speed. A TTC value of 0 seconds is, by definition, a collision. TTC is particularly useful as it has the same dimensions as some important traffic accident factors, such as user perception and reaction time, typically set at a critical value of 1.5 seconds in the literature (Hydén 1987, Green 2000), and braking time. Larger values of observed TTC thus provide greater factors of safety for particular driving tasks.
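In the car-following case, TTC reduces to the spacing between the two road users divided by their closing speed, and it is undefined when the follower is not closing in. A minimal sketch; the function name and the example values are illustrative.

```python
def ttc_car_following(gap_m, v_follower, v_leader):
    """Spacing (m) divided by closing speed (m/s); None when not closing in."""
    closing_speed = v_follower - v_leader
    if closing_speed <= 0:
        return None  # the gap is constant or growing: no collision course
    return gap_m / closing_speed


print(ttc_car_following(20.0, 15.0, 10.0))  # 4.0
```

A 20 m gap closed at 5 m/s gives a TTC of 4 s, comfortably above the 1.5 s perception-reaction value cited above.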

Time-to-collision is measured instantaneously: a new value of TTC may be computed at every instant. Thus, a pair of users may have a time series of TTC observations evolving over time. Some efforts have been made to study these evolutions (Mohamed 2013) as a form of safety continuum (Zheng 2014). Other approaches have focused on counting discrete events using a threshold, similar to the TCTs (Hydén 1987, Svensson 2006), although care must be taken because results have been shown to vary with prediction methodology and threshold level (St-Aubin 2015).

A sample pair of road user trajectories (#303 and #304, Figure \ref{fig:conflict-video}) and their spatial relationships, existing simultaneously over a time interval lasting 64 instants (just over 4 seconds), is presented in Figure \ref{fig:conflict-series}. In this scenario, vehicle #304 is approaching, at high velocity, vehicle #303, which is engaged in an illegal U-turn. The norm of the differential velocity $$\Delta v$$, the relative distance $$d$$, and the corresponding time $$t$$ are measured for every instant. In just under 4 seconds, the differential velocity changes from 9.63 to 2.26 m/s while the relative distance changes from 28.57 to 9.57 m. For every interaction instant of this user pair, motion prediction is used to calculate the resulting TTC under each motion prediction method. The time series of the predicted collisions and associated TTC measures for this pair of users and for each motion prediction method is presented in Figure \ref{fig:ttc-timeseries}. At each instant, normal adaptation and motion pattern prediction generate multiple possible collisions, while constant velocity can only ever produce one possible collision point. The larger number of predicted collision instants obtained with motion patterns suggests their potential for handling more complex situations. When several potential collision points are predicted, the expected time-to-collision $$TTC'_i$$ at time $$t_i$$ (for instant $$i$$) is calculated as the probability-weighted average of the TTC of all possible collision points indexed $$j=1..n$$, with $$Prob(collision)_{ij}$$ as defined and measured in (St-Aubin 2014).

$\label{eqn:TTC-mp-weight-avg} TTC'_i = \frac{\sum_{j=1}^{n}{TTC_{ij} Prob(collision)_{ij}}}{\sum_{j=1}^{n}{Prob(collision)_{ij}}}$
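The probability-weighted average described above can be computed per instant as sketched below. A weighted average is normalized by the sum of its weights, here the collision probabilities of the $$n$$ predicted collision points; the function name is hypothetical.

```python
def expected_ttc(ttcs, probabilities):
    """Probability-weighted average TTC over the collision points of one instant.

    ttcs[j] and probabilities[j] describe collision point j. Returns None when
    no collision is predicted (all probabilities are zero or the list is empty).
    """
    total = sum(probabilities)
    if total == 0:
        return None
    return sum(t * p for t, p in zip(ttcs, probabilities)) / total


# Two predicted collision points: 2 s with probability 0.1, 4 s with probability 0.3.
print(round(expected_ttc([2.0, 4.0], [0.1, 0.3]), 6))  # 3.5
```

Because the more probable point dominates, the expected TTC of 3.5 s lies closer to 4 s than to 2 s.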

It is clear from both this figure and the trajectories themselves that constant velocity and normal adaptation motion predictions are inadequate for roundabout conflict analysis: the trajectories share the same destination yet their headings lead them to a potential collision point only for a brief period of time with these prediction methods.

\label{fig:conflict-series} Vehicle #304 is shown approaching vehicle #303, which is engaging the roundabout in the wrong direction. Spatial relationship measures $$\Delta v$$, relative distance $$d$$, and time stamp $$t$$ are labelled along the time series every eight frames between the two trajectories. Light grey lines join the two trajectories at common time frames for visualization purposes.

\label{fig:ttc-timeseries} Time series of TTC observations for different motion prediction methods for the interaction between vehicles #303 and #304. Markers correspond to TTC for a specific collision point at each instant, the probability of collision associated with each collision point is indicated by the opacity of the marker and lines are weighted average TTC observations per instant.

## Indicator aggregation over time and space

Instantaneous surrogate safety indicators such as speed and TTC are observed continuously. Currently, there is a lack of consensus in the literature on how to interpret continuous measures in terms of safety, beyond the event-based (Hydén 1987, Svensson 2006) and safety continuum (Zheng 2014) paradigms. One qualitative approach used in the literature has been to compare shifts in the probability distributions of these indicators, which is conclusive when the magnitude of impact of individual indicator values has no bearing on the overall direction of the shift in safety risk (Ismail 2010, Autey 2012, St-Aubin 2013). For example, as illustrated in Figure \ref{fig:distro-comparison} c) and d), it is unclear whether converting “high and low risk” events into “medium risk” events is beneficial, whereas converting “high risk” events into “low risk” events is always beneficial, as illustrated in a) and b). Still, there may be more complex interpretations related to driver awareness and learning as a function of perceived danger, as described in (Svensson 2006). Alternatively, changes in the frequency of events meeting a given threshold may yield similar results. While traffic event conversion factors have been developed and used in the past (Hydén 1987, Svensson 2006), the transferability of validated conversion factors has been cited as problematic (Mohamed 2013). Indeed, while the shape of the probability distribution of TTC has generally been characterized as Gamma-like in the literature (Ismail 2010, Autey 2012), variations or compound effects can be found, particularly when traffic streams become mixed (St-Aubin 2013). One recent approach proposed a shifted gamma-generalized Pareto distribution model (Zheng 2014).

Indicator interpretation may depend on the precise definition of what constitutes a “traffic interaction”. The disaggregated interaction approach treats indicators as a representation of incremental risk that adds up over time, but it is biased towards slower-moving objects, which dwell in the scene longer, and it complicates conditional probability calculations. The user-pair aggregated approach solves these problems but is sensitive to how the analysis area is defined and to which user pairs are considered to be interacting. Overall, a more detailed interaction exposure framework is still needed in the literature; some simplifications have been made in the meantime. The traditional TCT methodology frequently represented user pair interactions by a single value of TTC, the minimum TTC ($$TTC_{min}$$). In practice, due to the imperfect nature of automated video data extraction, minimum values tend to oversample instantaneous tracking errors, so the 15th percentile of TTC is preferred (St-Aubin 2015).
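The sketch below contrasts the two per-pair summaries and counts pairs below a threshold. It assumes each user pair's TTC time series is a plain list of seconds and uses linear interpolation between order statistics for the percentile (one common convention; the study's exact percentile definition is not specified here). The 1.5 s threshold echoes the critical value cited earlier; the function names are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty series")
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def pair_ttc_indicator(ttc_series, q=15):
    """Summarize one user pair's TTC series by a low percentile, not the minimum."""
    return percentile(ttc_series, q)


def count_conflicts(indicators, threshold=1.5):
    """Number of user pairs whose summary TTC (s) falls below the threshold."""
    return sum(1 for value in indicators if value < threshold)


# One spurious low value, e.g. from a momentary tracking error.
series = [0.1, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9]
print(count_conflicts([min(series)]), count_conflicts([pair_ttc_indicator(series)]))  # 1 0
```

With this series, $$TTC_{min}$$ flags a conflict because of the single 0.1 s outlier, while the 15th percentile (2.05 s) does not.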

\label{fig:distro-comparison} TTC probability distribution function comparisons. Qualitative analysis of TTC shift is possible in a), while it is not as straightforward in b). An alternative approach is to perform a comparison on a low threshold.

## Tools & Techniques

### Data Collection

Video data collection is primarily performed using a specially constructed mobile video data collection system, built for temporary high-angle video data collection, which is tamper-proof, weather-proof, and self-contained, as presented in (Jackson 2013). Ordinary fixed traffic surveillance CCTV cameras can be used as well and are usually more reliable, but only where and when available.

### Camera types

Two cameras were tested. The first was a Vivotek IP security camera with a narrow lens filming at 15 frames per second at a resolution of $$800\times 600$$; the second was a GoPro 2 with a wide-angle lens filming at 30 frames per second at a resolution of $$1280\times 960$$. Both cameras encoded video in H.264, although the GoPro used a significantly higher bitrate. The consumer-grade GoPro provides higher-quality video; however, these gains appear to have minimal impact on vehicle tracking performance, while the video files are much larger and more difficult to handle and process.

### Software

\label{software} The software used is the open-source Traffic Intelligence project (Saunier 2010, Jackson 2013), itself based on the computer vision library OpenCV (Brahmbhatt 2013). This software provides the basic feature tracking (the algorithm presented in (Saunier 2006)), trajectory management and coordinate projection functionality, as well as a few usage-specific tools such as correction for lens distortion, trajectory clustering, and basic motion prediction functions. Some of the more advanced analysis tools and techniques presented in this paper are under development and will be made available as their functionality is completed and validated.

### Data Model

All this data, from the raw video data to the high-level interpretation, needs to be organized and managed. A high-level conceptual data model is presented as an entity-association diagram in Figure \ref{fig:data-model}. It is slightly simplified to omit some implementation details, which can be found in the Traffic Intelligence project. The diagram has two main parts:

• the entities (objects) resulting from the video analysis (upper half of Figure \ref{fig:data-model}): the raw data extracted from video analysis is stored as time series (indexed by frame number) of positions and velocities (with corresponding unique keys), which are grouped into road user objects that may have a type (passenger vehicle, truck, cyclist, etc.). Interactions involve two road users and may be characterized by several indicators such as TTC, post-encroachment time (PET), etc.

• the entities providing the data description or metadata (lower half of Figure \ref{fig:data-model}): sites, e.g. the different roundabouts studied in this work, are the cornerstone of the metadata. A site may correspond to several video sequences (the actual video files), each characterized by a camera view with camera parameters such as a homography matrix (a camera view can be shared by several files, e.g. when video sequences are split into one file per hour). Various types of site features (complementary data) may be added, e.g. the alignments and analysis areas shown as examples in the diagram.

Positions and road users are naturally linked to a camera view, a video sequence, and a site (through configuration files, not through an actual key in the positions table as drawn in Figure \ref{fig:data-model}).
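The entities above can be sketched as plain Python dataclasses. This is a minimal illustration under stated assumptions: the class and field names (`Site`, `CameraView`, `VideoSequence`, `RoadUser`, `Interaction`, `common_frames`) are hypothetical and are not the Traffic Intelligence API; time series are modelled as dictionaries keyed by frame number, as described in the data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical classes mirroring the conceptual data model;
# names and fields are illustrative, not the Traffic Intelligence API.

@dataclass
class Site:
    name: str  # e.g. one roundabout

@dataclass
class CameraView:
    site: Site
    homography: List[List[float]]  # 3x3 image-to-world projection matrix

@dataclass
class VideoSequence:
    filename: str
    camera_view: CameraView  # several files may share the same camera view

@dataclass
class RoadUser:
    num: int
    user_type: Optional[str] = None  # e.g. "car", "truck", "cyclist"
    # Time series indexed by frame number: frame -> (x, y) in metres
    positions: Dict[int, Tuple[float, float]] = field(default_factory=dict)
    velocities: Dict[int, Tuple[float, float]] = field(default_factory=dict)

    def time_interval(self) -> Tuple[int, int]:
        """First and last frame in which the road user is observed."""
        frames = sorted(self.positions)
        return frames[0], frames[-1]

@dataclass
class Interaction:
    user1: RoadUser
    user2: RoadUser
    # Indicator name (e.g. "TTC", "PET") -> time series (frame -> value)
    indicators: Dict[str, Dict[int, float]] = field(default_factory=dict)

    def common_frames(self) -> List[int]:
        """Frames in which both road users are observed simultaneously."""
        return sorted(set(self.user1.positions) & set(self.user2.positions))
```

In this sketch, two hourly files of the same recording point to one shared `CameraView` object, matching the case noted in the data model.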

### Processing

Real-time analysis is not an explicit goal of this technology, as its intended use is primarily the off-line analysis of data recorded in the field from non-permanent video cameras. However, performance is a serious consideration, if for no other reason than to ensure that processing remains affordable and does not fall behind data collection. In any case, some calculations may require pre-processing of as much data as possible, in particular machine learning tasks such as motion prediction (see section \ref{motion-prediction}).

With today's multi-core processors, the tasks in the current iteration of the software are highly parallelizable. Feature tracking and trajectory analysis can be performed on multiple video sequences at a time, typically cut into 20-minute or one-hour segments, in parallel on a single mid-to-high-performance machine or on a computer cluster. With parallel processing of video sequences on a single computer, memory becomes the main bottleneck: 32 GB or more is highly recommended on a multi-core machine to take full advantage of up to 8 threads. Alternatively, the large majority of calculation tasks can be parallelized at the observation level, as observations are independent events.
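The segment-level parallelism described above can be sketched with the standard library. This is a hedged illustration, not the project's scheduler: `track_segment` is a hypothetical placeholder for the actual feature-tracking call, and the 4 GB per job figure is an assumption inferred from the "32 GB for up to 8 threads" recommendation.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Tuple

def split_into_segments(duration_s: int, segment_s: int = 1200) -> List[Tuple[int, int]]:
    """Cut a recording into consecutive (start, end) segments, in seconds.
    The default of 1200 s corresponds to 20-minute segments."""
    return [(start, min(start + segment_s, duration_s))
            for start in range(0, duration_s, segment_s)]

def max_workers(total_mem_gb: float, mem_per_job_gb: float, n_cores: int) -> int:
    """Concurrent jobs allowed by memory (the main bottleneck), capped by core count."""
    return max(1, min(n_cores, int(total_mem_gb // mem_per_job_gb)))

def track_segment(segment: Tuple[int, int]) -> int:
    """Hypothetical placeholder for feature tracking on one segment;
    here it only returns the segment length in seconds."""
    start, end = segment
    return end - start

def process_recording(duration_s: int, total_mem_gb: float = 32,
                      mem_per_job_gb: float = 4, n_cores: int = 8) -> List[int]:
    """Dispatch all segments of a recording to a pool of worker processes."""
    segments = split_into_segments(duration_s)
    workers = max_workers(total_mem_gb, mem_per_job_gb, n_cores)
    # Segments are independent, so they can simply be mapped onto the pool.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(track_segment, segments))
```

Under these assumptions, a 12-hour recording (`process_recording(12 * 3600)`) is cut into 36 twenty-minute segments; a 32 GB, 8-core machine runs 8 of them at a time, while a 16 GB machine is limited to 4 by memory rather than by cores.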

Feature tracking is written in C++ for performance, while the majority of trajectory analysis is written in Python for ease of development and extensibility. Where possible, expensive trajectory analysis calculations make use of Python wrappers for fast compiled libraries.

\label{fig:data-model} High-level data model as an entity-association diagram (using Unified Modelling Language symbols)

# Experimental Results

## Tracking Performance

\label{tracking_calibration}

Tracking accuracy was evaluated with the CLEAR MOT metric Multiple Object Tracking Accuracy (MOTA) (Bernardin 2008), using as ground truth trajectories manually annotated with an open-source annotation tool from the Urban Tracker project (Jodoin 2014). Perfect tracking yields a MOTA of 100%, and MOTA becomes negative when the total number of errors (misses, false positives and identity mismatches) exceeds the number of ground-truth object observations. Nearly 40,000 observations (instants) of 371 motor vehicles were annotated manually across two prototypical sample sites: one where objects were tracked at a distance of 20 to 50 metres from the camera and one where objects were tracked no more than 20 metres away. With default tracking parameters, MOTA was modest (around $$70\%$$). Tracking parameters were then optimized with a genetic algorithm, similar to (Ettehadieh 2015), searching for the parameters that maximize MOTA. After optimization, MOTA increased to $$94\%$$ at the far-camera site and $$85\%$$ at the near-camera site (measured over the same ground truth used for optimization). These statistics are summarized in Table \ref{tab:track_calib_data}. The parameter optimization converged in less than 24 hours.
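MOTA, as defined by Bernardin (2008), aggregates per-frame errors over the whole annotated sequence: $$MOTA = 1 - \frac{\sum_t (m_t + fp_t + mme_t)}{\sum_t g_t}$$, where $$m_t$$ are misses, $$fp_t$$ false positives, $$mme_t$$ identity mismatches and $$g_t$$ ground-truth objects in frame $$t$$. The sketch below assumes the matching between tracker output and ground truth (with a distance threshold) has already been done and only the per-frame counts remain; `FrameCounts` is a hypothetical container, not part of the Urban Tracker or Traffic Intelligence tools.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class FrameCounts:
    misses: int           # ground-truth objects with no matching track (m_t)
    false_positives: int  # tracker outputs matched to no ground truth (fp_t)
    mismatches: int       # identity switches (mme_t)
    ground_truth: int     # ground-truth objects present in the frame (g_t)

def mota(frames: Iterable[FrameCounts]) -> float:
    """Multiple Object Tracking Accuracy over a sequence of frames."""
    errors = 0
    ground_truth = 0
    for f in frames:
        errors += f.misses + f.false_positives + f.mismatches
        ground_truth += f.ground_truth
    if ground_truth == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    # Can be negative when errors outnumber ground-truth observations.
    return 1.0 - errors / ground_truth
```

For example, one miss in a frame with 10 ground-truth objects plus one false positive and one identity switch in another such frame gives 3 errors over 20 observations, i.e. a MOTA of 0.85.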

Notably, this calibration was performed on the oldest and poorest-quality video data (resolution of $$800\times 600$$, without software-assisted image stabilization, and before lens correction). Tracking results are expected to improve with higher-quality video data and video pre-treatment. The optimized tracking parameters should be portable to other sites with similar view and camera characteristics, though this will be investigated in depth in a future paper.
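The calibration step, a genetic algorithm searching tracking parameters that maximize MOTA, can be sketched generically. This is a minimal sketch, not the procedure of Ettehadieh (2015): the operators (binary tournament selection, uniform crossover, Gaussian mutation, elitism) are choices made here, and in practice `fitness` would run the tracker with a candidate parameter set and return its MOTA against the ground truth. The toy fitness and parameter bounds in the usage below are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]

def genetic_search(fitness: Callable[[List[float]], float], bounds: Bounds,
                   pop_size: int = 20, generations: int = 40,
                   mutation_rate: float = 0.2,
                   seed: int = 0) -> Tuple[List[float], float]:
    """Maximize fitness over a box of real-valued parameters."""
    rng = random.Random(seed)

    def random_individual() -> List[float]:
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def crossover(a: List[float], b: List[float]) -> List[float]:
        # Uniform crossover: each parameter comes from one parent at random.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def mutate(ind: List[float]) -> List[float]:
        out = []
        for x, (lo, hi) in zip(ind, bounds):
            if rng.random() < mutation_rate:
                x += rng.gauss(0.0, 0.1 * (hi - lo))  # step scaled to the range
            out.append(min(hi, max(lo, x)))           # clip back into bounds
        return out

    def tournament(scored: List[Tuple[float, List[float]]]) -> List[float]:
        # Binary tournament: the fitter of two random individuals is selected.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [random_individual() for _ in range(pop_size)]
    best, best_score = population[0], float("-inf")
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        for score, ind in scored:
            if score > best_score:
                best, best_score = ind, score
        children = [best]  # elitism: keep the best parameter set unchanged
        while len(children) < pop_size:
            parent_a, parent_b = tournament(scored), tournament(scored)
            children.append(mutate(crossover(parent_a, parent_b)))
        population = children
    return best, best_score
```

With a toy fitness peaking at (0.3, 5.0), `genetic_search(lambda p: 1.0 - (p[0] - 0.3) ** 2 - 0.01 * (p[1] - 5.0) ** 2, [(0.0, 1.0), (0.0, 10.0)])` should return parameters close to that peak. Since each fitness evaluation would mean re-running tracking on the annotated video, the number of evaluations (population size times generations) drives the calibration time.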

\label{tab:track_calib_data}

| Site | Ground truth objects | Ground truth observations | MOTA, default parameters | MOTA, optimized parameters |
|---|---|---|---|---|
| #1-1-12 (far camera) | 146 | 12,887 | 0.685 | 0.944 |
| #22-3-16 (near camera) | 225 | 25,011 | 0.744 | 0.853 |

## Data Size

In total, 473 hours of video data were collected (see Table \ref{tab:data_details} for details). Data were typically collected for 12 hours, from 7 a.m. to 7 p.m. At 30 frames per second, recording an intersection over a driving distance of 50 metres, with an average driving speed of 30 km/h and an average flow rate of 500 veh/h, yields approximately 90,000 instantaneous moving-object measurements per hour. Additionally, each of these observations can have anywhere between 3 and 100 feature tracks associated with it. For an adequate balance between performance and tracking accuracy, the recommended target is roughly 15 to 20 features per object over time, depending on the typical time a road user spends in the field of view. This yields manageable data sizes (roughly 500 MB of storage per hour of video) while maintaining adequate data richness and object representation. Video storage needs vary greatly with camera choice, resolution, frame rate, and video encoding settings.
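The 90,000 figure follows from simple arithmetic: 30 km/h is about 8.33 m/s, so a vehicle needs 6 s to cover 50 m, i.e. 180 frames at 30 fps, and 180 × 500 vehicles gives 90,000 observations per hour. The helper below reproduces this estimate; it assumes every vehicle crosses the full tracked distance at constant speed (no queuing), so it is only a back-of-the-envelope sizing tool.

```python
def observations_per_hour(distance_m: float, speed_kmh: float,
                          flow_veh_per_h: float, fps: float = 30.0) -> float:
    """Expected number of (road user, frame) observations per hour,
    assuming constant speed over the whole tracked distance."""
    speed_ms = speed_kmh / 3.6              # km/h -> m/s
    time_in_view_s = distance_m / speed_ms  # seconds each vehicle stays in view
    frames_per_vehicle = time_in_view_s * fps
    return frames_per_vehicle * flow_veh_per_h
```

Halving the frame rate halves the number of observations (45,000 at 15 fps), which is one lever for trading temporal resolution against storage.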

\label{tab:data_details}

| Quantity | Value |
|---|---|
| Roundabouts | 20 |
| Analysis areas | 36 |
| Hours of video data | 473 |
| Estimated total traffic volume | 79,432 |
| Disk space (video + data + overhead) | 1.9 TB |
| Vehicle-kilometres travelled | 9,505 veh-km |

Figure \ref{fig:flow-userpairs} shows the hourly number of user pairs observed versus traffic volume. Trends are evident, but the explanatory factors are not clear (probably a mix of several lane-arrangement indicators) and will need further study. The number of user pairs per hour should be linearly correlated with the number of interaction instants, as demonstrated in Figure \ref{fig:userpairs-interactions}; if it is not, analysis areas across sites may not be comparable, particularly for disaggregated TTCs.

Most of the analysis is conducted on a pair of dedicated consumer-grade high-performance machines (Intel Core i7-3770K processor with 16 to 32 GB of memory), with some work offloaded to a computing cluster when processing must be accelerated. Feature tracking performance depends on video resolution and on additional image processing such as stabilization or lens distortion correction. A typical one-hour $$800\times 600$$ pixel video at 30 frames per second is processed with current consumer-grade hardware in about an hour, and a typical one-hour $$1280\times 960$$ video with distortion correction in about two hours. Basic analysis of one of these trajectory sequences takes between 5 and 30 minutes, depending on traffic in the scene, while interaction analysis, particularly using motion patterns, can take anywhere between 1 and 48 hours. Interaction analysis processing times are very sensitive to the interaction complexity of the scene.

\label{fig:flow-userpairs} Number of user pairs observed at each roundabout versus inflow per lane per hour.

\label{fig:userpairs-interactions} Number of interaction instants observed versus number of user pairs observed.
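The linearity check between user pairs and interaction instants can be sketched as an ordinary least-squares fit with Pearson's correlation coefficient. This is an illustrative sketch using only the standard library; the input layout (one hourly value of user pairs and one of interaction instants per analysis area) is an assumption about how the data would be tabulated.

```python
import math
from typing import Sequence, Tuple

def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares slope and intercept of y on x, plus Pearson's r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

Here `x` would hold user pairs per hour and `y` interaction instants per hour; a low `r`, or sites lying far from the fitted line, would point to analysis areas that are not comparable across sites.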

## Sample Surrogate Safety Analysis

A sample surrogate safety analysis of three of the sites is shown in Figure \ref{fig:selected-site}: road user trajectories projected in and with respect to the scene, mean velocity vectors and speed along these trajectories, and the spatial distribution of collision points predicted using motion patterns, with instantaneous collision probability $$P > 10^{-5}$$ and $$TTC < 1.5$$ s. Problematic weaving conflicts stand out in the first and third examples, particularly for multi-lane roundabouts. The second example, on the other hand, shows car-following conflicts, particularly at the approach (not surprising, given that this single-lane approach frequently experiences queuing as drivers yield to the high conflicting flow) and at the exit (more surprising).
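To make the TTC threshold concrete, the sketch below computes TTC for a pair of road users under constant-velocity motion prediction. Note that this swaps in a simpler prediction method than the motion patterns used in this analysis, and it does not produce collision probabilities. Road users are treated as points that collide when their centres come within an assumed `collision_distance` (2 m here, a hypothetical value); positions are in metres and velocities in m/s.

```python
import math
from typing import Optional, Tuple

Vec = Tuple[float, float]

def ttc_constant_velocity(p1: Vec, v1: Vec, p2: Vec, v2: Vec,
                          collision_distance: float = 2.0) -> Optional[float]:
    """Time (s) until the two centres come within collision_distance (m),
    assuming both users keep their current velocity; None if they never do."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]    # relative position
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]  # relative velocity
    # Solve |dp + dv * t| = collision_distance, i.e. a t^2 + b t + c = 0
    c = dx * dx + dy * dy - collision_distance ** 2
    if c <= 0:
        return 0.0                           # already in contact
    a = dvx * dvx + dvy * dvy
    if a == 0:
        return None                          # no relative motion
    b = 2 * (dx * dvx + dy * dvy)
    disc = b * b - 4 * a * c
    if disc < 0:
        return None                          # paths never get close enough
    t = (-b - math.sqrt(disc)) / (2 * a)     # earliest root
    return t if t >= 0 else None             # negative roots: moving apart

def is_conflict(ttc: Optional[float], threshold_s: float = 1.5) -> bool:
    """Flag an instant as a conflict when TTC is below the threshold."""
    return ttc is not None and ttc < threshold_s
```

For two vehicles 30 m apart driving head-on at 10 m/s each, the gap closes at 20 m/s and reaches 2 m after 1.4 s, which would be flagged by the 1.5 s threshold.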

\label{fig:selected-site} Sample spatial data and analysis at 3 selected sites, from top to bottom. From left to right, the diagrams show trajectory tracks (positions) in the analysis area with descriptive alignments, mean speed and heading, and the spatial distribution of collision points predicted using motion patterns, with instantaneous collision probability $$P > 10^{-5}$$ and $$TTC < 1.5$$ s. All coordinates in metres, north pointing upwards.

\label{fig:TTC_CDF} Cumulative probability distribution of 15th percentile motion pattern TTCs for all roundabout weaving zones clustered into six groups.

Due to the large number of potentially contributing factors (at least 20) and the relatively small number of corresponding sites, sites were clustered into 6 groups according to Table \ref{tab:cluster_ttc_profile} using the k-means algorithm (St-Aubin 2015a).
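The clustering step can be illustrated with a plain implementation of k-means (MacQueen 1967) using Lloyd's iterations. This is a sketch under assumptions: the site feature vectors are hypothetical numeric factors (e.g. lane counts, diameter, flow ratios), standardized first so that no factor dominates the distance, and the deterministic farthest-point initialization is a choice of this sketch, not necessarily the one used in (St-Aubin 2015a).

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]

def standardize(rows: List[Point]) -> List[List[float]]:
    """z-score each column; constant columns are only centred."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def _dist2(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: List[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means with farthest-point initialization.
    Returns (cluster label per point, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all current centroids.
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [min(range(k), key=lambda j: _dist2(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For this study, `kmeans(standardize(site_features), 6)` would yield one label per weaving zone, where `site_features` is a hypothetical list of per-site factor vectors.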

\label{tab:cluster_ttc_profile}

| Cluster | Description | Group size | Observations |
|---|---|---|---|
| $$cl_1$$ | Small single- and double-lane residential collectors | 11 | 4,200 |
| $$cl_2$$ | Single-lane regional highways and arterials with speed limits of 70-90 km/h and mostly polarized flow ratios | 16 | 26,243 |
| $$cl_3$$ | 2-lane arterials with highly polarized flow ratios | 5 | 13,307 |
| $$cl_4$$ | Hybrid-lane (1→2, 2→1) arterials with flow ratios favouring the conflicting flow | 3 | 4,809 |
| $$cl_5$$ | Traffic circles converted to roundabouts (2 lanes, extremely large diameters, tangential approach angles) | 4 | 10,295 |
| $$cl_6$$ | Single-lane regional highways with large-angle quadrants (140 degrees) and even flow ratios | 2 | 2,235 |

The cumulative distribution of 15th percentile motion pattern TTCs is broken down by roundabout cluster as illustrated in Figure \ref{fig:TTC_CDF}. Cluster $$cl_3$$ (2-lane arterials with highly polarized flow ratios) offers the best and clearest safety benefits, while cluster $$cl_5$$ (traffic circles converted to roundabouts with 2 lanes, extremely large diameters, tangential approach angles) offers markedly decreased safety performance. $$cl_6$$ (single-lane regional highways with large-angle quadrants (140 degrees) and even flow ratios) is ambiguous in its interpretation: while there is a general left-shift in TTC distribution (towards lower values) there is also a decrease in frequency of the lowest (most severe) TTC indicators. The remaining clusters, representing 31 sites, tend towards an average TTC distribution. This is still a large number of sites. Further investigation, regressing for specific factors, is warranted (St-Aubin 2015a).
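The aggregation behind these distributions can be sketched in two steps: each interaction's TTC time series is reduced to its 15th percentile, then the empirical cumulative distribution of those values is built per cluster. This is a hedged sketch: the linear interpolation rule for percentiles is a choice made here (the text does not specify one), and the input layout (cluster name mapped to a list of per-interaction TTC series) is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100), interpolating linearly between order statistics."""
    if not values:
        raise ValueError("empty sample")
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def ecdf(sample: Sequence[float]) -> List[Tuple[float, float]]:
    """Empirical CDF as (value, cumulative proportion) steps."""
    s = sorted(sample)
    n = len(s)
    return [(x, (i + 1) / n) for i, x in enumerate(s)]

def cluster_ttc_cdfs(interactions: Dict[str, List[Sequence[float]]],
                     q: float = 15.0) -> Dict[str, List[Tuple[float, float]]]:
    """For each cluster, reduce every interaction's TTC series to its q-th
    percentile, then return the empirical CDF of those values."""
    return {cluster: ecdf([percentile(ttcs, q) for ttcs in series])
            for cluster, series in interactions.items()}
```

Using a low percentile rather than the minimum keeps one indicator per interaction while damping isolated tracking noise in individual TTC instants; a left shift of a cluster's CDF then means more interactions with short TTCs.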

## Sample High-Level Interpretation Analysis

\label{sample_HLI_analysis}

Some high-level interpretation measures are also compiled using the data sample. Figure \ref{fig:speed_profile} shows the mean speed profiles, with the interval at mean $$\pm$$ one standard deviation, through the roundabout weaving zones. Speed profiles are mapped, not as units of distance, but rather as proportions of curvilinear location relative to the start and end of the merging zone, measured at the yield line of the approach and equivalent line at the corresponding exit. This is done to account for the large variability in diameter of roundabouts and angular size of weaving zones across sites. Mean speeds are generally consistent with those in the literature, but variation does occur with relative location and movement type (Hydén 2000). In addition, cross-sectional analysis uncovers even larger variations in mean speed profiles (St-Aubin 2013).
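The normalization of speed profiles can be sketched as follows: each sample's curvilinear position is mapped to 0 at the yield line of the approach and 1 at the equivalent exit line, so that samples from roundabouts of different sizes can be pooled and binned. The function names, the number of bins and the (relative position, speed) sample layout are assumptions of this sketch.

```python
import statistics
from typing import Iterable, List, Tuple

def relative_position(s: float, s_start: float, s_end: float) -> float:
    """Map curvilinear position s (m) to 0 at the merging-zone start
    (approach yield line) and 1 at its end (equivalent exit line)."""
    return (s - s_start) / (s_end - s_start)

def speed_profile(samples: Iterable[Tuple[float, float]],
                  n_bins: int = 10) -> List[Tuple[float, float, float]]:
    """Bin (relative position, speed) samples and return
    (bin centre, mean speed, standard deviation) for each non-empty bin."""
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    for r, speed in samples:
        if 0.0 <= r <= 1.0:  # ignore samples outside the merging zone
            idx = min(int(r * n_bins), n_bins - 1)  # r == 1 goes in the last bin
            bins[idx].append(speed)
    profile = []
    for i, speeds in enumerate(bins):
        if speeds:
            sd = statistics.stdev(speeds) if len(speeds) > 1 else 0.0
            profile.append(((i + 0.5) / n_bins, statistics.mean(speeds), sd))
    return profile
```

Each site's samples would be normalized with that site's own yield and exit lines before pooling; computing one profile per movement type then gives curves comparable to the mean ± one standard deviation bands.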

Finally, Figure \ref{fig:hli_histos} presents boxplots of the lag gaps accepted by approaching vehicles with respect to conflicting vehicles. In a cross-sectional analysis, this quantifies the aggressiveness of vehicle insertion. Smaller accepted gaps might be explained by more impatient drivers, typically symptomatic of high volumes of continuous flow inside the roundabout and long wait times at the approach.

\label{fig:speed_profile} Mean speed profiles and $$\pm$$ one standard deviation relative to merging zone start and end according to movement type.

\label{fig:hli_histos} Distributions of gap at merging instant of approaching vehicles (critical gap), presented as boxplots (the maximum represented gap is 100 s, but some sites see even larger values).
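A simplified version of the gap measurement and of the boxplot statistics can be sketched as follows. The definition used here is an assumption: the gap is taken as the time from the approaching vehicle's merging instant to the arrival of the next conflicting vehicle at the conflict point, with those arrival times assumed to have been extracted beforehand from the trajectories. The five-number summary relies on `statistics.quantiles` (Python 3.8 or later).

```python
import bisect
import statistics
from typing import Dict, Optional, Sequence

def lag_gap(merge_time: float, conflicting_arrivals: Sequence[float]) -> Optional[float]:
    """Time (s) from the merging instant to the next conflicting vehicle
    reaching the conflict point; None if no conflicting vehicle follows.
    conflicting_arrivals must be sorted in increasing order."""
    i = bisect.bisect_right(conflicting_arrivals, merge_time)
    if i == len(conflicting_arrivals):
        return None
    return conflicting_arrivals[i] - merge_time

def boxplot_stats(values: Sequence[float]) -> Dict[str, float]:
    """Five-number summary used to draw a boxplot (needs two or more values)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}
```

Collecting `lag_gap` values for all merging vehicles at a site and passing them to `boxplot_stats` gives one box per site; a lower median gap would indicate more aggressive insertion at that site.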

# Conclusion

Large-scale automated video data collection enables surrogate safety analysis at a larger scale, in practice and research, for the same purposes as the traditional approach based on historical accident data: road network screening, evaluation of countermeasures, road safety diagnosis, etc. With this in mind, surrogate safety analysis is expected to play an important role, either as a complementary safety approach or as one that can potentially replace the traditional methods, particularly when historical accident data are limited or doubtful (given their poor quality in some cases).

This paper demonstrates the theoretical and practical application of a large-scale automated video data collection system using computer vision for highly detailed traffic studies, in particular for proactive road safety analysis using surrogate safety measures. The reader is led step by step through the process of collecting, processing, and analysing video data, with examples and discussion of challenges along the way. The paper also presents an early implementation of the methodology in the form of a cross-sectional analysis of driver behaviour in the largest set of roundabout video data analysed to date.

Several technical challenges and their solutions were outlined, notably tracking errors (MOTA optimized to $$94\%$$ and no less than $$85\%$$), analysis of TTC distributions, and aggregation and sampling considerations. It is expected that these issues will be further addressed as processing and analysis tools become more accessible, more collaborators contribute solutions to the open source software stack, and techniques applied to transportation issues become more sophisticated. Future work will examine camera lens, angle, and visibility considerations for effect on tracking accuracy.

The roundabout analysis concludes that 2-lane arterials with highly polarized flow ratios offer the best safety gains, though this is likely due to the polarized flows rather than the multi-lane design, since multi-lane roundabouts were also shown to generate increased concentrations of conflicts. The true effects of multi-lane roundabouts on safety may not be as simple as previously thought and may depend on the specifics of lane configuration. Traffic circles converted to roundabouts (change of signage only) yielded the poorest results and should be avoided. A future paper will examine regression models for individual contributing factors in more detail, as well as before-and-after studies of winter driving and roundabout conversion.

# Acknowledgements

The authors would like to acknowledge the funding of the Québec road safety research program supported by the Fonds de recherche du Québec – Nature et technologies, the Ministère des Transports du Québec and the Fonds de recherche du Québec – Santé (proposal number 2012-SO-163493), as well as the various municipalities for their logistical support during data collection. Shaun Burns, a master's student at McGill University, was also instrumental in the collection of video data.

### References

1. Z. Kim, G. Gomes, R. Hranac, A. Skabardonis. A Machine Vision System for Generating Vehicle Trajectories over Extended Freeway Segments. In 12th World Congress on Intelligent Transportation Systems. (2005).

2. R. Ervin, C. MacAdam, J. Walker, S. Bogard, M. Hagan, A. Vayda, E. Anderson. System for Assessment of the Vehicle Motion Environment (SAVME). (2000).

3. Timothy Gordon, Zevi Bareket, Lidia Kostyniuk, Michelle Barnes, Michael Hagan, Zu Kim, Delphine Cody, Alexander Skabardonis, Alan Vayda. Site-Based Video System Design and Development. (2012).

4. Stewart Jackson, Luis F. Miranda-Moreno, Paul St-Aubin, Nicolas Saunier. Flexible, Mobile Video Camera System and Open Source Video Analysis Software for Road Safety and Behavioral Analysis. Transportation Research Record: Journal of the Transportation Research Board 2365, 90–98 Transportation Research Board, 2013.

5. S.R. Perkins, J.I. Harris. Traffic conflicts characteristics: Accident potential at intersections. Highway Research Record 225, 35-43 (1968).

6. C. Hydén, L. Linderholm. The Swedish Traffic-Conflicts Technique. 133-139 In International Calibration Study of Traffic Conflict Techniques. Springer Berlin Heidelberg, 1984.

7. M.R. Parker, C.V. Zegeer. Traffic Conflict Techniques for Safety and Operations - Observers Manual. 36 U.S.Department of Transportation, 1989.

8. E. Hauer. Traffic conflict surveys: some study design considerations. (1978).

9. M.J. Williams. Validity of the traffic conflicts technique. Accident Analysis & Prevention 13, 133-145 Elsevier BV, 1981.

10. Herman W. Kruysse. The subjective evaluation of traffic conflicts based on an internal concept of dangerousness. Accident Analysis & Prevention 23, 53-65 Elsevier BV, 1991.

11. Hoong-Chor Chin, Ser-Tong Quek. Measurement of traffic conflicts. Safety Science 26, 169–185 Elsevier BV, 1997.

12. Nicolas Saunier, Tarek Sayed, Karim Ismail. Large-Scale Automated Analysis of Vehicle Interactions and Collisions. Transportation Research Record: Journal of the Transportation Research Board 2147, 42–50 Transportation Research Board, 2010.

13. Mohamed Gomaa Mohamed, Nicolas Saunier. Motion Prediction Methods for Surrogate Safety Analysis. Transportation Research Record: Journal of the Transportation Research Board 2386, 168–178 Transportation Research Board, 2013.

14. Paul St-Aubin, Luis F. Miranda-Moreno, Nicolas Saunier. Road User Collision Prediction Using Motion Patterns Applied to Surrogate Safety Analysis. In Transportation Research Board-XCII # STOC. (2014).

15. N. Saunier, T. Sayed. A feature-based tracking algorithm for vehicles in intersections. In Canadian Conference on Computer and Robot Vision. IEEE, 2006.

16. D. Ettehadieh, B. Farooq, N. Saunier. Systematic Parameter Optimization and Application of Automated Tracking in Pedestrian-Dominant Situations. In Transportation Research Board (TRB) Annual Meeting. (2015).

17. Keni Bernardin, Rainer Stiefelhagen. Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. EURASIP Journal on Image and Video Processing 2008, 1–10 Springer Science + Business Media, 2008.

18. Tobias Schreck, Jurgen Bernard, Tatiana Tekusova, Jorn Kohlhammer. Visual cluster analysis of trajectory data with interactive Kohonen Maps. In 2008 IEEE Symposium on Visual Analytics Science and Technology. IEEE, 2008.

19. J. B. MacQueen. Some Methods for classification and Analysis of Multivariate Observations. 1:281-297 In 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, 1967.

20. José A. Rodríguez-Serrano, Sameer Singh. Trajectory clustering in CCTV traffic videos using probability product kernels with hidden Markov models. Pattern Analysis and Applications 15, 415–426 Springer Science + Business Media, 2012.

21. B.T. Morris, M.M. Trivedi. A Survey of Vision-Based Trajectory Learning and Analysis for Surveillance. Circuits and Systems for Video Technology, IEEE Transactions on 18, 1114-1127 (2008).

22. Douglas Gettman, Larry Head. Surrogate Safety Measures From Traffic Simulation Models. 118 (2003).

23. Paul St-Aubin, Nicolas Saunier, Luis F. Miranda-Moreno. Comparison of Various Time-to-Collision Prediction and Aggregation Methods for Surrogate Safety Analysis. In Transportation Research Board (TRB) 94th Annual Meeting. National Academy Of Sciences, 2015.

24. Aliaksei Laureshyn, Åse Svensson, Christer Hydén. Evaluation of traffic safety, based on micro-level behavioural data: Theoretical framework and first implementation. Accident Analysis and Prevention 42, 1637–1646 Elsevier BV, 2010.

25. N. Saunier, T. Sayed, C. Lim. Probabilistic Collision Prediction for Vision-Based Automated Road Safety Analysis. 872-878 In The 10th International IEEE Conference on Intelligent Transportation Systems. IEEE, 2007.

26. Sayanan Sivaraman, Mohan M. Trivedi. Looking at Vehicles on the Road: A Survey of Vision-Based Vehicle Detection, Tracking, and Behavior Analysis. 14, 1773-1795 In 2013 Intelligent Vehicles Symposium (IV). IEEE, 2013.

27. Paul St-Aubin, Nicolas Saunier, Luis F. Miranda-Moreno, Karim Ismail. Use of Computer Vision Data for Detailed Driver Behavior Analysis and Trajectory Interpretation at Roundabouts. Transportation Research Record: Journal of the Transportation Research Board 2389, 65–77 Transportation Research Board, 2013.

28. J.C. Hayward. Near misses as a measure of safety at urban intersections. (1971).

29. Christer Hydén. The Development of a Method for Traffic Safety Evaluation: The Swedish Traffic Conflicts Technique. (1987).

30. Marc Green. How Long Does It Take to Stop? Methodological Analysis of Driver Perception-Brake Times. Transportation Human Factors 2, 195-216 Informa UK Limited, 2000.

31. Lai Zheng, Karim Ismail, Xianghai Meng. Shifted Gamma-Generalized Pareto Distribution model to map the safety continuum and estimate crashes. Safety Science 64, 155–162 Elsevier BV, 2014.

32. Åse Svensson, Christer Hydén. Estimating the severity of safety related behaviour. Accident Analysis and Prevention 38, 379–385 Elsevier BV, 2006.

33. K. Ismail, T. Sayed, N. Saunier. Automated Analysis of Pedestrian-Vehicle Conflicts: Context For Before-and-after Studies. Transportation Research Record 2198, 52-64 (2010).

34. Jarvis Autey, Tarek Sayed, Mohamed H. Zaki. Safety evaluation of right-turn smart channels using automated traffic conflict analysis. Accident Analysis and Prevention 45, 120–130 Elsevier BV, 2012.

35. Samarth Brahmbhatt. Practical OpenCV. Apress, 2013.

36. Jean-Philippe Jodoin, Guillaume-Alexandre Bilodeau, Nicolas Saunier. Urban Tracker: Multiple object tracking in urban mixed traffic. In IEEE Winter Conference on Applications of Computer Vision. IEEE, 2014.

37. Paul St-Aubin, Nicolas Saunier, Luis F. Miranda-Moreno. Large-Scale Microscopic Traffic Behaviour and Safety Analysis of Québec Roundabout Design. In Transportation Research Board (TRB) 94th Annual Meeting. National Academy Of Sciences, 2015.

38. Christer Hydén, András Várhelyi. The effects on safety time consumption and environment of large scale use of roundabouts in an urban area: a case study. Accident Analysis & Prevention 32, 11–23 Elsevier BV, 2000.