Tools & Techniques

Data Collection

Video data is primarily collected using a purpose-built mobile system designed for temporary, high-angle video data collection; the system is tamper-proof, weather-proof, and self-contained, as presented in \cite{Jackson_2013}. Ordinary fixed traffic surveillance CCTV cameras can be used as well, and are usually more reliable, but only where and when available.

Camera types

Two cameras were tested. The first was a Vivotek IP security camera with a narrow lens filming at 15 frames per second at a resolution of \(800\times 600\); the second was a GoPro 2 with a wide-angle lens filming at 30 frames per second at a resolution of \(1280\times 960\). Both cameras encoded video in H.264, although the GoPro used a significantly higher bitrate. The consumer-grade GoPro provides higher-quality video; however, these gains appear to have minimal impact on vehicle tracking performance, while the video files are much larger and more difficult to handle and process.

Software

\label{software} The software used is the open-source Traffic Intelligence project \cite{Saunier_2010,Jackson_2013}, itself based on the computer vision library OpenCV \cite{Brahmbhatt_2013}. This software provides the basic feature tracking (the algorithm presented in \cite{saunier06feature-based}), trajectory management, and coordinate projection functionality, as well as several usage-specific tools such as correction for lens distortion, trajectory clustering, and basic motion prediction functions. Some of the more advanced analysis tools and techniques presented in this paper are under development and will be made available as their functionality is completed and validated.
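For illustration, the lens distortion correction and image-to-world coordinate projection mentioned above can be expressed directly with OpenCV. The following is a minimal sketch, not the Traffic Intelligence code itself; the camera parameters, file name, and point correspondences are purely illustrative placeholders.

```python
import cv2
import numpy as np

# Hypothetical intrinsic parameters; in practice they are obtained by
# calibrating the camera (e.g. cv2.calibrateCamera on checkerboard images).
camera_matrix = np.array([[700.,   0., 640.],
                          [  0., 700., 480.],
                          [  0.,   0.,   1.]])
dist_coeffs = np.array([-0.3, 0.1, 0., 0., 0.])  # barrel distortion, typical of wide lenses

frame = cv2.imread('frame.png')
undistorted = cv2.undistort(frame, camera_matrix, dist_coeffs)

# Homography from image (pixel) to world (metre) coordinates, estimated
# from at least four point correspondences measured on site.
image_points = np.float32([[315, 210], [890, 225], [955, 600], [260, 585]])
world_points = np.float32([[0, 0], [15, 0], [15, 10], [0, 10]])
homography, _ = cv2.findHomography(image_points, world_points)

# Project a tracked trajectory (N x 2 pixel positions) into world space.
trajectory_px = np.float32([[400, 300], [420, 310], [441, 322]])
trajectory_world = cv2.perspectiveTransform(trajectory_px.reshape(-1, 1, 2),
                                            homography).reshape(-1, 2)
print(trajectory_world)
```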

Data Model

All these data, from the raw video to the high-level interpretation, need to be organized and managed. A high-level conceptual data model is presented as an entity-association diagram in Figure \ref{fig:data-model}. It is slightly simplified, omitting implementation details that can be found in the Traffic Intelligence project. The diagram has two main parts:

  • the entities (objects) resulting from the video analysis (upper half of Figure \ref{fig:data-model}): the raw data extracted from video analysis is stored as time series (indexed by frame number) of positions and velocities (with corresponding unique keys), which are grouped into road user objects that may have a type (passenger vehicle, truck, cyclist, etc.). Interactions involve two road users and may be characterized by several indicators such as TTC and PET.

  • the entities providing the data description or metadata (lower half of Figure \ref{fig:data-model}): sites, e.g. the different roundabouts studied in this work, are the cornerstone of the metadata. Each site may correspond to several video sequences (the actual video files), each characterized by a camera view with camera parameters such as a homography matrix (a single camera view may be shared by several files, e.g. when video sequences are split into one file per hour). Various types of site features (complementary data) may be added, e.g. the alignments and analysis areas shown as examples in the diagram.

Positions and road users are obviously linked to a camera view, a video sequence, and a site (not through an actual key in the positions table as shown in Figure \ref{fig:data-model}, but through configuration files).
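To make the model concrete, the following sketch shows how the main video-analysis and metadata entities could be stored relationally with SQLite. The table and column names are illustrative assumptions and do not reproduce the exact Traffic Intelligence schema.

```python
import sqlite3

conn = sqlite3.connect('site_data.db')
conn.executescript("""
-- Road user objects grouped from tracked features.
CREATE TABLE objects (
    object_id      INTEGER PRIMARY KEY,
    road_user_type INTEGER            -- e.g. passenger vehicle, truck, cyclist
);

-- Time series of positions and velocities, indexed by frame number.
CREATE TABLE positions (
    object_id    INTEGER REFERENCES objects(object_id),
    frame_number INTEGER,
    x REAL, y REAL,                   -- world coordinates (m)
    vx REAL, vy REAL,                 -- velocities
    PRIMARY KEY (object_id, frame_number)
);

-- Interactions between two road users, characterized by indicators.
CREATE TABLE interactions (
    interaction_id INTEGER PRIMARY KEY,
    object_id1 INTEGER REFERENCES objects(object_id),
    object_id2 INTEGER REFERENCES objects(object_id)
);
CREATE TABLE indicators (
    interaction_id INTEGER REFERENCES interactions(interaction_id),
    indicator_type TEXT,              -- e.g. 'TTC', 'PET'
    frame_number   INTEGER,
    value          REAL
);

-- Metadata: sites, camera views (with homography), video sequences.
CREATE TABLE sites (site_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE camera_views (
    view_id    INTEGER PRIMARY KEY,
    site_id    INTEGER REFERENCES sites(site_id),
    homography TEXT                   -- serialized 3x3 matrix
);
CREATE TABLE video_sequences (
    sequence_id INTEGER PRIMARY KEY,
    view_id     INTEGER REFERENCES camera_views(view_id),
    filename    TEXT,
    start_time  TEXT
);
""")
conn.commit()
```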

Processing

Real-time analysis is not an explicit goal of this technology, as its intended use is primarily the off-line analysis of data recorded in the field from non-permanent video cameras. However, performance is a serious consideration, if for no other reason than to ensure that processing remains affordable and does not fall behind data collection. In any case, some calculations may require pre-processing of as much data as possible, in particular machine learning tasks such as motion prediction (see section \ref{motion-prediction}).

In the current iteration of the software, and with today’s multi-core processors, tasks are highly parallelizable. Feature tracking and trajectory analysis can be performed on multiple video sequences at a time, typically cut into 20-minute or one-hour segments, in parallel on a single mid-to-high-performance machine or on a computer cluster. With parallel processing of video sequences on a single computer, memory becomes the main bottleneck; 32 GB or more of memory are highly recommended on a multi-core machine to take full advantage of up to 8 threads. Alternatively, the large majority of calculation tasks can be parallelized at the observation level, as observations are independent of one another.
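A minimal sketch of this sequence-level parallelism, using Python's standard multiprocessing module, is shown below; `track_sequence` and the command it invokes are hypothetical stand-ins for the actual tracker invocation, which depends on the installation.

```python
from multiprocessing import Pool
from pathlib import Path
import subprocess

def track_sequence(video_file):
    """Run feature tracking on one video segment (hypothetical
    command-line invocation; replace with the actual tracker)."""
    subprocess.run(['feature-based-tracking', str(video_file)], check=True)
    return video_file

if __name__ == '__main__':
    # e.g. a recording session cut into 20-minute or hourly segments
    segments = sorted(Path('videos/').glob('*.mp4'))
    # One worker per core, bounded so total memory use stays within budget.
    with Pool(processes=8) as pool:
        for done in pool.imap_unordered(track_sequence, segments):
            print(f'finished {done}')
```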

Feature tracking is written in C++ for performance, while the majority of trajectory analysis is written in Python for ease of development and extensibility. Where possible, expensive trajectory analysis calculations make use of Python wrappers for fast compiled libraries.
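As an example of the last point, computing an indicator over whole trajectories with NumPy (a Python interface to compiled numerical routines) replaces an interpreted loop with a single compiled call. The arrays below are synthetic and purely illustrative.

```python
import numpy as np

# Two road users' world-coordinate trajectories over the same frames (N x 2).
positions1 = np.random.rand(1000, 2) * 20.0
positions2 = np.random.rand(1000, 2) * 20.0

# Pure-Python loop: every iteration runs in the interpreter.
distances_loop = [((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2) ** 0.5
                  for p1, p2 in zip(positions1, positions2)]

# Vectorized: one call into compiled code.
distances = np.linalg.norm(positions1 - positions2, axis=1)

assert np.allclose(distances, distances_loop)
```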