Video Data

Road user trajectories are extracted from video data using a feature-based tracking algorithm described in \cite{saunier06feature-based} and implemented in the open-source project Traffic Intelligence.

Trajectories: Positions in Space and Time (x,y,t)

A trajectory is a series of points in Cartesian space representing the position of (the centroid of) a moving object (road user) at time \(t\) on a planar surface; height \(z\) is usually not considered. Points are evenly spaced in time with a constant \(\Delta t\) equal to the inverse of the frame rate of the video, i.e. one measurement is made per frame. Typical video frame rates are between 15 and 30 frames per second, providing 15 to 30 observations per moving object per second. The object (road user) itself is represented by a group of characteristic features spread over the object and moving in unison.
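As a minimal illustration (with an assumed frame rate and synthetic positions, not the Traffic Intelligence data model), a single trajectory can be stored as an array of \((x, y, t)\) rows sampled once per frame:

\begin{verbatim}
import numpy as np

# Illustrative sketch only; names and values are assumptions.
frame_rate = 15.0                    # frames per second
dt = 1.0 / frame_rate                # constant time step between positions

n_frames = int(3 * frame_rate)       # three seconds of observations
t = np.arange(n_frames) * dt         # evenly spaced timestamps
x = 2.0 + 10.0 * t                   # ground-plane coordinates in metres
y = 1.5 + 0.2 * t                    # (synthetic straight-line motion)

trajectory = np.column_stack((x, y, t))   # one (x, y, t) row per frame
\end{verbatim}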

Three potential sources of error exist, namely parallax, pixel resolution, and tracking:

  • Parallax error is mitigated by maximizing the angle subtended between the camera and the height of tracked objects. In practical terms, this requires a high vantage point, ideally a bird’s-eye view, and tracking objects with a small height-to-base ratio. Passenger cars are generally more forgiving in this respect than trucks or pedestrians.

  • Pixel resolution determines measurement precision: objects farther from the camera are tracked with lower precision than objects near the camera. Error due to pixel resolution is mitigated by placing study areas nearer to the camera and by using higher-resolution cameras, although increases in resolution offer diminishing returns in tracking distance.

  • Finally, tracking errors may occur due to scene visibility issues or limitations of current computer vision techniques, in particular in handling data association (e.g. attaching trajectories to the correct objects when they occlude each other). These erroneous observations have to be rejected or reviewed manually.

Depending on the steps taken to minimize tracking errors, feature-based tracking functions best over study areas of 50-100 m in length with high-to-medium speed, low-to-medium density flows.

A sample of road user trajectories is presented as they are tracked in image space in Figure \ref{fig:conflict-video}. For more information on computer vision, see section \ref{software}.

Derived Data: Velocity & Acceleration

Velocity and acceleration are derived by differentiating position and velocity over time, respectively. Both are two-dimensional vectors with a magnitude (speed or acceleration) and a heading.

It should be noted, however, that each successive derivation amplifies pixel precision error for that measure: a velocity measurement requires twice as many pixels as a position measurement, and an acceleration measurement requires three times as many. This type of error can be compensated for with moving-average smoothing over a short window (e.g. 5 frames). At this time, acceleration measurements are still too noisy to be useful as instantaneous observations. Higher camera resolutions should solve this problem in future applications.
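The sketch below illustrates one way such finite-difference derivations and smoothing could be implemented; it is a minimal example assuming positions in world coordinates and a 5-frame moving-average window, not the differentiation routine used by Traffic Intelligence.

\begin{verbatim}
import numpy as np

def smooth(series, window=5):
    # Moving-average smoothing over a short (odd-length) window,
    # padded at the edges so the output keeps the input length.
    kernel = np.ones(window) / window
    padded = np.pad(series, window // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def derive_kinematics(x, y, frame_rate=15.0, window=5):
    # Finite-difference velocity and acceleration from smoothed positions.
    dt = 1.0 / frame_rate
    vx = np.gradient(smooth(x, window), dt)
    vy = np.gradient(smooth(y, window), dt)
    speed = np.hypot(vx, vy)          # magnitude of the velocity vector
    heading = np.arctan2(vy, vx)      # heading in radians
    ax = np.gradient(smooth(vx, window), dt)
    ay = np.gradient(smooth(vy, window), dt)
    acceleration = np.hypot(ax, ay)
    return speed, heading, acceleration
\end{verbatim}

Smoothing positions before the first differencing, and velocities before the second, limits the amplification of pixel noise noted above.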

Size of Data

\label{method-size_of_data}

Feature tracking provides a microscopic level of detail. Individual observations measured at a single site over the course of a normal day typically register in the tens of millions. The sample size (number) of individual tracking measurements (positions, velocities, etc.) per hour \(n\) can be estimated with the equation

\[\label{eqn:data-size} n = fQd\]

where \(f\) is the number of frames per second of the video, \(Q\) is the average hourly flow-rate, and \(d\) is the average dwell time of each vehicle in the scene (excluding full stops). Dwell time is affected by the size of the analysis area in the scene and the average speed. As such, the size of the analysis area needs to be carefully selected.
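For illustration, with hypothetical values of \(f = 30\) frames per second, \(Q = 1500\) vehicles per hour, and \(d = 20\) s, Equation \ref{eqn:data-size} gives \(n = 30 \times 1500 \times 20 = 900{,}000\) position measurements per hour, i.e. on the order of 20 million over a full day of recording, consistent with the order of magnitude quoted above.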