Transferability Study of Video Tracking Optimization for Traffic Data Collection and Analysis
Despite extensive studies of the performance of video sensors and computer vision algorithms, these systems are usually calibrated by trial and error on small datasets, using incomplete metrics such as crude detection rates. Systematic calibration of tracking parameters is widely lacking in the literature.
This study proposes to improve automatic traffic data collection by optimizing tracking parameters with a genetic algorithm, comparing tracked road user trajectories to manually annotated ground truth data using Multiple Object Tracking Accuracy (MOTA) and Multiple Object Tracking Precision (MOTP) as the primary measures of performance. The optimization procedure is first performed on training data and then validated by applying the resulting parameters to non-training data. A number of problematic tracking and visibility conditions are tested using five camera views selected for their differences in weather conditions, camera resolution, camera angle, tracking distance, and site properties. The transferability of the optimized parameters is verified by evaluating the performance of the optimization across these data samples.
Results indicate that significant improvements can be made through parametrization. Winter weather conditions require a specialized and distinct set of parameters to reach an acceptable level of performance, while higher-resolution cameras are less sensitive to the optimization process and perform well with most parameter sets. Average spot speeds are found to be insensitive to MOTA, while traffic counts are strongly affected by it.
The use of video data for automatic traffic data collection and analysis has been on an upward trend as more powerful computational tools and detection and tracking technology become available. Video sensors have long been able to emulate inductive loops to collect basic traffic variables such as counts and speeds, as in the commercial system Autoscope (Michalopoulos 1991), and they can also provide, with increasing accuracy, higher-level information regarding road user behaviour and interactions. Examples include pedestrian gait parameters (Saunier 2011), crowd dynamics (Johansson 2008) and surrogate safety analysis applied to motorized and non-motorized road users in various road facilities (St-Aubin 2013, Sakshaug 2010, Autey 2012). Video sensors are relatively inexpensive and easy to install, or are already installed, for example by transportation agencies for traffic monitoring: large datasets can therefore be collected for large-scale or long-term traffic analysis. This so-called “big data” phenomenon offers opportunities to better understand transportation systems, while presenting its own set of challenges for data analysis (St-Aubin 2015).
Despite the undeniable progress of video sensors and computer vision algorithms in their varied transportation applications, there remains a distinct lack of large-scale comparisons of video sensor performance across varied conditions, defined for example by the complexity of the traffic scene (movements and mix of road users), the characteristics of the cameras (Wan 2014) and their installation (height, angle), and the environmental conditions such as weather (Fu 2015). Such comparisons are hampered by the poor characterization of the datasets used for performance evaluation and the limited availability of benchmarks and public video datasets for transportation applications (Saunier 2014). Tracking performance is often reported using ad hoc and incomplete metrics such as “detection rates” instead of detailed, standardised and more suitable metrics such as CLEAR MOT (Bernardin 2008). Finally, the computer vision algorithms are typically adjusted manually by trial and error on a small dataset covering few of the conditions affecting performance, and the performance, evaluated on that same dataset, is thus over-estimated. By comparison with other fields such as machine learning, it should be clear that the algorithms should be systematically optimized on a calibration dataset, while performance should be reported on a separate validation dataset (Ettehadieh 2015).
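To make the CLEAR MOT metrics concrete: MOTA aggregates misses, false positives and identity mismatches over all frames, relative to the total number of ground truth objects, while MOTP averages the distance between matched tracker and ground truth positions. The following is a minimal sketch in Python, assuming the per-frame matching of tracker hypotheses to annotated objects (e.g. by Hungarian assignment) has already been performed; the dictionary keys and function name are illustrative, not taken from any particular implementation:

```python
def clear_mot(frames):
    """Compute MOTA and MOTP from per-frame matching results.

    Each element of `frames` is assumed to be a dict of counts produced
    by a prior matching step: misses (ground truth objects with no
    hypothesis), false positives (hypotheses with no ground truth),
    identity mismatches (track ID switches), the number of ground truth
    objects present, and the distances of the matched pairs.
    """
    misses = sum(f["misses"] for f in frames)
    false_positives = sum(f["false_positives"] for f in frames)
    mismatches = sum(f["mismatches"] for f in frames)
    num_gt = sum(f["num_ground_truth"] for f in frames)
    distances = [d for f in frames for d in f["match_distances"]]

    # MOTA: 1 minus the ratio of all error events to ground truth objects
    mota = 1.0 - (misses + false_positives + mismatches) / num_gt
    # MOTP: mean distance over all matched tracker/ground-truth pairs
    motp = sum(distances) / len(distances)
    return mota, motp
```

A perfect tracker yields MOTA = 1; MOTA can be negative when errors outnumber ground truth objects, which is why it is a stricter summary than a raw detection rate.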
While the performance of video sensors for simpler traffic data collection systems has been extensively studied, not all factors have been systematically analyzed, and issues with parameter optimization and the lack of separate calibration and validation datasets are widespread. Moreover, the relationship between tracking performance and the accuracy of derived traffic parameters has never been fully investigated.
The objective of this paper is first to improve the tracking accuracy of existing automated detection and tracking methods for video data. This is done through the optimization of tracking parameters using a genetic algorithm that compares the tracker output with manually annotated trajectories. The method is applied to a set of traffic videos extracted from a large surrogate safety study of roundabout merging zones (St-Aubin 2015), covering factors such as the distance of road users to the camera, camera type, camera resolution and weather conditions. The second objective is to study the relationship between tracking accuracy, its optimization, and different kinds of traffic data such as counts and speeds. The third and last objective is to explore the transferability of parameters to separate datasets with the same properties (consecutive video samples) and across different properties, by reporting how optimizing tracking for one condition impacts tracking performance for the other conditions. As a follow-up to (Ettehadieh 2015), this paper investigates more factors and how tracking performance relates to the accuracy of traffic parameters. This paper is organized as follows: the next section provides a brief overview of the current state of computer vision and calibration in traffic applications; the methodology is then detailed, including the ground truth inventory, the measures of performance and the calibration procedure; the last two sections discuss the results of the tracking optimization procedure and conclusions regarding ideal tracking conditions and associated parameter sets.
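The calibration loop described above can be sketched as a simple genetic algorithm. In this sketch, `evaluate(params)` is a hypothetical stand-in for running the tracker with a candidate parameter vector on the training videos and returning its MOTA against the annotated ground truth; the elitist selection, uniform crossover and uniform mutation shown are only one plausible scheme, not the authors' exact implementation:

```python
import random

def genetic_optimize(evaluate, bounds, pop_size=20, generations=50,
                     mutation_rate=0.1, seed=None):
    """Maximize evaluate(params) over a box defined by bounds.

    bounds is a list of (low, high) pairs, one per tracking parameter.
    Returns the best parameter vector found.
    """
    rng = random.Random(seed)
    # Random initial population within the parameter bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half as elite parents (elitist selection)
        scored = sorted(pop, key=evaluate, reverse=True)
        elite = scored[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from either parent
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Uniform mutation: occasionally resample a gene in its bounds
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] = rng.uniform(lo, hi)
            children.append(child)
        pop = elite + children
    return max(pop, key=evaluate)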