Nicolas Saunier edited section_Introduction_The_use_of__.tex almost 9 years ago

Commit id: 29c43683ba9246e2e77b4682a2ec7ece400ea825

\section{Introduction} The use of video data for automatic traffic data collection and analysis has been on an upward trend as more powerful computational tools and detection and tracking technologies become available. Video sensors have long been able to emulate inductive loops to collect basic traffic variables such as counts and speed, as in the commercial system Autoscope \cite{michalopoulos91autoscope}, and they can also provide, with increasing accuracy, higher-level information regarding road user behavior and interactions. Examples include pedestrian gait parameters \cite{saunier11stride-length-trr}, crowd dynamics~\cite{johansson08crowd} and surrogate safety analysis applied to motorized and non-motorized road users in various road facilities~\cite{St_Aubin_2013,Sakshaug_2010,Autey_2012}. Video sensors are relatively inexpensive and easy to install, or are already installed, for example by transportation agencies for traffic monitoring: large datasets can therefore be collected for large scale or long term traffic analysis. This so-called ``big data'' phenomenon offers the opportunity to better understand transportation systems, but brings its own challenges for data analysis~\cite{st-aubin15big-data}. Despite the undeniable progress of video sensors and computer vision algorithms in their varied transportation applications, there remains a distinct lack of large comparisons of the performance of video sensors in varied conditions, such as the complexity of the traffic scene, the characteristics of the cameras~\cite{Wan_2014} and their installation (height, angle), the environmental conditions (e.g.\ the weather)~\cite{Fu_2015}, etc. Such comparisons are particularly hampered by the poor characterization of the datasets used for performance evaluation and by the limited availability of benchmarks and public video datasets for transportation applications~\cite{saunier14dataset}.
Tracking performance is often reported using ad hoc and incomplete metrics such as ``detection rates'' instead of standard and more suitable metrics such as CLEAR MOT~\cite{Bernardin_2008}. Finally, computer vision algorithms are typically adjusted manually by trial and error using a small dataset that covers few of the conditions affecting performance, and performance evaluated on that same dataset is therefore over-estimated: as is standard practice in other fields such as machine learning, the algorithms should be systematically optimized on a calibration dataset, while performance should be reported for a separate validation dataset~\cite{ettehadieh15systematic}.
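
For reference, the two CLEAR MOT metrics, the multiple object tracking accuracy (MOTA) and precision (MOTP), are commonly written as below. The notation is assumed here for illustration, following the usual presentation of~\cite{Bernardin_2008}: for each frame $t$, $m_t$ is the number of missed ground truth objects, $\mathit{fp}_t$ the number of false positives, $\mathit{mme}_t$ the number of identity mismatches, $g_t$ the number of ground truth objects, $c_t$ the number of matched pairs of ground truth object and tracker output, and $d_t^i$ the distance between the two elements of matched pair $i$.
\begin{equation}
\mathrm{MOTA} = 1 - \frac{\sum_t \left( m_t + \mathit{fp}_t + \mathit{mme}_t \right)}{\sum_t g_t}
\end{equation}
\begin{equation}
\mathrm{MOTP} = \frac{\sum_{i,t} d_t^i}{\sum_t c_t}
\end{equation}
Unlike a single ``detection rate'', MOTA accounts jointly for misses, false positives and identity mismatches over the whole sequence, while MOTP measures how precisely the matched objects are located.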

This paper is organized as follows: the next section provides a brief overview of the current state of computer vision and calibration in traffic applications. The detailed methodology is then presented, including the ground truth inventory, the measures of performance and the calibration procedure, followed by a presentation and discussion of the results. The paper concludes with a summary and recommendations for future research.