%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass[letterpaper, 10pt, conference]{ieeeconf} % Comment this line out if you need a4paper
%\documentclass[a4paper, 10pt, conference]{ieeeconf} % Use this line for a4 paper
%\usepackage{spconf,amsmath,graphicx,kotex,subfigure,times,balance,multirow}
\IEEEoverridecommandlockouts % This command is only needed if you want to use the \thanks command
\overrideIEEEmargins % Needed to meet printer requirements.
% See the \addtolength command later in the file to balance the column lengths
% on the last page of the document
% The following packages can be found on http://www.ctan.org
\usepackage{graphics} % for pdf, bitmapped graphics files
\usepackage{epsfig} % for postscript graphics files
\usepackage{mathptmx} % assumes new font selection scheme installed
\usepackage{times} % assumes new font selection scheme installed
\usepackage{amsmath} % assumes amsmath package installed
\usepackage{amssymb} % assumes amssymb package installed
\usepackage{caption}
\usepackage{subcaption}
\usepackage{color}
\usepackage{hhline}
\usepackage{enumerate}
\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{mathtools}
\usepackage{multirow}
\usepackage{graphicx}
\usepackage{array}
\usepackage{float}
\usepackage{nth}
\usepackage{booktabs}
\newcommand{\Tref}[1]{Table~\ref{#1}}
\newcommand{\Eref}[1]{Eq.~(\ref{#1})}
\newcommand{\Fref}[1]{Fig.~\ref{#1}}
\newcommand{\Sref}[1]{Sec.~\ref{#1}}
\newcommand{\argmax}{\operatornamewithlimits{argmax}}
\newcommand{\argmin}{\operatornamewithlimits{argmin}}
\newcommand{\etal}{\textit{et al.
}}
\title{\LARGE \bf
Thermal-Infrared based Drivable Region Detection
}
%\author{% <-this % stops a space
%%\thanks{*This work was not supported by any organization}% <-this % stops a space
%\thanks{$^{1}$Kibaek Park is with the Department of Electrical Engineering Robotics Program, Hyowon Ha and Fran\c{c}ois Rameau are with the Department of Electrical Engineering, Robotics and Computer Vision Laboratory, KAIST, Daejeon, 305-701, Korea~{\tt\small $\{$kbpark,hwha,frameau$\}[email protected]}}%
%\thanks{$^{2}$In So Kweon is with the Faculty of Department of Electrical Engineering, KAIST, Daejeon, 307-701, Korea~{\tt\small [email protected]}}
%}
\begin{document}
\maketitle
\thispagestyle{empty}
\pagestyle{empty}
% Road detection is important. computationally efficiently.
\begin{abstract}
% In the contributions, also mention that we made use of noisy data.
Drivable region detection is a challenging problem that is key to reducing vehicle accidents, and it is especially crucial in night environments that contain static or moving objects. Many researchers employ thermal-infrared cameras to improve driver visibility, but drivable region detection algorithms suited to thermal cameras have not been developed. Therefore, this paper proposes a drivable region detection approach that is robust to noisy and blurred thermal-infrared images. Our approach first constructs an accurate initial road from which road samples are extracted. Next, a label propagation method with a global condition and a weighted local condition extracts the drivable region. With temporal cues, we demonstrate the ability to detect the drivable region in sequential video using restricted GrowCut (R-GrowCut). To this end, an evaluation of road scene change is arranged to discern the need for re-initialization, which makes our algorithm highly adaptive to changes of road scene or road condition. Experiments have been conducted on three sequential videos (5977 images) that are representative of on-road, off-road, and obstacle scenes respectively, and the results demonstrate the effectiveness of our algorithm.
\end{abstract}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
% pedestrian bib search
In recent years, the Advanced Driver Assistance System (ADAS) has been attracting attention from car manufacturers and researchers. Among the various ADAS applications, road detection has been considered a fundamental problem for collision avoidance, path planning, and Autonomous Navigation Systems (ANS). In particular, vision-based road detection is a basic requirement in the field of automobile and IT convergence.
Camera-based vision applications in outdoor environments suffer from various environmental changes, such as color differences due to viewpoint change, specular phenomena on metal surfaces, and the drastic luminance gap between day and night. Some recent works utilize the thermal camera, which is less sensitive to these environmental changes, in order to secure visibility at night as well as during the day.
Choi \etal [?? ] design a multispectral imaging system and show visibility enhancement for drivers by day and night by fusing well-aligned thermal and RGB images. In the field of pedestrian detection, thermal images can be employed to raise detection performance \cite{hwang2013multispectral}, \cite{olmeda2009detection}. Furthermore, driver emotion and activity recognition using thermal images is proposed by Kolli \etal\cite{kolli2011non} and Cheng \etal\cite{cheng2005multiperspective} to recognize accident situations from the driver. For road detection, Wang \etal\cite{???} propose an algorithm based on vanishing point estimation in the thermal image.
\begin{figure}[t]
\centering
\begin{tabular}{@{}c@{}c}
\subcaptionbox{}{\includegraphics[width=0.5\linewidth]{./figure/front1}}\hspace{1mm}
&\subcaptionbox{}{\includegraphics[width=0.5\linewidth]{./figure/front2}}\hspace{2mm}
\end{tabular}
\caption{Examples of the proposed method. (a) A sample thermal-infrared road scene from the KAIST multispectral database. (b) The safely drivable region of the same scene.}
\label{fig:Front1}
\end{figure}
There are numerous works related to road detection; however, general road detection itself is not enough for path planning and ANS. To build a more intelligent system, the drivable region must be determined with obstacles taken into account. Therefore, the focus has gradually shifted from road detection to drivable region detection. Generally, three kinds of approach to drivable region detection are available for the thermal-infrared camera.
Early approaches \cite{alvarez2014combining}, \cite{miksik2012rapid}, \cite{rasmussen2005vehicle} usually rely on the dominant orientation of lines and texture to detect the vanishing point, and they construct the drivable road originating from the vanishing point. The limitation of this method is that the large amount of noise in thermal images disturbs the extraction of texture and line cues. Therefore, such systems sometimes fail to find the exact vanishing point and drivable road. Further, obstacles are not considered, because the drivable road is formed by just two lines from the vanishing point.
To overcome the shortcomings stated above, Hoiem \etal\cite{hoiem2005geometric}, \cite{hoiem2007recovering} propose a scene classifier that utilizes comprehensive information such as texture, lines, road location, and intensity. One of the functions they propose is the segmentation of the ground part. However, the ground region they obtain usually includes regions that are not safe to drive.
\begin{figure*}[t]
\centering
\includegraphics[width=\linewidth]{./figure/system}
\caption{Overview of our algorithm for detecting the safely drivable region. Given the first frame of the video, the initial road is extracted (green arrow). With the initial road samples, multiple start labels propagate, satisfying the global and local conditions, over the superpixel map to which the clustered mean has been applied (red arrow). Finally, the strategy for sequential video is realized with R-GrowCut and the re-initialization evaluation (blue arrow).}
\label{fig:overview}
\end{figure*}
Recently, many papers \cite{dahlkamp2006self}, \cite{guo2011adaptive}, \cite{lu2014hierarchical}, \cite{moghadam2009online}, \cite{zhang2015traversable} have proposed approaches based on propagation from extracted samples in an unsupervised way. Dahlkamp \etal\cite{dahlkamp2006self} detect the road by constructing an online-learned Gaussian Mixture Model (GMM) \cite{dempster1977maximum}, \cite{mclachlan1988mixture}. In a similar way, a semi-ellipse mask is used to obtain road samples, and Lu \etal\cite{lu2014hierarchical} propagate the drivable region using the GrowCut algorithm \cite{vezhnevets2005growcut}. Although these methods are effective on color images in which the drivable road and the background are clearly separated, their dependency on the background scene increases when the camera domain is changed to the thermal-infrared camera. Further, their methods for obtaining road samples are heuristic, so they can result in failure cases.
In this paper, we focus on the problems stated above and propose a novel propagation-based algorithm to detect the drivable region with a thermal-infrared camera. The contributions of this paper are summarized as follows:

$\bullet$ We propose a scene-adaptive sampling method and a propagation algorithm that differ from previous works.
$\bullet$ We propose a strategy that is robust to dynamic road video sequences, including night scenes.
$\bullet$ The proposed method can handle static or moving objects as obstacles.
$\bullet$ To the best of our knowledge, this is the first work to detect the drivable region on a large-scale thermal-infrared database.

The rest of this paper is organized as follows: In Sec.~\ref{Proposed Method}, we describe our overall system and the specifics of the proposed algorithm. Next, the strategy for detecting the drivable region in sequential video is demonstrated in Sec.~\ref{Strategy for Sequential Video}. We then conduct experiments and performance evaluation in Sec.~\ref{Experiments}. Finally, the conclusion is discussed in Sec.~\ref{Conclusion}.
\section{Proposed Method} \label{Proposed Method}
Our approach broadly consists of three stages. First, we initialize the road that should be safe to drive. We then propagate the initial road samples to find the drivable region, and refine the result into a safer region to drive. Finally, the strategy for detecting the drivable region in sequential video is prepared with temporal cues. The overall system is shown in Fig.~\ref{fig:overview}. Below, the specific technical methods are described.
\subsection{Extract Initial Road} \label{Extract Initial Road}
Model-based approaches find the road region by comparing a road model with the query image. In our sequential images, we determine the road model in the initial step. Lu \etal\cite{lu2014hierarchical} present a semi-ellipse as a sampling mask for obtaining road samples. However, a semi-ellipse sampling mask applied to arbitrary images may extract samples that do not belong to the drivable region, such as a corner, as shown in Fig.~\ref{fig:Comaper with icra}-(a). Therefore, the proposed sampling method identifies a region of road with homogeneous properties to prevent failed sampling cases. A Gabor filter \cite{jones1987evaluation} is utilized to extract the homogeneous region that is predicted as road in the sampling step.
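To make this sampling step concrete, the weak-response test can be sketched as below. This is a minimal illustration, not the authors' implementation: the kernel size, wavelength, number of orientations, and the application of the 0.1 threshold to the normalized, orientation-summed response are our assumptions.

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0):
    """Real part of a Gabor wavelet at orientation theta (zero-mean)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)
    return g - g.mean()  # zero-mean so flat (texture-less) regions give ~0 response

def weak_response_mask(img, n_orient=8, thresh=0.1):
    """Sum |Gabor response| over orientations; mark texture-less pixels."""
    img = (img - img.min()) / (np.ptp(img) + 1e-9)  # rescale to [0, 1]
    h, w = img.shape
    total = np.zeros_like(img)
    for k in range(n_orient):
        ker = gabor_kernel(theta=k * np.pi / n_orient)
        pad = ker.shape[0] // 2
        padded = np.pad(img, pad, mode='edge')
        resp = np.zeros_like(img)
        for i in range(h):          # naive 'same' convolution (small demo sizes)
            for j in range(w):
                resp[i, j] = np.sum(padded[i:i + 2*pad+1, j:j + 2*pad+1] * ker)
        total += np.abs(resp)
    total /= total.max() + 1e-9
    return total < thresh  # True where response is weak (candidate road)

# toy image: textured upper half, smooth "road" lower half
rng = np.random.default_rng(0)
img = np.vstack([rng.random((20, 40)), np.full((20, 40), 0.5)])
mask = weak_response_mask(img)
```

On the toy image, the constant lower half (standing in for road) yields a near-zero Gabor response and is kept as a road candidate, while the textured upper half is rejected.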
A texture-less region of the image has a weak response to Gabor wavelets at any given orientation parameter. For that reason, the summation of Gabor filter responses over different orientations is weaker in texture-less regions than in regions that have texture along a specific orientation. Convolution of the thermal image with Gabor wavelets \cite{lee1996image} is applied, where the range of the image is adjusted to $[0,\ 1]$ and we consider eight orientations with a large Gabor filter bank. We next extract the parts $M_{i}$, as shown in Fig.~\ref{fig:gabor}-(third), that have a weak response according to \Eref{eq:gabor}:
\begin{figure}[t]
\centering
\begin{tabular}{@{}c@{}c}
\hspace{2mm} \subcaptionbox{}{\includegraphics[width=0.45\linewidth]{./figure/icra_semielipse}}\hspace{4.5mm}
&\subcaptionbox{}{\includegraphics[width=0.45\linewidth]{./figure/mine_initial}}
\end{tabular}
\caption{(a) The sampling method proposed by Lu \etal [ ], using a semi-ellipse; it can result in non-drivable road samples at a corner. (b) Our sampling method, which is adaptive to dynamic road scenes.}
\label{fig:Comaper with icra}
\end{figure}
% JaeShin should unify Fig. 4 and Fig. 5.
\begin{figure}[t]
\centering
{\includegraphics[width=1\linewidth]{./figure/gaobr_ha}}
\caption{Overall process of extracting the initial road. The original thermal-infrared image is first convolved with Gabor wavelets (second image in the first row), and low-response parts are extracted (third image in the first row). The initial road is completed by selecting the parts closest to the safe road point (last image in the first row).
The second and third rows show additional results for various road scenes.}
\label{fig:gabor}
\end{figure}
\begin{equation}
M_{i}=(\mathbb{T}_{\mathfrak{G}}>I_{i\in \mathfrak{G}}\ ?\ I_{i\in \mathfrak{G}}\ ;\ 0)\ ,
\label{eq:gabor}
\end{equation}
where $I_{i\in \mathfrak{G}}$ is the intensity of the pixel response to the Gabor wavelets and $\mathbb{T}_{\mathfrak{G}}$ is the threshold that defines a weak response; it is set to 0.1, found by experiments. The initial road mask shown in Fig.~\ref{fig:gabor}-(fourth) is completed by selecting the parts closest to the safe road point, located at the bottom row and the middle column of the image. Further results from various road scenes, shown in Fig.~\ref{fig:gabor}, demonstrate that our method of extracting the initial road is adaptive to any shape of road scene.
\subsection{Road Propagation} \label{Road Propagation}
With the samples from the initial road, we propagate the road to obtain a more accurate drivable region. We then obtain the region that is safe to drive by refining the propagated result. The details are demonstrated in the following subsections.
\begin{figure}[t]
\centering
\begin{tabular}{@{}c@{}c@{}c}
\subcaptionbox{}{\includegraphics[width=0.3\linewidth]{./figure/representative3}}\hspace{2mm}
&\subcaptionbox{}{\includegraphics[width=0.3\linewidth]{./figure/gaussian}}\hspace{2mm}
&\subcaptionbox{}{\includegraphics[width=0.3\linewidth]{./figure/gaussian1}}
\end{tabular}
\caption{(a) Representative map obtained by applying the clustered mean at the superpixel level. (b) Obtaining samples for constructing the Gaussian model using the initial road mask.
(c) Global condition map formed by the Gaussian model.}
\label{fig:R-map,sampling,G-condition}
\end{figure}
\begin{figure}[t]
\centering
\begin{tabular}{@{}c@{}c}
\subcaptionbox{}{\includegraphics[width=0.5\linewidth]{./figure/localcondition1}}\hspace{2mm}
&\subcaptionbox{}{\includegraphics[width=0.35\linewidth]{./figure/ambiguity1}}
\end{tabular}
\caption{(a) The relation of a superpixel with its surrounding superpixels for the first assumption of the local condition. (b) The second assumption of the local condition: the ambiguity of thermal energy rises as the distance between the thermal-infrared camera and the road increases.}
\label{fig:Weighted local}
\end{figure}
\subsubsection{Label Propagation}
From this section on, we handle the image in units of superpixel labels to run our algorithm efficiently. To bring the image to the superpixel level, the SLIC algorithm \cite{achanta2012slic} is employed. We then apply the clustered mean to each superpixel to attenuate the large amount of noise in the thermal-infrared image. The resulting representative map is shown in Fig.~\ref{fig:R-map,sampling,G-condition}-(a).
Inspired by Wang's algorithm [ ], which proposes label propagation over linear neighborhoods, we propose a label propagation method to detect the drivable region. Zhou \etal [ ] further demonstrate the geometric intuition behind label propagation: 1) nearby points are likely to have the same label (local), and 2) points on the same structure, such as a cluster, are likely to have the same label (global). Considering these two assumptions, we form a global condition and a weighted local condition to overcome the limitations of label propagation.
\paragraph{Global condition}
As a stochastic structural approach, we suppose that the intensities of the superpixels within the initial road form a Gaussian distribution.
With this supposition, a Gaussian model formed by the initial road samples (Fig.~\ref{fig:R-map,sampling,G-condition}-(b)) is applied to each superpixel to construct the global condition map $G_{i}$ of label propagation, described in Fig.~\ref{fig:R-map,sampling,G-condition}-(c).
\paragraph{Weighted local condition}
From our observations on thermal images, we develop two assumptions: 1) within the same material, the difference between the thermal intensity of a point and that of its surrounding parts is similar; 2) the further the thermal camera is from the heat source, the more thermal ambiguity occurs. The weighted local condition $C_{i}$ based on these two assumptions is set by \Eref{weighted locl condition}:
\begin{equation}
C_{i}=D_{1}+ \frac{(D_{2}-D_{1})}{M}\cdot(M-L_{i})\ ,
\label{weighted locl condition}
\end{equation}
where $L_{i}$ indicates the column distance (Fig.~\ref{fig:Weighted local}-(b)) from the safe road point, $M$ is the maximum column distance, $D_{2}$ is the ambiguity weight term from \Eref{ambiguity}, and $D_{1}$ is the average intensity difference between the ${i}^{th}$ superpixel $S_{i}$ (within the initial road) and its surrounding superpixels (red circle in Fig.~\ref{fig:Weighted local}-(a)), according to \Eref{side-by-side}:
\begin{equation}
D_{1}=\frac{1}{N_{s}}\sum_{i=1}^{N_{s}}\left(\frac{1}{N_{variable}}\sum_{k=1}^{N_{variable}}|I_{i}-I_{i-k}|\right)
\label{side-by-side}
\end{equation}
and
\begin{equation}
D_{2}=\frac{1}{4}\,\frac{1}{N_{s}}\,\frac{1}{N_{s}-1}\sum_{i=1}^{N_{s}}\sum_{j=1}^{N_{s}-1}|I_{i}-I_{i-j}|\ ,
\label{ambiguity}
\end{equation}
where $N_{variable}$ is the number of surrounding superpixels, which can change for each superpixel; $N_{s}$ denotes the number of superpixels within the initial road; and $I$ denotes the intensity of a superpixel.
Label propagation starts with multiple start labels that are selected randomly in the initial road according to Algorithm~\ref{alg:initLabel}.
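A small numeric sketch of $D_{1}$, $D_{2}$, and the resulting $C_{i}$ follows. The superpixel intensities and neighbor sets are toy values of our own, standing in for the initial road superpixels and their surrounding (red-circled) neighbors.

```python
import numpy as np

# Toy data (ours): intensities of Ns superpixels inside the initial road,
# and the intensities of each one's surrounding superpixels (variable count).
road_intensity = np.array([0.52, 0.50, 0.55, 0.53, 0.49])
neighbors = [np.array([0.50, 0.54]),
             np.array([0.52, 0.55, 0.53]),
             np.array([0.50, 0.52]),
             np.array([0.55, 0.49, 0.52, 0.50]),
             np.array([0.53, 0.52])]
Ns = len(road_intensity)

# D1: average intensity difference between each road superpixel
# and its surrounding superpixels.
D1 = np.mean([np.mean(np.abs(I_i - nb))
              for I_i, nb in zip(road_intensity, neighbors)])

# D2: ambiguity weight from all pairwise intensity differences
# inside the initial road, with the 1/4 scaling.
diffs = np.abs(road_intensity[:, None] - road_intensity[None, :])
D2 = 0.25 * diffs[~np.eye(Ns, dtype=bool)].mean()

def local_condition(L_i, M):
    """Weighted local condition C_i: interpolates between D2 at the
    safe road point (L_i = 0) and D1 at the maximum column distance M."""
    return D1 + (D2 - D1) / M * (M - L_i)

C_near = local_condition(L_i=0, M=100)   # reduces to D2
C_far = local_condition(L_i=100, M=100)  # reduces to D1
```

The linear interpolation makes the acceptance threshold loosest where the formula assigns $D_{2}$ and tightest where it assigns $D_{1}$, matching the distance-dependent ambiguity assumption.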
The start labels then search neighboring labels, excluding labels that have already been visited. If a neighboring label satisfies the global condition and the difference in thermal intensity between the neighboring label and the start label is less than the weighted local condition, the start labels are propagated into the neighboring labels. This cycle of search, condition verification, and propagation is conducted iteratively until no label remains to be propagated. Details are described in Algorithm~\ref{alg:labelPropagation}, and the overall process is shown in Fig.~\ref{fig:LabelPropSteps}.
\begin{algorithm}[h]
\caption{Random Start Label Selection in Initial Road}
\begin{algorithmic}[1]
\Procedure{RandInitRoad}{}
\For{i $\gets$ 1 \textbf{to} $N_r$} \Comment{for all random initial labels}
\State $NextLabelAddrStack \gets randi() \bmod N_s$ \Comment{$N_s$: the number of superpixels}
\State $NextLabelCounter \gets NextLabelCounter + 1$
\EndFor
\EndProcedure
\end{algorithmic}
\label{alg:initLabel}
\end{algorithm}
\begin{algorithm}[h]
\caption{Label Propagation}
\begin{algorithmic}[1]
\Procedure{LabelProp}{}
\State $global~condition~GC \gets GetGlobalCondition()$
\State $local~condition~LC \gets GetLocalCondition()$
\State $visited~label~L \gets \emptyset$
\State $cnt \gets 0$
\State $stack \gets 0$
\ForAll{$i \in \textit{Labels}$}
\State $val1 \gets Superpixel(Label(i))$
\State $\textit{I} \gets \textit{SearchNeighbors(LabelStack(i))}$
\ForAll{$ind \in \textit{I}$}
\If{$ind \notin L$} \Comment{Not visited yet}
\State $val2 \gets Superpixel(Label(ind))$
\If{$val2 \in GC \And |val1-val2| < LC$}
\State $cnt \gets cnt + 1$ \Comment{Satisfy conditions}
\State $stack(cnt) \gets Label(ind)$
\EndIf
\State $L \gets L \cup \{ind\}$
\EndIf
\EndFor
\State $NextLabelAddressStack \gets stack$
\State $NextLabelCount \gets cnt$
\EndFor
\EndProcedure
\end{algorithmic}
\label{alg:labelPropagation}
\end{algorithm}
\begin{figure}[t]
\centering
\begin{tabular}{@{}c@{}c@{}c}
\subcaptionbox{}{\includegraphics[width=0.33\linewidth]{./figure/step1}}
&\subcaptionbox{}{\includegraphics[width=0.33\linewidth]{./figure/step2}}
&\subcaptionbox{}{\includegraphics[width=0.33\linewidth]{./figure/step3}}
\end{tabular}
\begin{tabular}{@{}c@{}c@{}c}
\subcaptionbox{}{\includegraphics[width=0.33\linewidth]{./figure/step5}}
&\subcaptionbox{}{\includegraphics[width=0.33\linewidth]{./figure/step7}}
&\subcaptionbox{}{\includegraphics[width=0.33\linewidth]{./figure/step10}}
\end{tabular}
\caption{Label propagation process with the global and weighted local conditions: (a) step 1 (multiple start labels) (b) step 2 (c) step 3 (d) step 5 (e) step 7 (f) step 10. Color denotes each step of label propagation.}
\label{fig:LabelPropSteps}
\end{figure}
% Discuss the algorithm table and its explanation further with Namil before writing.
\subsubsection{Refinement} \label{Refinement}
We consider that parts weakly connected to the main drivable region are not safe to drive. Therefore, we separate these weakly connected parts as outliers from the main drivable part.
Erode and dilate filters from morphological image analysis (MIA) are employed to handle these outliers. Generally, erosion and dilation are the result of a convolution between the original mask and a shape operator. We first apply erosion four times with a line-shaped operator at four angles ($\theta =0, 45, 90, 135$). We then remove the parts separated from the main drivable region. To compensate for the region reduced by the erode filter, a small dilation with a circle-shaped operator is applied. Finally, the conditional random field (CRF) proposed in [ ] is adopted to refine the drivable region mask.
\section{Strategy for Sequential Video} \label{Strategy for Sequential Video}
In Sec.~\ref{Road Propagation}, we obtain the safely drivable region from label propagation. However, it requires a great deal of processing time. In order to run our algorithm in a computationally efficient way, a more simplified approach is designed for sequential video using temporal cues. Furthermore, the simplified algorithm can adjust the drivable region in detail. We establish two strategies, R-GrowCut and re-initialization, that are highly adaptive to sequential video regardless of the shape of the road.
\subsection{Restrict the Region for GrowCut}
The GrowCut system [ ] is one of the segmentation strategies for detaching the parts that users are interested in. This system is powerful for simple scenes in which the part of interest is well distinguished from the background; in other words, it has a high dependency on the background scene, as shown in Sec.~\ref{Experiments}. For this reason, we lower this dependency by densely restricting the region to which GrowCut is applied.
The difference in road shape between two sequential images is tiny. From this property, the road shape of the next image can be estimated if we can measure how much the drivable region has changed from the previous drivable region.
Therefore, we only pay attention to the region that can change from the previous mask.
In order to estimate the changeable region, we first apply the dilate and erode filters (stated in Sec.~\ref{Refinement}) until thirty percent of the previous drivable region (blue region in Fig.~\ref{fig:Label propagation}) has been expanded and reduced, respectively. We then assign the (+1) label, denoting drivable region, to the inside of the reduced mask (green region in Fig.~\ref{fig:Label propagation}), while the outside of the extended mask (red region in Fig.~\ref{fig:Label propagation}) takes the (-1) label, denoting background. Finally, the changeable region is completed by allocating the (0) label to the parts between the (-1) and (+1) labels (gray region in Fig.~\ref{fig:Label propagation}).
\begin{figure}[t]
\centering
{\includegraphics[width=1\linewidth]{./figure/r-growcut_all}}
\caption{The process of restricted GrowCut (R-GrowCut). Blue: previous road map. Red: outside of dilation. Green: inside of erosion. Yellow: result of R-GrowCut.}
\label{fig:Label propagation}
\end{figure}
% [insert figure]
\subsection{Application of GrowCut to Restricted Region}
The GrowCut algorithm [ ] is one of the propagation strategies for segmenting a part of interest. In the first step of GrowCut, start labels are selected. The unchangeable labels that take (+1) and (-1), stated above, are the start labels $l_{a}$ in our system; they then attack surrounding labels $l_{d}$ to propagate their labels. Because the region that can be propagated is densely restricted, we refer to our propagation method for sequential video as restricted GrowCut (R-GrowCut). If the strength of a surrounding label $\Delta_{d}$ is less than that of an attacking label $\Delta_{a}$, the surrounding label takes the same label as the attacking label $(l_{d}=l_{a})$.
The equation for the strength evaluation is written as:
\begin{equation}
\label{R_growcut1}
\left(1-\frac{\| I_{a}- I_{d}\|_{2}}{\max\| I \|_{2}}\right)\cdot\Delta_{a}>\Delta_{d}\ ,
\end{equation}
where $I_{a}$ is the intensity of the attacking superpixel and $I_{d}$ is the intensity of the surrounding superpixel. If the surrounding label is occupied by the attacking label, the strength of the surrounding label is transformed according to \Eref{R_growcut2}:
\begin{equation}
\label{R_growcut2}
\Delta_{d}=\left(1-\frac{\| I_{a}- I_{d}\|_{2}}{\max\| I \|_{2}}\right)\cdot\Delta_{a}\ .
\end{equation}
The R-GrowCut system is applied repeatedly until the propagation of labels converges, as shown in Fig.~\ref{fig:Label propagation} (yellow region). Finally, we utilize the conditional random field (CRF) to refine the result of R-GrowCut, as stated in Sec.~\ref{Refinement}.
\subsection{Re-initialization Strategy}
% KB modified
The model explained in the R-GrowCut stage is affected by the distribution of thermal energy on the road surface. When the thermal camera mounted on the car faces moving objects, a speed bump, an off-road surface, or other causes of rapid scene change, the model cannot represent the surface of the drivable road. To tackle this practical issue, a re-initialization strategy is arranged against such failure cases, following \Eref{reInitial}.
The distribution of thermal energy on the road is occluded by sudden scene changes, and re-initialization would proceed redundantly since the model cannot represent the occluded scene. To reduce this redundancy, we exploit both the whole image and the local quadrant regions of the image: even when the scene has an occluded region, the local correlations outside the occluded part remain high with respect to the model. In \Eref{reInitial}, $P_{1},P_{2},P_{3},P_{4}$ are the normalized local correlations and $P_{L}$ is the global correlation, both computed by normalized correlation.
\begin{equation}
\label{reInitial}
\rho_{_{PH}}=\frac{1}{1 + \exp\left(\left(P_{L} + e^{P_{L}}\frac{1}{N}\sum_{i=1}^{N}P_{i}\right)^{2}\right)}
\end{equation}
This sigmoid-like function indicates a strong correlation with the model around the value 0.5, and $N$ is set to 4 for our four local quadrant regions.
% (Can we add a remark that we attached a video??)
% (This part looks too heuristic; is there a way to phrase it more indirectly?)
%And if the value of correlation indicator is lower than 0.6, we set our system to re-initialize the next SDR. With re-initialization strategy, our algorithm can be adaptive to any road scene change regardless of on-road or off-road.
\begin{figure*}[h]
\centering
\begin{tabular}{@{}c@{}c@{}c}
\subcaptionbox{}{\includegraphics[width=0.33\linewidth]{./figure/Boxplot_Campus}}
&\subcaptionbox{}{\includegraphics[width=0.33\linewidth]{./figure/Boxplot_city}}
&\subcaptionbox{}{\includegraphics[width=0.33\linewidth]{./figure/Boxplot_mountain}}
\end{tabular}
\caption{ErrorRate distribution over all frames in each video sequence: (a) Campus sequence (b) City sequence (c) Mountain sequence.}
\label{fig:boxplots}
\end{figure*}
\section{Experiments} \label{Experiments}
In this section, we perform experiments on three video sequences to show the effectiveness of our algorithm on thermal-infrared images. We employ a video from the KAIST All-Day Benchmark as our \textit{Campus} sequence.\footnote{https://sites.google.com/site/alldaydataset/} To cover diverse scenarios, we capture the \textit{Mountain} and \textit{City} sequences using a FLIR A655sc, which has better spatial resolution and less noise. The first sequence, \textit{Campus}, contains 3573 frames that represent a general on-road scene with clear lane information. The \textit{City} (1318 frames) and \textit{Mountain} (1087 frames) sequences represent a daily-life scene with many moving objects and an off-road scene, respectively. We manually produce ground truth of the drivable region for all images (5977 frames).
The evaluated sequences and ground truth can be found on our website.\footnote{https://sites.google.com/site/drivableRegion/}

- Campus: FLIR A35, 3573 frames, 320 $\times$ 240, 8-bit\\
- Mountain: FLIR A655sc, 1086 frames, 640 $\times$ 480, 16-bit\\
- City: FLIR A655sc, 1319 frames, 640 $\times$ 480, 16-bit

Some examples from each sequence are shown in Fig.~\ref{fig:sampleResult} together with qualitative results.
\subsection{Evaluations on various scenes}
To verify the effectiveness of the proposed algorithm, we compare our method with several state-of-the-art approaches. There are four kinds of approaches that are applicable to thermal-infrared images.
The Gaussian Mixture Model (\textit{GMM}) based approach proposed in [ ] and the \textit{GrowCut}-based approach proposed by Lu \etal [ ] are representative of propagation methods that detect the drivable region from initial samples. Our sampling method is introduced into the \textit{GMM} approach, and the semi-ellipse proposed by Lu \etal [ ] is employed as the sampling mask in the \textit{GrowCut} approach. The vanishing point (\textit{VP}) based approach proposed by H. Kong \etal [ ] represents the drivable region detection algorithms based on texture and line information. \textit{Classifier} is the approach based on summative information (i.e., texture, pixel intensity, ground location, etc.) proposed by Hoiem \etal [ ] to parse the road scene; the ground part of their result is evaluated as the drivable region.
In order to evaluate the accuracy of each frame, we adopt three kinds of quantitative pixel-wise criteria, FPR (False Positive Rate), FNR (False Negative Rate), and ErrorRate [ ] [ ], according to Eqs.~(\ref{FPRFNR}) and (\ref{ErrorRate}):
\begin{equation}
\label{FPRFNR}
FPR=\frac{N_{FP}}{N_{P}} \times 100\ (\%),\quad FNR=\frac{N_{FN}}{N_{N}} \times 100\ (\%),
\end{equation}
\begin{equation}
\label{ErrorRate}
ErrorRate=\frac{N_{FP}+N_{FN}}{N_{P}+N_{N}} \times 100\ (\%),
\end{equation}
where $N_{FP}$ is the number of pixels wrongly detected as drivable region, $N_{FN}$ is the number of pixels wrongly detected as non-drivable region, and $N_{P}$ and $N_{N}$ are the numbers of pixels inside and outside the drivable region in the ground truth, respectively. We further evaluate the overall performance [2015 ICRA] by a weighted average, where the weight is the number of images in each sequence. The performance summary of each method on the three videos is given in \Tref{table}.
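For concreteness, these criteria can be computed as in the following sketch (the variable names are ours). Note that with the normalizations written above, $N_{FP}$ is divided by the number of positive ground-truth pixels, which is why FPR can exceed 100.

```python
import numpy as np

def pixelwise_rates(pred, gt):
    """FPR, FNR, ErrorRate (%) between binary drivable-region masks.

    Follows the normalization in the text: FPR divides false positives
    by the POSITIVE ground-truth pixels (so it can exceed 100%), and
    FNR divides false negatives by the negative ground-truth pixels.
    """
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    n_fp = np.sum(pred & ~gt)   # predicted drivable, actually not
    n_fn = np.sum(~pred & gt)   # predicted non-drivable, actually drivable
    n_p = np.sum(gt)            # drivable pixels in ground truth
    n_n = np.sum(~gt)           # non-drivable pixels in ground truth
    fpr = 100.0 * n_fp / n_p
    fnr = 100.0 * n_fn / n_n
    error_rate = 100.0 * (n_fp + n_fn) / (n_p + n_n)
    return fpr, fnr, error_rate

# toy 4x4 example: ground truth has 4 drivable pixels
gt = np.zeros((4, 4), bool); gt[2:, :2] = True       # 4 positives, 12 negatives
pred = np.zeros((4, 4), bool); pred[2:, :3] = True   # overshoots by one column
fpr, fnr, err = pixelwise_rates(pred, gt)            # fpr=50.0, fnr=0.0, err=12.5
```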
\begin{table}[t]
\centering
\begin{tabular}[t]{p{1.2cm}ccccc}
\toprule[1.3pt]
& Classifier & GMM & VP & GrowCut & Proposed \\
\midrule[1.2pt]
\multicolumn{6}{l}{\textit{\textbf{FPR; False Positive Rate}}} \\
\midrule[1.2pt]
\parbox{1.2cm}{\centering Campus} & 48.21 & 51.54 & 34.35 & \textbf{4.68} & \textbf{9.01} \\ \cmidrule{1-6}
\parbox{1.2cm}{\centering Mountain} & 205.59 & \textbf{97.96} & 274.59 & 167.47 & \textbf{22.80} \\ \cmidrule{1-6}
\parbox{1.2cm}{\centering City} & 83.52 & 66.91 & \textbf{46.29} & 86.13 & \textbf{10.72} \\ \cmidrule{1-6}
\parbox{1.2cm}{\centering Overall} & 84.59 & 63.36 & 80.43 & \textbf{52.22} & \textbf{11.89} \\
\midrule[1.2pt]
\multicolumn{6}{l}{\textit{\textbf{FNR; False Negative Rate}}} \\
\midrule[1.2pt]
\parbox{1.2cm}{\centering Campus} & \textbf{3.07} & 9.98 & 5.69 & 52.77 & \textbf{5.40} \\ \cmidrule{1-6}
\parbox{1.2cm}{\centering Mountain} & \textbf{0.54} & 5.69 & 2.68 & 2.83 & \textbf{2.35} \\ \cmidrule{1-6}
\parbox{1.2cm}{\centering City} & \textbf{1.24} & 8.68 & 10.71 & \textbf{2.46} & 6.61 \\ \cmidrule{1-6}
\parbox{1.2cm}{\centering Overall} & \textbf{2.20} & 8.91 & 6.25 & 32.60 & \textbf{5.11} \\
\midrule[1.2pt]
\multicolumn{6}{l}{\textit{\textbf{Error rate}}} \\
\midrule[1.2pt]
\parbox{1.2cm}{\centering Campus} & \textbf{15.13} & 22.98 & 17.63 & 19.87 & \textbf{5.72} \\ \cmidrule{1-6}
\parbox{1.2cm}{\centering Mountain} & 43.95 & \textbf{21.43} & 29.87 & 26.11 & \textbf{6.71} \\ \cmidrule{1-6}
\parbox{1.2cm}{\centering City} & \textbf{16.69} & 21.93 & 19.79 & 21.47 & \textbf{7.44} \\ \cmidrule{1-6}
\parbox{1.2cm}{\centering Overall} & 20.71 & 22.46 & \textbf{20.33} & 21.35 & \textbf{6.28} \\
\bottomrule[1.3pt]
\end{tabular}
\caption{Performance summary of each method (\%). The two best results in each row are in boldface.}
\label{table}
\end{table}

\begin{figure*}[t]
\centering
\begin{tabular}{@{}c@{}c@{}c}
\subcaptionbox{}{\includegraphics[width=0.33\linewidth]{./figure/ClassHist_Campus}}
&\subcaptionbox{}{\includegraphics[width=0.33\linewidth]{./figure/ClassHist_city}}
&\subcaptionbox{}{\includegraphics[width=0.33\linewidth]{./figure/ClassHist_mountain}}
\end{tabular}
\caption{Class accuracy in terms of the ErrorRate of the proposed method. \textit{SpeedBump} denotes frames in which the camera-mounted vehicle passes over a speed bump. \textit{MovingObj}, \textit{Corner}, and \textit{Lane} denote frames in which the thermal camera faces moving objects, a corner-shaped road, and a traffic lane, respectively. \textit{Etc.} covers frames that do not belong to any of these cases: (a) Campus sequence. (b) City sequence. (c) Mountain sequence.}
\label{fig:classHist}
\end{figure*}

\begin{figure*}[t]
\centering
\begin{tabular}{@{}c@{}c@{}c@{}c}
{\includegraphics[width=0.078\linewidth]{./figure/name}}
&\subcaptionbox{}{\includegraphics[width=0.3\linewidth]{./figure/campus}}
&\subcaptionbox{}{\includegraphics[width=0.3\linewidth]{./figure/city}}
&\subcaptionbox{}{\includegraphics[width=0.3\linewidth]{./figure/mountain}}
\end{tabular}
\caption{Qualitative result samples. (a) Campus sequence. (b) City sequence. (c) Mountain sequence.}
\label{fig:sampleResult}
\end{figure*}

As shown in \Tref{table}, the proposed method is superior to the other comparative approaches, with a remarkable improvement in ErrorRate. The ErrorRate of the other approaches is quite high because they do not consider the problems that arise when the image domain changes from RGB to thermal-infrared.
For further analysis, we construct box plots based on the ErrorRate values, as described in \Fref{fig:boxplots}, where each rectangle spans the densely distributed ErrorRate range, the black dotted lines denote the valid range of the ErrorRate, and the blue dots are outliers.

The \textit{GMM} approach cannot discriminate between the drivable region and the background, as shown in \Fref{fig:sampleResult}, because the number of intensity values each pixel can express is only 255. Therefore, some background pixels take the same probability value as the drivable region.

The ambiguity of the edges surrounding the drivable region increases in blurred thermal images. Therefore, the \textit{GrowCut} method fails to achieve the accuracy reported by Lyu \etal~[ ] due to its high dependency on the background caused by this ambiguity. Furthermore, owing to the limitation of their sampling method stated in \Sref{Extract Initial Road}, the ErrorRate evidently becomes high in the \textit{Mountain} video, which contains many corner sections. Finally, \Fref{fig:boxplots} shows that the \textit{GrowCut} approach is quite unstable on sequential video, as indicated by the broad range of its ErrorRate box.

Although the vanishing point is obvious to human eyes in the \textit{Campus} sequence, the \textit{VP} method is limited by the large amount of noise in night images. In the \textit{City} sequence, many static or moving objects hinder finding the vanishing point, so the ErrorRate is relatively high compared to the \textit{Campus} sequence. The \textit{VP} method is even weaker on the \textit{Mountain} sequence, because the curved shape of the drivable region confuses the \textit{VP} method in finding the vanishing point.

The ErrorRate of \textit{Classifier} is relatively high compared to our approach. In general, their results concern only an approximate ground part of the road scene, as shown in \Fref{fig:sampleResult}.
Therefore, \textit{Classifier} cannot find the exact drivable region in the \textit{Mountain} sequence, where the drivable region occupies only a small proportion of the image pixels. Furthermore, moving objects are not handled by \textit{Classifier}.

\subsection{Analysis of the proposed method}
The overall performance in \Tref{table} shows that our algorithm achieves the best performance. Our method outperforms the other approaches especially in the \textit{Mountain} sequence; in other words, our adaptive initial road samples allow us to handle corner-shaped drivable regions. The label propagation algorithm and the strategy for sequential video (R-GrowCut and the re-initialization evaluation) demonstrate robustness in the \textit{City} sequence, which contains varied obstacles, especially pedestrians and vehicles. The narrow box range in \Fref{fig:boxplots} verifies the stability of the ErrorRate of our method. For further analysis, we categorize each sequence into five classes (defined in \Fref{fig:classHist}) and evaluate the performance of our algorithm on each class using the pixel-wise quantitative criteria stated in \Eref{FPRFNR} and \Eref{ErrorRate}. The results of this classified analysis are described as histograms in \Fref{fig:classHist}.

As shown in \Fref{fig:classHist}, the \textit{MovingObjs}, \textit{SpeedBump}, and \textit{Corner} classes of each sequence are comparatively unfavorable to our algorithm. We can employ temporal cues in the R-GrowCut stage under the assumption that the difference of the drivable region between two consecutive frames is small. However, when the vehicle-mounted thermal camera faces moving objects or a corner-shaped road, the shape of the drivable region changes rapidly. Furthermore, a rapid scene change can also occur when the vehicle passes over a speed bump. These rapid changes push the restricted region in the R-GrowCut stage beyond the bounds of the drivable region, and therefore the ErrorRate becomes high.
However, the ErrorRate does not diverge, thanks to the re-initialization strategy we arrange, and becomes stable again. Meanwhile, in the \textit{Lane} and \textit{Etc.} classes, the drivable region is well distinguished from the background, so these classes are favorable for detecting the drivable region, as described in \Fref{fig:classHist}-(a).

\section{Conclusion} \label{Conclusion}
In this paper, we propose a novel algorithm for detecting the drivable region that suits thermal-infrared images, considering the night environment. Experiments on three kinds of sequential videos with a large number of frames demonstrate the robustness of our algorithm compared to other outstanding approaches. The detailed analysis, including the categorized classes, shows that our algorithm adapts to any shape of road scene when detecting the drivable region with a thermal-infrared camera.

{
\bibliographystyle{ieee}
\bibliography{icra2016}
}

\end{document}