Volker Strobel added chapter_Discussion_label_chap_discussion__.tex  almost 8 years ago

Commit id: fb8ee1a1f0963552e78a3dee5c00ec9dd5c3fc8d


\chapter{Discussion}
\label{chap:discussion}

The comparison of the number of samples with the average cosine
similarity has shown that only a small fraction of the maximum number
of samples suffices to achieve similarities larger than $99\,\%$. In
fact, $400 / (640 \times 480) \approx 0.13\,\%$. This sets the stage for
large speed-ups during live operation. Additionally, it leaves room
for further speed-ups, depending on the processing power of the CPU
used.

The evaluation of different maps using the synthetic data showed
considerable differences between the evaluated images. The range of
losses from 0.24 to 0.99 shows that the maps differ markedly in their
suitability for the proposed algorithm. In order to visually evaluate
the proposed map evaluation technique, a simple map was constructed
with two repeating tiles. The image with the minimum and the one with
the maximum loss value based on their color histogram are shown in
Figure~\ref{fig:minmaximg}. The different patterns of the images are
clearly visible: while the image with the minimum value fulfills the
desired properties (nearby areas have similar color values, distant
areas are dissimilar), the image with the maximum loss is mainly black,
resulting in similar histograms across the whole image and thus in
high loss values. This initial evidence can serve as a starting point
for testing the predictive power of the evaluation algorithm for
texton histograms.

\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.7\columnwidth]{figures/minimg/default_figure}
\caption{{\label{fig:minmaximg} \emph{Left}: Image with the lowest loss value; \emph{Right}:
Image with the highest loss value%
}}
\end{center}
\end{figure}

\section{General Discussion}
\label{sec:generaldiscussion}

In this chapter, the results are discussed with regard to error
statistics, execution frequency, robustness, and scalability.
To begin with, we recapitulate the research questions:
\begin{itemize}
\item R1: Can accurate 2D positions be estimated in real-time, using a
  machine learning-based approach on a limited processor in a
  modifiable indoor environment?
\item R2: Is accurate real-world localization, by regression or
  classification, possible when the training data consists of
  synthetic data only?
\item R3: Can we predict the goodness of a given map for the proposed
  localization approach?
\end{itemize}

Regarding R1, the conducted experiments provide supporting evidence
that a texton-based machine learning approach is able to accomplish
real-time indoor localization. The proposed algorithm runs at a
frequency of 30\,Hz on a single-board computer with a limited
CPU. Shifting processing effort to an offline training step and relying
on random sampling are the cornerstones for running the algorithm on
limited processors. Despite the small ratio between the number of
extracted image patches ($s$) and the maximum number of distinct image
patches ($s^*$), the accuracy is hardly affected, and only slightly
improves when incorporating more textons.

Regarding research question R2, the initial idea of using the
synthetically generated images directly as training data was not
successful and was not pursued further. This might also be the reason
why only few projects have used synthetic images for real-world
phenomena. The reality gap between the synthetic data and the
real-world data was larger than expected.
Figure~\ref{fig:realitygap} shows an example of two image patches, one
synthetically generated and one taken with the camera of the MAV.
While the patches can easily be identified as similar by the human eye,
the texton maps, in which different colors represent different textons,
are dissimilar. Blur, lighting settings, and camera intrinsics alter
the low-level features of the image too strongly.
A possible improvement might be to find a mapping from histograms of
synthetic images to histograms of real images, by mapping `synthetic
textons' to `real-world textons'.

Referring to R3, we found some initial evidence that the proposed map
evaluation generalizes to the real world. In contrast to R2, the
generalization from the synthetic data to real-world data is of a
different nature in this case. The requirement here is that maps that
follow the ideal similarity distribution in the synthetically generated
images also follow this distribution after being recorded with a
camera. Stated differently, for maps with a low loss value, distant
image positions should have similar histograms neither in the
synthetic images nor in the real-world images.

Despite the overall promising results, we noticed drawbacks of the
proposed approach during the flight tests, which point to directions
for future research. The accuracy, measured as the difference between
the estimates of the motion tracking system and those of the
texton-based approach, could be further improved by incorporating more
features, for example histograms of oriented gradients. Investigating
further regression techniques, like Gaussian processes or Bayesian
networks that can inherently handle space and time, could be a
worthwhile endeavor.

The developed method sets the stage for numerous future research
directions and improvements. The current implementation assumes a
roughly constant height (up to a few centimeters) and no angular
rotations of the MAV. While a quadrotor can move in every direction
without performing yaw movements, using the approach on another vehicle
could require arbitrary yaw movements. In order to limit the complexity
of the dataset, a ``derotation'' of the incoming image could be
performed to align it with the underlying images of the dataset.
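As a minimal sketch of such a derotation, assume that an estimate of
the yaw angle $\psi$ is available; the symbols $\psi$, $(u, v)$, and
$(c_u, c_v)$ are introduced here for illustration only and are not part
of the current implementation. Rotating each pixel coordinate $(u, v)$
by $-\psi$ about the image center $(c_u, c_v)$ would then give the
aligned coordinate $(u', v')$:
\[
\left(\begin{array}{c} u' \\ v' \end{array}\right)
=
\left(\begin{array}{cc} \cos\psi & \sin\psi \\ -\sin\psi & \cos\psi \end{array}\right)
\left(\begin{array}{c} u - c_u \\ v - c_v \end{array}\right)
+
\left(\begin{array}{c} c_u \\ c_v \end{array}\right).
\]
The sign of $\psi$ depends on the orientation of the image axes and
would have to be checked against the camera convention.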
While the current approach normalizes each $5\times5$ image patch to
zero mean and unit variance, which provides robustness to different
lighting conditions, this procedure could be further extended, for
example by using specific color models.

While the current map evaluation approach used existing fixed images,
it could also serve as a fitness function for an optimization approach
(for example, an evolutionary algorithm) that modifies a given image.
While the solution for a desired loss value of 0 or near 0 might be
unique and independent of the original image, a higher desired loss
value might change the initial image only to a certain extent, yielding
an ``improved version'' of the image that is better suited for the
proposed algorithm. This could make it possible to find a near-optimal
solution for a given regression technique and give insight into the
underlying structure of certain regression techniques.

Currently, \emph{draug} generates image patches by drawing samples from
parametric distributions. This was motivated by the fact that an ideal
map should be independent of previous estimates and based on single
images only. In the future, the option to define flight routes by
setting waypoints above an image could be included. This would make it
possible to test the particle filter on synthetic flights.

\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.7\columnwidth]{figures/draug_pic/default_figure}
\caption{{\label{fig:realitygap} Exemplifying the reality
gap. \emph{Top left}: Image patch generated using draug. \emph{Top
right}: Image patch taken with the MAV's camera after printing
the patch. \emph{Bottom left}: Texton image of the synthetic
image. \emph{Bottom right}: Texton image of the real image. The
texton images show that corresponding regions get classified into
different textons, resulting in different histograms.
This makes the transfer from the synthetic data to the real world difficult.%
}}
\end{center}
\end{figure}

The shift of processing effort to the offline training step could be
taken further by using a different regression technique. In the current
implementation ($k$NN regression), larger training data sets are
penalized by a longer prediction time. However, the choice of a
different regression technique is not as straightforward as it might
seem. The technique should be able to output multiple predictions,
since certain map regions might be ambiguous.

The presented approach is a vision-only approach. This makes it robust
to external disturbances such as magnetic fields and reduces the number
of points of failure. Additionally, the approach can be used on
different devices, such as handheld cameras. Still, future developments
could incorporate data from the inertial measurement unit (IMU) in the
particle filter's motion model.

Additionally, the time complexity of the algorithm can be further
reduced with the aim of running the algorithm on fly-sized MAVs.
Depending on the target platform, parallelization or threading could be
used on multi-core systems to simultaneously compute texton histograms,
make predictions, and run the particle filter. Currently, the
computationally most complex part is the XXX.
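
As a rough estimate of the prediction time of the $k$NN regression
discussed above, assume a brute-force neighbor search over $n$ training
histograms with $d$ texton bins each. The symbols $n$ and $d$ and the
brute-force assumption are introduced here for illustration only; the
search strategy of the implementation is not specified in this chapter.
Each query then computes $n$ distances of $d$ terms each, so that
\[
T_{\mathrm{query}} = \mathcal{O}(n \, d),
\]
ignoring the comparatively small cost of selecting the $k$ nearest
neighbors. Under this assumption, doubling the training set roughly
doubles the prediction time, whereas a parametric regression technique
would have a prediction time that does not grow with $n$.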