Authorea

Volker Strobel added chapter_Discussion_label_chap_discussion__.tex almost 8 years ago

Commit id: fb8ee1a1f0963552e78a3dee5c00ec9dd5c3fc8d

deletions | additions

\chapter{Discussion} \label{chap:discussion} The comparison of the amount of samples with the average cosine similarity has shown that only a small part of the maximum amount of samples suffices to achieve similarities larger than $99\,\%$. In fact, $400 / (640 \times 480) = 0.13\,\%$. This set the stage for large speed-ups during live operation. Additionally, it offers space for further speed-ups depending on the processing power of the used CPU. The evaluation of different maps using the synthetic data showed considerable differences between the evaluated images. The range of losses from 0.24 to 0.99 clearly shows the different suitabilities of different maps for the proposed algorithm. In order to visually evaluate the proposed map evaluation technique, a simple map was constructed with two repeating tiles. The image with the minimum and the one with the maximum loss value based on their color histogram are shown in Figure~\ref{fig:minmaximg}. The different patterns of the images are clearly visible: while the image with the minimum value fulfills the desired properties---closeby areas have similar color values, distant areas are dissimilar, the image with the maximum loss is mainly black resulsting in similar histograms all over the place and leading to high loss values. This initial evidence can be taken to test the preditive power of the evaluation algorithm for texton histograms. \begin{figure}[h!] \begin{center} \includegraphics[width=0.7\columnwidth]{figures/minimg/default_figure} \caption{{\label{fig:minmaximg} \emph{Left}: Image with the lowest loss value; \emph{Right}: Image with the highest loss value% }} \end{center} \end{figure} \section{General Discussion} \label{sec:generaldiscussion} In this chapter, the results will be discussed with regard to error statistics, execution frequency, robustness, and scalability. To begin with, we recapitulate the research questions: \begin{itemize} \item R1: Can accurate 2D positions be estimated in real-time, using a machine learning-based approach on a limited processor in a modifiable indoor environment? \item R2: Is accurate real-world localization regression or classification possible when the training data comprises synthetic data only? \item R3: Can we predict the goodness of a given map for the proposed localization approach? \end{itemize} Regarding R1, the conducted experiments provide supportive evidence that a texton-based machine learning approach is able to accomplish real-time indoor localization. The proposed algorithm runs with as frequency of 30\,Hz on a single board computer with limited CPU. Shifting processing power to an offline training step and relying on random sampling are the cornerstones for running the algorithm on processors with limited CPU. Despite the small ratio between extracted image patches ($s$) and the maximum amount of different image matches ($s*$), the accuracy is hardly affected, and only slightly improves when incorporating more textons. Regarding research question R2, the initial idea—to use the synthetically generated images directly as training data—was not successful and not further followed up. This might be also the reason that only few projects have used synthetic images for real-world phenomena. The reality gap between the synthetic data and real-world data was huger than expected. Figure 15 shows an example of two image patches, one synthetically generated, one taken with the camera of the MAV. While the patches can be easily identified as similar for human eyes, the texton maps, where different colors represent different textons, are dissimilar. Blur, lighting settings, and camera intrinsics modify low-level features of the image to a too strong extent. A possible improvement might be to find a mapping from histograms of synthetic images to histograms of real images, by mapping ‘synthetic textons’ to ‘real-world textons’. Referring to R3, we found some initial evidence that the proposed map evaluation generalizes to the real-world. In contrast to R2, the generalization from the synthetic data to real-world data is of a different nature in this case. The requirement here is that maps that follow the ideal similarity distribution in the synthetically generated images also follow this distribution after being recorded with a camera. Or stated differently, for maps with a low loss value, distant image positions should not have similar histograms using the synthetic images nor the real-world images. Despite the overall promising results, we noticed drawbacks of the proposed approach during the flight tests and directions for future research. The accuracy—that is the difference between the estimates of the motion tracking system and the texton-based approach— could be further improved by incorporating more features, for example histogram of oriented gradients. Investigating further regression techniques, like Gaussian processes or Bayesian networks that can inherently handle space and time could be a worthwhile endeavor. The developed method sets the stage for numerous future research directions and improvements. The current implementation assumes rather constant height (up to few centimeters) and no angular rotations of the MAV. While a quadroter can move in every direction without performing yaw movements, using the MAV on another vehicle could require arbitrary yaw movements. In order to limit the complexity of the dataset, a ``derotation'' of the incoming image could be performed to align it with the underlying images of the dataset. While the current approach normalizes each $5\times5$ image patch to unit mean and zero variance---giving robustness to different lighting conditions---this procedure could be further extended, for example by using specific color models. While the current map evaluation approach used existing fixed images, it could also serve as a fitness function for an optimization approach---for example, an evolutionary algorithm---which modifies a given image. While the solution for a desired loss value of 0 or near 0 might be unique and independent of the original image, a higher loss value might change the initial image only to a certain extent, yielding an “improved version of the image”, which is better suited for the proposed algorithm. This could allow to find a near optimal solution for a given regression technique and give insightful view in the underlying structure of certain regression techniques. Currently, \emph{draug} generates image patches based on drawing samples from parametric distributions. This was motivated by the fact that an ideal map should be independent of previous estimates and based on single images only. In the future, the possibility to set flight routes by setting way points above an image could be included. This would allow to test the ability of the particle filter on synthetic flights. \begin{figure}[h!] \begin{center} \includegraphics[width=0.7\columnwidth]{figures/draug_pic/default_figure} \caption{{\label{fig:realitygap} Exemplifying the reality gap. \emph{Top left}: image patch generated using draug. \emph{Top right}: image patch taken with the MAV's camera after printing the patch. \emph{Below left}: texton image of the synthetic image. \emph{Below right}: Texton image of the real image. The texton images shows that corresponding regions get classified into different textons, resulting in different histograms. This makes the transfer from the syntethic data to the real world difficult.% }} \end{center} \end{figure} The shift of the processing power could be further amplified by using a different regression technique. In the current implementation ( $k$NN regression), larger training data sets are penalized due to a greater prediction time. However, the choice of a different regression technique is not as straightforward as it might seem. The technique should be able to output multiple predictions, since certain map regions might be ambiguous. The presented approach is a vision-only approach. This makes it robust to external disruptions such as magnetic fields and reduces the amount of points of failure. Additionally, the approach can be used on different devices, such as handheld cameras. Still, future developments could incorporate data from the inertial measurement unit (IMU) in the particle filter’s motion model. Additionally, the time complexity of the algorithm can be further reduced with the aim of running the algorithm on fly-sized MAVs. Depending on the target platform, parallelization or threading could be used on multi-core systems to simultaneously compute texton histograms, make predictions and run the particle filter. Currently, the computationally most complex part is the XXX.