Most of the time, the metrics used for clustering are based on the Euclidean metric. In the field of trajectory clustering, however, the most competitive and widely used similarity measures are \textbf{LCSS} (Longest Common Subsequence) and \textbf{DTW} (Dynamic Time Warping). They are better suited to the available discrete trajectories, stored as arrays that may differ in size, in time discretization or in speed, although their computational cost is much higher. Yet what we are looking for is a similarity measure that is high when two trajectories depend on each other, whereas LCSS- and DTW-based similarities are high when two trajectories are close to each other at the same time. For instance, if a player always goes to the middle of the field when a teammate goes to the opposing penalty spot, the two trajectories should be considered similar, since there is a strong dependency between them. In this example, however, the two players' trajectories remain remote from each other during most of the match, and would therefore get poor LCSS and DTW similarities. This is what motivates us to build a new similarity measure relying on this intuitive notion of ``dependency'', which needs to be made explicit. Another crucial point is that a reliable similarity measure for our problem should take the time parameter into account.

\subsection{Mutual Information: definition and properties}

Mutual Information is widely used, for instance, for the registration of medical images, as depicted in \cite{Pluim_2003}. The main idea is to introduce a feature space (or a joint probability distribution) for the two trajectories we want to compare, and to evaluate the quantity of ``information'' shared by the two trajectories based on this feature space. This quantity is computed with Mutual Information.

In our case, the feature space will be the distribution of the pairs of positions of two players' trajectories over a time window of a few minutes. It thus corresponds to a 4-dimensional distribution. The Mutual Information of this distribution will be the lynchpin of our similarity measure for trajectories.

\subsubsection{Entropy}

Shannon introduced entropy as a measure of the quantity of information of a random variable.

Let $X: P \rightarrow E$ be a random variable with $E$ a discrete probability space. The entropy of $X$ is defined as
$S(X) = -\sum_{x \in E} P_X(x) \log P_X(x)$.
For instance, if $X$ is uniform over $n$ outcomes, then $S(X) = \log n$, the maximum possible value: the outcome is maximally uncertain.

The entropy has three interpretations:
\begin{itemize}
\item the amount of information of a random variable (or of an event);
\item the uncertainty about the outcome of a random variable (or of an event);
\item the dispersion of the probability law of a random variable (or of the probabilities with which the events take place).
\end{itemize}

For more information about entropy, the reader can refer to \cite{Pluim_2003} or \href{http://www.yann-ollivier.org/entropie/entropie1}{La théorie de l'information : l'origine de l'entropie}.

The Mutual Information of two random variables $X$ and $Y$ is then defined from the entropy as $I(X; Y) = S(X) + S(Y) - S(X, Y)$, where $S(X, Y)$ is the entropy of the joint law of $(X, Y)$; it measures the reduction of uncertainty about $X$ brought by the knowledge of $Y$, and vanishes exactly when $X$ and $Y$ are independent.

\subsection{MI-based metric for trajectories}

\subsection{Empirical MI-based metric for trajectories}
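As a concrete illustration of the empirical estimation sketched above, the following Python snippet computes a plug-in (histogram) estimate of the Mutual Information between two trajectories, assuming they are sampled at the same timestamps within the chosen time window and that the 4-dimensional feature space is discretized on a regular spatial grid. The function name \texttt{empirical\_mutual\_information}, the grid size \texttt{n\_bins} and the plug-in estimator itself are illustrative assumptions, not a prescription of the estimator developed in this work.

\begin{verbatim}
import numpy as np

def empirical_mutual_information(traj_a, traj_b, n_bins=8):
    """Plug-in (histogram) estimate of the Mutual Information between
    two 2-D trajectories; an illustrative sketch only.

    traj_a, traj_b: arrays of shape (T, 2) holding the (x, y) positions
    of each player, sampled at the same T timestamps.
    """
    traj_a = np.asarray(traj_a, dtype=float)
    traj_b = np.asarray(traj_b, dtype=float)

    def cells(traj):
        # Map each (x, y) sample to a single cell index of an
        # n_bins x n_bins spatial grid covering the trajectory.
        ix = np.clip(np.searchsorted(
            np.linspace(traj[:, 0].min(), traj[:, 0].max(), n_bins + 1),
            traj[:, 0]) - 1, 0, n_bins - 1)
        iy = np.clip(np.searchsorted(
            np.linspace(traj[:, 1].min(), traj[:, 1].max(), n_bins + 1),
            traj[:, 1]) - 1, 0, n_bins - 1)
        return ix * n_bins + iy

    a, b = cells(traj_a), cells(traj_b)

    # Joint histogram of the pair of cell indices: the 4-dimensional
    # feature space of the text, flattened to a 2-D table and
    # normalized into a joint probability distribution.
    joint = np.zeros((n_bins ** 2, n_bins ** 2))
    np.add.at(joint, (a, b), 1.0)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1)
    p_b = p_ab.sum(axis=0)

    # I(A; B) = sum over non-empty cells of p(a,b) log(p(a,b)/(p(a)p(b))),
    # i.e. S(A) + S(B) - S(A, B).
    nz = p_ab > 0
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / np.outer(p_a, p_b)[nz])))


# Example: player B mirrors player A around the far corner of the pitch,
# so the two trajectories stay remote in space yet strongly dependent,
# as in the penalty-spot example of the introduction.
rng = np.random.default_rng(0)
A = rng.uniform(0.0, 50.0, size=(3000, 2))
B = np.array([105.0, 68.0]) - A + rng.normal(0.0, 1.0, size=(3000, 2))
C = rng.uniform(0.0, 50.0, size=(3000, 2))        # independent of A
print(empirical_mutual_information(A, B))  # large: strong dependency
print(empirical_mutual_information(A, C))  # much smaller
\end{verbatim}

Note that the plug-in estimator is biased upward when the number of samples is small compared with the number of grid cells, so in practice a bias correction (e.g. Miller--Madow) or a different estimator may be preferable.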