Analysis of soccer matches with clustering of player trajectories and Mutual Information based metric


The main goal of this project is to extract high-level semantic cues from soccer match video sequences, using machine learning and computer vision techniques.

  • During the first year, I explored the state-of-the-art techniques that could be used to extract the player trajectories from videos of a match recorded with several static cameras.

  • During the second year, I focused on the player trajectory clustering problem, and in particular on designing a reliable and efficient metric to establish correspondences between trajectories.


State-of-the-art methods for soccer match analysis use additional high-level data and annotations (from sources such as WhoScored or Transfermarkt), which require human annotation and therefore preprocessing (Reilly; Comparing Players Using Cluster Analysis). In this project, we limit our input to the video of a match, recorded with a multi-camera system. We focus on unsupervised classification of the players based on clustering of their trajectories, automatically extracted from match video sequences over a short period of time. Based on this information, we are interested in characterizing several aspects of the match, such as detecting the game leaders, anticipating the displacement of the players, or finding patterns in the player paths. In this way, we aim to obtain details about the team strategy through the analysis of high-level semantic cues.

The problem of tracking players with multiple cameras has been studied extensively, for example in (Shitrit 2011), including cases where player paths may intersect over long periods of time. The clustering process, in turn, is closely linked to the choice of a metric that captures the similarities among the objects we want to classify: in our case, the player trajectories. Most of the time, the metrics used are based on the Euclidean distance (Jain 1999). However, trajectories are only available in discrete form, as arrays that can differ in size, in time discretization, or in player speed. The Euclidean distance is not a good choice in these cases. So, in our case, a relevant clustering process should be decomposed into two parts:

  • Compute the distance between the trajectories we want to cluster, producing a distance matrix.

  • Use a general clustering algorithm which is independent of the metric, to produce the clusters.
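
The two-part decomposition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: dynamic time warping (dtw_distance) is used as a placeholder metric because it tolerates trajectories of different lengths, and cluster_from_matrix is a deliberately simple threshold-based grouping. Neither is the metric or the clustering algorithm used in this project, and all function names and the threshold value are hypothetical.

```python
from math import hypot, inf


def dtw_distance(a, b):
    """Placeholder metric (an assumption, not the project's metric):
    dynamic time warping cost between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def distance_matrix(trajectories, metric):
    """Part 1: pairwise distances under any symmetric metric."""
    n = len(trajectories)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = metric(trajectories[i], trajectories[j])
    return D


def cluster_from_matrix(D, threshold):
    """Part 2: a clustering step that only sees the distance matrix.
    Stand-in algorithm: connected components of the graph that links
    items whose distance is at most `threshold`."""
    n = len(D)
    labels = [-1] * n
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and D[i][j] <= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy trajectories of different lengths (made-up coordinates).
trajectories = [
    [(0, 0), (1, 0), (2, 0), (3, 0)],
    [(0, 0.1), (1.5, 0.1), (3, 0.1)],
    [(0, 10), (1, 10), (2, 10)],
]
D = distance_matrix(trajectories, dtw_distance)
labels = cluster_from_matrix(D, threshold=5.0)
```

Because the clustering step receives only D, the metric can be swapped without touching the clustering code, which is the point of the decomposition.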

In this work, we used the clustering algorithm described in (Komodakis 2009), which takes a distance matrix as its single input. It automatically selects the cluster centers and the number of clusters. Our contribution consists of a novel metric, based on Mutual Information, that captures the similarity between trajectories. It allows the player trajectories to be clustered according to the interdependency of their paths.
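
To make the notion of Mutual Information between two paths concrete, here is a minimal sketch of the standard plug-in estimate I(X;Y) = sum over (x, y) of p(x,y) log( p(x,y) / (p(x) p(y)) ), computed from paired symbol sequences. The quantization of a trajectory into displacement-sign symbols (quantize_steps) is an assumption made purely for illustration; the text above does not specify how the trajectories are discretized or how the Mutual Information is turned into a distance, so this is not the project's metric.

```python
from collections import Counter
from math import log


def quantize_steps(trajectory):
    """Hypothetical discretization: map each displacement between
    consecutive (x, y) samples to the pair (sign of dx, sign of dy)."""
    def sign(v):
        return (v > 0) - (v < 0)
    return [
        (sign(x1 - x0), sign(y1 - y0))
        for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:])
    ]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from two paired sequences
    of hashable symbols with the same length."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


# Two made-up players moving in parallel: their step symbols coincide,
# so the estimate equals the entropy of the step sequence.
player_a = [(0, 0), (1, 0), (2, 1), (2, 2)]
player_b = [(5, 5), (6, 5), (7, 6), (7, 7)]
mi_parallel = mutual_information(quantize_steps(player_a), quantize_steps(player_b))
```

A high value indicates that knowing one player's movement tells a lot about the other's, while a value near zero indicates independent movement. Note that this estimate requires sequences sampled at the same time steps and is biased for short sequences.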

Clustering via LP-based stabilities

Clustering is considered one of the most fundamental unsupervised learning tasks. It consists of finding “natural” groups among the data points. More precisely, it groups together the data points that are closest to each other according to a metric.