We're going to use a hierarchical model for the number of goals scored by each team in a Premier League game in order to predict results.
We have lots and lots of clean data at both the team and player levels.
The basic building block will be a negative binomial model for the number of shots on target (SOT) generated by each team, coupled with a binomial model for the number of those shots converted into goals.
Let \(S_i\) denote the number of shots on target generated by team \(i\) and let \(G_i\) be the number of goals scored by team \(i\). We write down the following model for the number of goals scored, where the negative binomial for \(S_i\) is expressed as a Poisson distribution whose rate is gamma-distributed:
\(S_i \sim \textrm{Poisson}(\lambda_i); \ \lambda_i \sim \textrm{Gamma}(r_i,\frac{\rho_i}{1-\rho_i}); \ G_i \sim \textrm{Binomial}(S_i,p_i)\)
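To make the generative story concrete, here is a minimal simulation sketch of the team-level model. It assumes the second gamma parameter is a scale (which is what makes the mixture a negative binomial with mean \(r_i \rho_i / (1 - \rho_i)\)), and the parameter values are made up for illustration, not estimated from any data. The helper names `sample_poisson` and `simulate_team_goals` are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw from Poisson(lam) using Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_team_goals(rng, r, rho, p):
    """One draw of (shots on target, goals) from the team-level model."""
    # random.gammavariate takes (shape, scale); the scale here is rho / (1 - rho).
    lam = rng.gammavariate(r, rho / (1.0 - rho))
    shots = sample_poisson(rng, lam)
    # Binomial(shots, p) written as a sum of independent Bernoulli(p) trials.
    goals = sum(rng.random() < p for _ in range(shots))
    return shots, goals


rng = random.Random(0)
r, rho, p = 5.0, 0.5, 0.3  # illustrative values only
draws = [simulate_team_goals(rng, r, rho, p) for _ in range(20000)]
mean_shots = sum(s for s, _ in draws) / len(draws)
mean_goals = sum(g for _, g in draws) / len(draws)
print(f"mean shots on target: {mean_shots:.2f}, mean goals: {mean_goals:.2f}")
```

With these values the simulated means should sit close to \(r \rho / (1 - \rho) = 5\) shots on target and \(5p = 1.5\) goals per game.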
How does player-level information feed into this?
We can choose to introduce this information through the parameters \(r_i, \rho_i, p_i\). Note that if we fix \(r_i\) to be integer-valued, the rate \(\lambda_i\) at which a team generates shots on target becomes the sum of \(r_i\) i.i.d. exponential random variables, each with rate \(\frac{1 - \rho_i}{\rho_i}\). This suggests the obvious extension to the case where \(\lambda_i\) is a sum of exponential random variables that are not necessarily i.i.d., one per player. In other words, each player has a mean rate at which he generates shots on target. In each game, the rate for each player is drawn from an exponential distribution with that mean, to account for 'bad days at the office'. The number of shots on target generated by each player is then drawn from a Poisson distribution with that rate, and these are summed to obtain the total number of shots on target for the team. The number of these shots which are converted is drawn from a binomial distribution with success probability \(p_i\). Player-level data on shots on target is readily available and can be incorporated into the estimation of the model.
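The player-level extension can be sketched the same way. This is only an illustration under stated assumptions: the list `mean_rates` of per-player mean SOT rates and the conversion probability are invented numbers, and `simulate_team_from_players` is a hypothetical helper rather than part of any fitted model.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw from Poisson(lam) using Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_team_from_players(rng, mean_rates, p):
    """One game: per-player shots on target, then team goals."""
    shots_by_player = []
    for mean_rate in mean_rates:
        # Game-day rate: exponential with mean `mean_rate` (expovariate takes a rate).
        game_rate = rng.expovariate(1.0 / mean_rate)
        shots_by_player.append(sample_poisson(rng, game_rate))
    total_shots = sum(shots_by_player)
    # Team-level conversion: Binomial(total_shots, p) as a sum of Bernoulli trials.
    goals = sum(rng.random() < p for _ in range(total_shots))
    return shots_by_player, goals


rng = random.Random(1)
mean_rates = [1.2, 0.8, 0.5, 0.3, 0.2]  # hypothetical per-player mean SOT rates
games = [simulate_team_from_players(rng, mean_rates, p=0.3) for _ in range(20000)]
mean_total_shots = sum(sum(shots) for shots, _ in games) / len(games)
print(f"mean team shots on target: {mean_total_shots:.2f}")
```

The average team total should be close to the sum of the player means (3.0 here). When all players share the same mean, this collapses back to the team-level model with integer \(r_i\).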
It probably makes sense for each player to have a separate binomial rate, possibly pooled towards a team-level mean.
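One simple way to picture pooling towards a team-level mean is a beta-binomial style shrinkage estimate, sketched below. This is an assumption about how the pooling might be done, not a commitment to a particular prior: `prior_strength` is a hypothetical tuning constant (roughly, how many "pseudo-shots" at the team rate each player starts with), and the goal and shot counts are invented.

```python
def pooled_conversion_rates(goals, shots, prior_strength=20.0):
    """Shrink each player's raw conversion rate towards the team-wide rate.

    Equivalent to the posterior mean under a Beta prior centred on the team
    rate and worth `prior_strength` pseudo-shots (a hypothetical choice).
    """
    total_shots = sum(shots)
    if total_shots == 0:
        raise ValueError("need at least one shot on target to estimate a team rate")
    team_rate = sum(goals) / total_shots
    return [
        (g + prior_strength * team_rate) / (s + prior_strength)
        for g, s in zip(goals, shots)
    ]


# Invented counts: a regular striker, a player with two shots, a player with none.
goals = [6, 1, 0]
shots = [20, 2, 0]
rates = pooled_conversion_rates(goals, shots)
team_rate = sum(goals) / sum(shots)
print([round(rate, 3) for rate in rates], round(team_rate, 3))
```

Players with few shots on target end up near the team rate, while players with many shots keep an estimate closer to their own record. In a full hierarchical model the strength of the pooling would be learned from the data rather than fixed.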
Player-level abilities:
Attacking:
- SOT generation (creation for others) -- key passes, something like expected assists
- Finishing -- binomial rate
- SOT generation (creation for self) -- shots on target without associated key pass
Defending:
- SOT rebuttal (blocking) + goalkeepers -- 'last ditch'
- Control & intelligence (team-level with individual-level mixed effects, using interceptions to estimate) -- prevention of key passes
Essentially this gives individual-level ratings for attacking and defending abilities. There is no direct measurement of a single player's contribution to the prevention of a key pass, so we use partial pooling to learn each player's contribution to the team's prevention of key passes.
Key aspects of the game and associated data:
Possession -- passing accuracy, dribbling.
Defending -- blocks, interceptions, tackles.
Attacking -- shots on target, goals, key passes, assists.
Attacking:
- Movement -- ability to receive key passes, corrected for the ability of teammates to generate key passes
- Key-pass generation -- ability to generate key passes
- Finishing -- ability to convert shots on target into goals
- Self-creativity -- ability to generate shots on target without an associated key pass