This data science project aims to value all actions that football players perform on the pitch. This gives a clearer view of how different actions contribute to goals, since most models consider only single actions or shots. The project builds on the paper "Actions Speak Louder than Goals" by Decroos et al. (2019) and the corresponding GitHub repository. Some ideas were also inspired by StatsBomb's On-Ball Value metric.
The complete code for the project is available in this GitHub repository.
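Generally, we consider an action in football to be good if it increases the probability of scoring, decreases the probability of conceding, or both. We therefore define the value of an action as the change in scoring probability minus the change in conceding probability, each measured between the game state before and the game state after the action: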
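$$\text{VAEP value}(a_i) = \Delta P_{\text{scores}}(a_i) - \Delta P_{\text{concedes}}(a_i)$$

$$\Delta P_{\text{scores}}(a_i) = P_{\text{scores}}(S_i) - P_{\text{scores}}(S_{i-1})$$

$$\Delta P_{\text{concedes}}(a_i) = P_{\text{concedes}}(S_i) - P_{\text{concedes}}(S_{i-1})$$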
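As a small worked example of the formula above, the sketch below computes the value of a single action from the probabilities before and after it. The function name `vaep_value` and the numbers are made up for illustration and are not taken from the project code.

```python
def vaep_value(p_scores_before, p_scores_after,
               p_concedes_before, p_concedes_after):
    """Value of an action that moves the game from state S_{i-1} to S_i."""
    delta_scores = p_scores_after - p_scores_before
    delta_concedes = p_concedes_after - p_concedes_before
    return delta_scores - delta_concedes


# Hypothetical pass: the scoring probability rises from 2% to 5%,
# while the conceding probability drops from 1% to 0.5%.
value = vaep_value(0.02, 0.05, 0.01, 0.005)
print(round(value, 3))  # 0.035
```

The action is rewarded twice here: once for raising the scoring chance (+0.03) and once for lowering the conceding chance (+0.005).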
We use the StatsBomb data available through the mplsoccer Python library. The data contains 200 games from UEFA Euro 2020, the FIFA World Cup 2022, the 1. Bundesliga 2023/24 and UEFA Euro 2024. For these competitions, the data contains the game state at the time of every action. Each game state contains information about the action, the acting player and the positions of the other players in the camera frame.
We use a variety of features to estimate the probabilities in the model:
- Distance to Goal: the distance from the ball to the opponent's / own goal
- Angle to Goal: the angle from the ball to the opponent's / own goal
- Time Elapsed: the time elapsed since the start of the game
- Duration: the duration of the action
- Score Difference: the score difference at the time of the action
- Body Part: the body part with which the action was performed
- Action Type: the type of the action
- Closest Defender Distance: the distance of the ball to the closest defending player
- Opponents in Front: the number of opponents in front of the ball with respect to the x coordinate
- Goalkeeper Distance to Ball: the distance from the opponent's / own goalkeeper to the ball
- Goalkeeper Distance to Goal: the distance from the opponent's / own goalkeeper to the goal
- Goalkeeper in Shooting Triangle: whether the opponent's / own goalkeeper is inside the triangle formed by the ball and the two goalposts
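Several of the geometric features above can be sketched in a few lines. The code below is an illustration only, not the project's implementation. It assumes StatsBomb pitch coordinates (120 x 80, attacking left to right, posts at y = 36 and y = 44), and it interprets "Angle to Goal" as the opening angle of the goal mouth seen from the ball, which is one possible definition. All function names are hypothetical.

```python
import math

# Assumption: StatsBomb coordinates, opponent's goal on the right-hand side.
GOAL_CENTER = (120.0, 40.0)
LEFT_POST = (120.0, 36.0)
RIGHT_POST = (120.0, 44.0)


def distance_to_goal(ball):
    """Euclidean distance from the ball to the centre of the goal."""
    return math.dist(ball, GOAL_CENTER)


def angle_to_goal(ball):
    """Opening angle (radians) of the goal mouth as seen from the ball."""
    a = math.atan2(LEFT_POST[1] - ball[1], LEFT_POST[0] - ball[0])
    b = math.atan2(RIGHT_POST[1] - ball[1], RIGHT_POST[0] - ball[0])
    return abs(b - a)


def closest_defender_distance(ball, defenders):
    """Distance from the ball to the nearest defending player."""
    return min(math.dist(ball, d) for d in defenders)


def opponents_in_front(ball, defenders):
    """Number of defenders with a larger x coordinate than the ball."""
    return sum(1 for d in defenders if d[0] > ball[0])


def _cross(o, a, b):
    """2D cross product of the vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def keeper_in_shooting_triangle(ball, keeper):
    """True if the keeper lies inside the triangle ball / left post / right post."""
    d1 = _cross(ball, LEFT_POST, keeper)
    d2 = _cross(LEFT_POST, RIGHT_POST, keeper)
    d3 = _cross(RIGHT_POST, ball, keeper)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    # Inside (or on the edge) when all cross products share one sign.
    return not (has_negative and has_positive)


ball = (108.0, 40.0)                         # on the penalty spot
defenders = [(112.0, 38.0), (100.0, 45.0)]   # made-up positions
print(distance_to_goal(ball))                # 12.0
print(opponents_in_front(ball, defenders))   # 1
print(keeper_in_shooting_triangle(ball, (118.0, 40.0)))  # True
```

The features for the own goal follow the same pattern with the goal at x = 0.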
For probability estimation, we label the 10 actions preceding each goal. Actions by the scoring team receive a 1 for the scoring label, and actions by the conceding team receive a 1 for the conceding label. All other actions are labelled 0.
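The labelling rule can be sketched as follows. This is a minimal illustration, assuming the actions of one match are available in chronological order as dictionaries with the hypothetical keys `team` and `is_goal`. Whether the goal-scoring action itself also receives a label is not stated above; this sketch labels only the actions before it.

```python
def label_actions(actions, n_prev=10):
    """Return (scores, concedes) label lists for one match.

    actions: chronologically ordered dicts with keys 'team' and 'is_goal'
             (hypothetical field names).
    """
    scores = [0] * len(actions)
    concedes = [0] * len(actions)
    for i, action in enumerate(actions):
        if not action["is_goal"]:
            continue
        scoring_team = action["team"]
        # Look back at most n_prev actions before the goal.
        for j in range(max(0, i - n_prev), i):
            if actions[j]["team"] == scoring_team:
                scores[j] = 1
            else:
                concedes[j] = 1
    return scores, concedes


match = [
    {"team": "A", "is_goal": False},
    {"team": "B", "is_goal": False},
    {"team": "A", "is_goal": False},
    {"team": "A", "is_goal": True},
]
scores, concedes = label_actions(match, n_prev=2)
print(scores)    # [0, 0, 1, 0]
print(concedes)  # [0, 1, 0, 0]
```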
We use an XGBoost classifier to estimate the scoring and conceding probabilities given the features. The model outputs the following:
- `predicted_goal_prob`: Probability of scoring from the event.
- `predicted_concede_prob`: Probability the team concedes after the event.
- `action_value`: Estimated contribution of the action to the overall chance of scoring.
The notebooks in the project must be run in the following order:
- `1_Loading_Data.ipynb`: Loads the data, constructs the features, assigns the labels, computes player minutes and stores everything as `data_cleaned.pkl` and `player_minutes.pkl`.
- `2_Training_Model.ipynb`: Trains the model, estimates the probabilities, computes action values and stores the data in `data_cleaned_trained.pkl`.
- `3_Result_Visualization.ipynb`: Visualizes the model results and shows various potential applications with examples.
The model can rate all players in the dataset by their action value per 90 minutes.
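The per-90 rating can be sketched as the sum of a player's action values, divided by the minutes played and scaled to 90 minutes. The function name, the input layout and the numbers below are assumptions for illustration, not the project's code.

```python
from collections import defaultdict


def action_value_per_90(actions, minutes_played):
    """Rank players by summed action value per 90 minutes.

    actions: iterable of (player, action_value) pairs
    minutes_played: dict mapping player to total minutes on the pitch
    """
    totals = defaultdict(float)
    for player, value in actions:
        totals[player] += value
    ratings = {
        player: 90 * total / minutes_played[player]
        for player, total in totals.items()
        if minutes_played.get(player, 0) > 0  # skip players without minutes
    }
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


actions = [("Player A", 0.10), ("Player A", 0.05), ("Player B", 0.02)]
minutes = {"Player A": 180, "Player B": 45}
ranking = action_value_per_90(actions, minutes)
for player, rating in ranking:
    print(player, round(rating, 3))
```

With these made-up numbers, Player A ends up at 0.075 per 90 minutes and Player B at 0.04.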

Plots a single event on a pitch, including visible area and freeze-frame positions of players.
plot_event_with_360(event_row)
Generates a table and pitch plot for a sequence of actions (passes, carries, shots, etc.) leading up to a key moment.
plot_action_chain_by_id(df_model, SCENARIO)
Creates an animated pitch plot of action sequences with:
- Player locations
- Arrows for passes/dribbles
- Dynamic predicted goal/concede probabilities
animate_action_chain(df_model, SCENARIO, save_path='action_chain_animation.mp4')
Creates a pitch heatmap showing areas from which actions resulted in high predicted probabilities of scoring.
plot_team_goal_prob_heatmap(team_id, player_id=None)
- pandas
- matplotlib
- mplsoccer
- numpy
- cmasher
- sklearn
- xgboost
- collections
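Install the dependencies with pip. Note that sklearn is published on PyPI as scikit-learn, and collections is part of the Python standard library, so it needs no installation:
pip install pandas matplotlib mplsoccer numpy cmasher scikit-learn xgboost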