This project analyzes over a century of international football matches (1902-2024) using data cleaning, transformation, and visualization techniques. The objective is to provide key insights on team performances through an interactive dashboard designed for upper management. You can find our presentation here.
The project is divided into two main phases:
- Data Integration: Cleaning and transforming the dataset for consistency and accuracy.
- Visualization: Creating dashboards that provide insightful visualizations using SAS Visual Analytics.
The dataset contains international football match records from 1902 to 2024, including:
- Date of match
- Home and away teams
- Goals scored by each team
- Venue information (neutral/away/home)
- Tournament name
- Match outcomes (win, loss, or tie)
Several transformations were applied to standardize and enrich the data:
- Convert all dates to `Date9` format.
- Translate all non-English city, tournament, and country names into English.
- Add a `Winner` column that indicates the match result: Home Team, Away Team, or Tie.
- Validate and remove rows with missing values.
- Final dataset saved as `results_table.csv` in `/data/cleaned/`.
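The project itself performs these steps in SAS. As a rough illustration only, the date formatting, `Winner` derivation, and missing-value filter could be sketched in Python as follows. The column names (`date`, `home_team`, `away_team`, `home_score`, `away_score`) and the ISO input date format are assumptions, not taken from the actual dataset.

```python
from datetime import date

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

# Hypothetical column names; adjust to match the real dataset.
REQUIRED = ("date", "home_team", "away_team", "home_score", "away_score")


def to_date9(iso_date):
    """Render an ISO date (YYYY-MM-DD) in SAS DATE9 style, e.g. 01JAN1902."""
    d = date.fromisoformat(iso_date)
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year:04d}"


def winner(home_score, away_score):
    """Label the match result from the home team's point of view."""
    if home_score > away_score:
        return "Home Team"
    if away_score > home_score:
        return "Away Team"
    return "Tie"


def clean_rows(rows):
    """Drop incomplete rows, reformat dates, and add a Winner column."""
    cleaned = []
    for row in rows:
        # Skip any row with a missing or blank required value.
        if any(row.get(key) in (None, "") for key in REQUIRED):
            continue
        out = dict(row)
        out["date"] = to_date9(row["date"])
        out["Winner"] = winner(int(row["home_score"]), int(row["away_score"]))
        cleaned.append(out)
    return cleaned
```

Rows read with `csv.DictReader` can be passed straight to `clean_rows`, since it accepts any iterable of dictionaries.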
The project uses SAS Visual Analytics to create the following key dashboards:
- International Games – General Information
  - Top 10 countries by number of wins (excluding ties)
  - Number of games per tournament (excluding friendly matches)
- International Games – City
  - Top 30 cities by number of games played.
  - Filterable by tournament.
- International Games – Country
  - Drop-down list of countries, showing goals scored vs. received per year.
  - Goals scored vs. received by team.
- International Games – Map
  - Number of matches played in each country, with filters for different tournaments.
- Mystery Chart
  - Wins, losses, and ties by team.
  - Network analysis of frequently competing teams.
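The dashboards are built in SAS Visual Analytics, but the aggregation behind a tile such as "Top 10 countries by number of wins (excluding ties)" can be checked outside SAS. The sketch below is one possible way to do that in Python; it assumes each match is a dictionary with hypothetical `home_team` and `away_team` keys plus the `Winner` column described earlier.

```python
from collections import Counter


def top_winners(matches, n=10):
    """Return the n teams with the most wins as (team, wins) pairs."""
    wins = Counter()
    for match in matches:
        if match["Winner"] == "Home Team":
            wins[match["home_team"]] += 1
        elif match["Winner"] == "Away Team":
            wins[match["away_team"]] += 1
        # Ties are skipped, so they count for neither side.
    return wins.most_common(n)
```

Comparing this output with the dashboard tile is a quick sanity check that ties are really excluded from the ranking.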
- Top Performing Teams: Identifies the countries with the highest number of wins across all tournaments.
- Tournament Trends: Highlights which tournaments have the most matches and which locations host the most games.
- Team Performance by Year: Shows the number of goals scored and received by country over time, useful for tracking trends in performance.
- Performance in Neutral Venues: Evaluates team performance when no home advantage is present, useful for neutral-ground tournament planning.
- Network Analysis: Visualizes team rivalries and frequently played matches.
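For the network analysis, the underlying quantity is how often each pair of teams has met. A minimal sketch of that edge count in Python is shown below; the `home_team` and `away_team` keys are assumed names, and the project's own network is produced in SAS Visual Analytics rather than with this code.

```python
from collections import Counter


def pairing_counts(matches):
    """Count how many times each unordered pair of teams has played."""
    edges = Counter()
    for match in matches:
        # Sort the pair so A-vs-B and B-vs-A map to the same undirected edge.
        pair = tuple(sorted((match["home_team"], match["away_team"])))
        edges[pair] += 1
    return edges
```

Each key is an edge between two teams and each value is its weight, so `pairing_counts(matches).most_common(20)` would list the twenty most frequent pairings.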
This project successfully cleaned and transformed over a century’s worth of football match data. The interactive dashboard provides upper management with key insights on team performance, match outcomes, and venue trends, empowering data-driven decision-making for future tournaments.
- Clone the repository:
  git clone https://github.com/MariamAmy/Football-Data-Analysis/