SoccerSense Project

Overview

SoccerSense is a soccer analytics platform that integrates structured CSV datasets, semi-structured JSON files, and unstructured video data. Its goal is to provide automated data ingestion, AI-driven analysis, and real-time insights for coaches, analysts, and scouts.

🌐 1. Data Sources

The project combines structured data (CSV match and player statistics), semi-structured data (JSON YouTube comments), and unstructured data (MP4 match videos).

πŸ—‚οΈ 2. Landing Zone

  • Temporal Landing Zone
    Stores raw data including:

    • CSVs (match/player stats)
    • JSON (YouTube comments)
    • MP4 (match videos)
  • Persistent Landing Zone
    Cleaned data is written using Delta Lake to support:

    • ACID-compliant transactional storage
    • Metadata management and schema enforcement
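The move from the temporal to the persistent landing zone could look roughly like the sketch below. It is not the repository's actual code: the function names, the application name, and the paths passed in are hypothetical, and the cleaning step is reduced to dropping duplicate and fully empty rows. The two Spark settings are the ones the Delta Lake quickstart uses to enable Delta in a PySpark session.

```python
# Spark settings that enable Delta Lake in a PySpark session.
DELTA_SPARK_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_delta_session(app_name="SoccerSenseLanding"):
    """Create a SparkSession with Delta Lake enabled (app name is hypothetical)."""
    # Imported lazily so this module can be loaded without Spark installed.
    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in DELTA_SPARK_CONF.items():
        builder = builder.config(key, value)
    return configure_spark_with_delta_pip(builder).getOrCreate()


def persist_csv(spark, csv_path, delta_path):
    """Read a raw CSV, apply minimal cleaning, and write it as a Delta table."""
    df = spark.read.option("header", True).option("inferSchema", True).csv(csv_path)
    df = df.dropDuplicates().na.drop(how="all")  # placeholder cleaning only
    df.write.format("delta").mode("overwrite").save(delta_path)
    return df.count()
```

Writing in Delta format is what provides the transactional guarantees and schema enforcement listed above: a later append whose columns do not match the table schema is rejected unless schema merging is explicitly requested.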

3. Trusted Zone

  • PySpark is used for:
    • Large-scale data cleaning and consistency checks
    • Video metadata processing
    • Parsing and structuring YouTube comments
  • Data is stored in DuckDB for in-memory analytics
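The comment-structuring step can be illustrated with a small sketch. It uses plain Python rather than PySpark to keep the example short, and it assumes the JSON files follow the `commentThreads` response shape of the YouTube Data API v3. The function name and the output field names are hypothetical.

```python
def flatten_comment_threads(payload):
    """Turn a commentThreads-style payload into a list of flat row dicts."""
    rows = []
    for item in payload.get("items", []):
        top = item.get("snippet", {}).get("topLevelComment", {})
        snippet = top.get("snippet", {})
        text = (snippet.get("textDisplay") or "").strip()
        if not text:
            continue  # skip empty or missing comments
        rows.append(
            {
                "comment_id": top.get("id"),
                "author": snippet.get("authorDisplayName"),
                "text": text,
                "likes": int(snippet.get("likeCount", 0)),
                "published_at": snippet.get("publishedAt"),
            }
        )
    return rows
```

Rows in this flat form map directly onto a table, which is what makes them easy to load into DuckDB or a Spark DataFrame afterwards.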

📈 4. Exploitation Zone

  • KPIs (Key Performance Indicators) such as player performance and win rates are computed using PySpark
  • Results are saved in Parquet format for downstream consumption
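As an illustration of the KPI step, the sketch below computes a per-team win rate with PySpark and writes it to Parquet. The input columns `team` and `result` (with values such as "W", "D", "L"), the function names, and the output path are assumptions made for the example, not the schema used in the repository.

```python
def compute_win_rates(matches):
    """Return one row per team with games played, wins, and win rate.

    `matches` is a Spark DataFrame with hypothetical columns
    `team` and `result` ("W", "D", or "L").
    """
    # Imported lazily so this module can be loaded without Spark installed.
    from pyspark.sql import functions as F

    is_win = (F.col("result") == "W").cast("int")
    return (
        matches.groupBy("team")
        .agg(F.count("*").alias("played"), F.sum(is_win).alias("wins"))
        .withColumn("win_rate", F.col("wins") / F.col("played"))
    )


def save_kpis(kpis, out_path):
    """Write the KPI table as Parquet, replacing any previous output."""
    kpis.write.mode("overwrite").parquet(out_path)
```

Because the result is plain Parquet, DuckDB can query the files directly in the dashboard without a running Spark session.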

πŸ§‘β€πŸ’» 5. Consumption Zone

A Streamlit web application provides:

  • 🎥 Video Detection Module
    Uses YOLOv8 to detect and track players and ball movement

  • 💬 Sentiment Analysis Module
    Applies VADER to classify YouTube comments by emotional tone

  • 📊 KPI Dashboard Module
    Built with Streamlit + DuckDB, enabling:

    • CSV upload & table preview
    • Custom SQL querying
    • Dynamic visualization using Matplotlib
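For the Sentiment Analysis Module, the classification step can be sketched as follows. VADER returns a `compound` score between -1 and 1 for each text. The ±0.05 cut-offs are the convention suggested in VADER's own documentation; whether the app uses the same thresholds is an assumption, and the function names are hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def classify_comments(comments):
    """Return (text, label) pairs for an iterable of comment strings."""
    # Imported lazily so the helper above works without vaderSentiment installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return [
        (text, label_from_compound(analyzer.polarity_scores(text)["compound"]))
        for text in comments
    ]
```

Keeping the threshold logic in its own function makes it easy to tune the cut-offs or to test the labelling without loading the lexicon.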

Final Product

Demo videos:

https://drive.google.com/drive/folders/1uwOWW3hrIYRXy-AmMu_Qvts5WWMsAZGZ?usp=share_link

Installation

To run the code for this project, you will need to install several Python packages.

Main Packages

  • PySpark - For distributed data processing.
  • Delta Lake - To enable ACID transactions and schema enforcement with Spark.
  • yt-dlp - For downloading YouTube videos.
  • Google API Client Library - For interacting with YouTube Data API.
  • Kaggle API - For downloading datasets from Kaggle.
  • DuckDB - An in-process SQL OLAP database optimized for analytics.
  • PyTorch - For deep learning.
  • ultralytics (YOLO) - For loading pretrained YOLO models (e.g., YOLOv5, YOLOv8).
  • Streamlit - For building interactive data apps and dashboards.
  • VADER Sentiment (vaderSentiment) - A lexicon-based sentiment analysis tool.

Installation Commands

# Core data processing and storage
pip install pyspark
pip install delta-spark

# YouTube and API interaction
pip install yt-dlp
pip install google-api-python-client

# External data access
pip install kaggle

# Additional analytics tools
pip install torch
pip install duckdb
pip install streamlit
pip install vaderSentiment

# YOLO pretrained models (via Ultralytics)
pip install ultralytics
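After installing, a quick way to confirm that everything is importable is a small check like the one below. It is an optional helper, not part of the repository. The mapping pairs each pip package name with the module name it installs, and the function name is hypothetical.

```python
from importlib.util import find_spec

# pip package name -> importable module name
REQUIRED_MODULES = {
    "pyspark": "pyspark",
    "delta-spark": "delta",
    "yt-dlp": "yt_dlp",
    "google-api-python-client": "googleapiclient",
    "kaggle": "kaggle",
    "torch": "torch",
    "duckdb": "duckdb",
    "streamlit": "streamlit",
    "vaderSentiment": "vaderSentiment",
    "ultralytics": "ultralytics",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose modules cannot be found."""
    return [name for name, module in required.items() if find_spec(module) is None]
```

Calling `missing_packages()` returns an empty list when all dependencies are present; otherwise it lists the pip names that still need to be installed.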

Usage

Step 1: Clone the repository

git clone https://github.com/woshimajintao/SoccerSense.git
cd SoccerSense/P2/Final_APP/Consumption\ Zone

Step 2: Install the required packages

Make sure all the packages listed above are installed.

Step 3: Run the application

Start the main Streamlit app:

streamlit run main.py

This will launch the application in your browser.

Figures

KPI Analytics

[Figures: two KPI dashboard screenshots]

Video Detection

[Figure: Video Detection]

Sentiment Analytics

[Figure: Sentiment Analysis]

Repository Link

GitHub Repository: https://github.com/woshimajintao/SoccerSense

Author

Jintao Ma - Big Data Management and Analytics Master Program

License

This project is licensed under the MIT License - see the LICENSE file for details.
