This project collects daily snapshots from the osu! API and analyzes player performance volatility using DuckDB and Parquet.
In osu!, PP (performance points) is a key metric for player ranking. A player's PP total fluctuates over time, and this project aims to analyze these fluctuations (volatility) and identify trends that can help players understand and improve their performance.
This project investigates the following questions:
- How does play frequency relate to PP volatility?
- How does beatmap difficulty influence PP volatility?
- How does accuracy consistency affect PP volatility?
- How does beatmap diversity relate to PP volatility?
The metrics are defined as follows:
- PP volatility: standard deviation of daily PP deltas
- Play frequency: average daily playcount delta
- Accuracy trend: 30-day rolling average of accuracy
- Beatmap diversity: number of distinct beatmaps played
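The metric definitions above can be sketched in plain Python for a single player's daily snapshots. This is a minimal sketch, not the project's implementation: the snapshot field names (`day`, `pp`, `playcount`, `accuracy`, `beatmap_ids`) are hypothetical, and the real computation runs in DuckDB over the curated Parquet files.

```python
from datetime import date, timedelta
from statistics import mean, stdev

# Hypothetical snapshot layout (one dict per player per day); field names are assumptions.

def _by_day(snapshots):
    """Return snapshots ordered by calendar day."""
    return sorted(snapshots, key=lambda s: s["day"])

def _deltas(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]

def pp_volatility(snapshots):
    """Sample standard deviation of daily PP deltas (needs at least 3 snapshots)."""
    return stdev(_deltas([s["pp"] for s in _by_day(snapshots)]))

def play_frequency(snapshots):
    """Average daily playcount delta (needs at least 2 snapshots)."""
    return mean(_deltas([s["playcount"] for s in _by_day(snapshots)]))

def accuracy_trend(snapshots, window_days=30):
    """Trailing rolling mean of accuracy over the last `window_days` calendar days."""
    ordered = _by_day(snapshots)
    trend = []
    for current in ordered:
        start = current["day"] - timedelta(days=window_days - 1)
        in_window = [s["accuracy"] for s in ordered if start <= s["day"] <= current["day"]]
        trend.append((current["day"], mean(in_window)))
    return trend

def beatmap_diversity(snapshots):
    """Number of distinct beatmap IDs seen across all snapshots."""
    return len(set().union(*(s["beatmap_ids"] for s in snapshots)))

# Usage with three made-up daily snapshots:
snapshots = [
    {"day": date(2024, 1, 1), "pp": 1000.0, "playcount": 100, "accuracy": 97.0, "beatmap_ids": {1, 2}},
    {"day": date(2024, 1, 2), "pp": 1010.0, "playcount": 110, "accuracy": 98.0, "beatmap_ids": {2, 3}},
    {"day": date(2024, 1, 3), "pp": 1030.0, "playcount": 130, "accuracy": 96.0, "beatmap_ids": {4}},
]
print(pp_volatility(snapshots))      # deltas 10 and 20, sample std. dev. = sqrt(50)
print(play_frequency(snapshots))     # mean of deltas 10 and 20 = 15
print(accuracy_trend(snapshots))     # rolling means 97.0, 97.5, 97.0
print(beatmap_diversity(snapshots))  # beatmaps {1, 2, 3, 4} -> 4
```

The sketch uses the sample standard deviation (`statistics.stdev`); the README does not say whether the sample or population form is used. The rolling window is measured in calendar days, so missing snapshots shrink the window rather than stretching it.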
The system follows a lakehouse-style batch pipeline composed of three layers: raw ingestion, curated storage, and analytical queries.
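The three layers can be illustrated with an in-memory sketch. It is a simplification under stated assumptions: the raw records and their fields (`user_id`, `day`, `pp`) are hypothetical, the curated layer is a Python list rather than Parquet files, and the "last fetch of the day wins" deduplication rule is an assumption, not the project's documented behavior.

```python
import json
from collections import defaultdict
from datetime import date

# Raw layer: API responses kept verbatim, one JSON document per line (hypothetical fields).
RAW_LINES = [
    '{"user_id": 1, "day": "2024-01-01", "pp": 1000.0}',
    '{"user_id": 1, "day": "2024-01-01", "pp": 1000.0}',  # duplicate fetch on the same day
    '{"user_id": 1, "day": "2024-01-02", "pp": 1012.5}',
    '{"user_id": 2, "day": "2024-01-01", "pp": 850.0}',
]

def curate(raw_lines):
    """Curated layer: parse, cast types, and keep one row per (user_id, day)."""
    rows = {}
    for line in raw_lines:
        rec = json.loads(line)
        key = (int(rec["user_id"]), date.fromisoformat(rec["day"]))
        rows[key] = float(rec["pp"])  # assumption: the last fetch of the day wins
    return [{"user_id": u, "day": d, "pp": pp} for (u, d), pp in sorted(rows.items())]

def daily_pp_deltas(curated):
    """Analytical layer: PP change between consecutive snapshots, per player."""
    by_user = defaultdict(list)
    for row in curated:  # rows are already sorted by (user_id, day)
        by_user[row["user_id"]].append(row["pp"])
    return {u: [b - a for a, b in zip(pps, pps[1:])] for u, pps in by_user.items()}

curated = curate(RAW_LINES)
print(daily_pp_deltas(curated))  # {1: [12.5], 2: []}
```

Keeping the raw layer untouched means the curated layer can be rebuilt at any time if the cleaning rules change, which is the main reason for separating the two.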
- Clone the repository:
  git clone https://github.com/marcelo-schreiber/osu-duckdb.git
- Set up a virtual environment and install dependencies:
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  pip install -r requirements.txt
- Set up environment variables:
  cp .env.example .env  # then edit .env to fill in the required values
- Run the data ingestion script (use crontab to schedule it for daily snapshots):
  python osu_tracker.py
- Run the curated build SQL:
  duckdb < ./sql/curated/build_curated.sql  # generates curated/**/*.parquet files
- Open and run the analysis notebook:
  jupyter notebook results.ipynb
