ETL pipeline that extracts events from PostHog and loads them into BigQuery for consumption in Metabase.
PostHog API (Lakehouse) → Extract → Transform → BigQuery (Warehouse) → Metabase / Superset
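The flow above can be sketched as three small stages. This is only an illustration of the Extract → Transform → Load shape, not the project's actual code: the `EventRow` fields, the `id` / `event` / `timestamp` keys, and the in-memory `sink` that stands in for BigQuery are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Iterable


@dataclass(frozen=True)
class EventRow:
    """Hypothetical flat row shape for the warehouse table."""

    event_id: str
    event_name: str
    occurred_at: datetime


def extract(raw_events: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    # Stand-in for the PostHog API call: events are passed in directly.
    return list(raw_events)


def transform(events: Iterable[dict[str, Any]]) -> list[EventRow]:
    # Keep only the assumed fields and normalise timestamps to UTC.
    rows = []
    for event in events:
        occurred_at = datetime.fromisoformat(event["timestamp"]).astimezone(timezone.utc)
        rows.append(EventRow(event["id"], event["event"], occurred_at))
    return rows


def load(rows: list[EventRow], sink: list[EventRow]) -> int:
    # Stand-in for a BigQuery insert: append to an in-memory list.
    sink.extend(rows)
    return len(rows)
```

Keeping each stage a plain function makes it possible to test the transform step without network access or GCP credentials.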
- Python 3.13+
- Poetry
- Clone the repository
- Copy the environment file and fill in your values:
  cp .env.example .env
- Add your GCP service account JSON to credentials/gcp-service-account.json
- Install dependencies:
  make install

# Development
make run
# Production
make run-prod

The ETL runs automatically every day at 02:00 UTC. To trigger it manually:
make etl

Or with a specific date range:
curl -X POST http://localhost:8000/api/v1/posthog/run-etl \
-H "Content-Type: application/json" \
-d '{"since": "2025-01-01", "until": "2025-03-20"}'

See .env.example for all required variables.
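For scripted backfills, the date-range request shown in the curl example can also be built from Python with only the standard library. This is a minimal sketch: the URL and JSON keys are copied from the curl command, while the helper names `build_etl_request` and `trigger_etl` and the client-side date check are assumptions of this example, not part of the service.

```python
import json
import urllib.request
from datetime import date

# Endpoint taken from the curl example; adjust host and port for your deployment.
ETL_URL = "http://localhost:8000/api/v1/posthog/run-etl"


def build_etl_request(since: str, until: str, url: str = ETL_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers the ETL."""
    # Parse both bounds so malformed dates fail locally. This check is a
    # client-side convenience, not documented server behaviour.
    start, end = date.fromisoformat(since), date.fromisoformat(until)
    if start > end:
        raise ValueError(f"since ({since}) is after until ({until})")
    body = json.dumps({"since": start.isoformat(), "until": end.isoformat()})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_etl(since: str, until: str) -> str:
    """Send the request; requires the service to be running locally."""
    with urllib.request.urlopen(build_etl_request(since, until)) as response:
        return response.read().decode("utf-8")
```

Calling `trigger_etl("2025-01-01", "2025-03-20")` sends the same payload as the curl command. The response format is not documented here, so the sketch returns the raw body as text.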