ifsp-projects/data-analytics-pipeline

Analytics Data Pipeline

ETL pipeline that extracts events from PostHog and loads them into BigQuery for consumption in Metabase.

Architecture

PostHog API (Lakehouse) → Extract → Transform → BigQuery (Warehouse) → Metabase / Superset
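The flow above can be sketched in Python as composable stages. This is a minimal, hypothetical illustration, not the repository's implementation. The following are all assumptions:

  • the names `transform` and `run_pipeline`
  • the batch size
  • the output row schema
  • the event field names (`uuid`, `event`, `distinct_id`, `timestamp`, `properties`)

In a real run, `extract` would page through the PostHog API and `load` would write each batch to BigQuery.

```python
from typing import Any, Callable, Iterable

# Hypothetical shapes (assumed, not taken from the repo): a raw PostHog event and a flat BigQuery row.
RawEvent = dict[str, Any]
Row = dict[str, Any]


def transform(event: RawEvent) -> Row:
    """Flatten one raw event into a warehouse row (illustrative schema only)."""
    properties = event.get("properties") or {}
    return {
        "event_id": event["uuid"],
        "event_name": event["event"],
        "distinct_id": event["distinct_id"],
        "timestamp": event["timestamp"],
        # Promote one property to its own column; None when it is absent.
        "current_url": properties.get("$current_url"),
    }


def run_pipeline(
    extract: Callable[[], Iterable[RawEvent]],
    load: Callable[[list[Row]], None],
    batch_size: int = 500,
) -> int:
    """Stream extracted events through transform, load them in batches, return the row count."""
    batch: list[Row] = []
    total = 0
    for event in extract():
        batch.append(transform(event))
        if len(batch) >= batch_size:
            load(batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        load(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    sample = [
        {"uuid": "a1", "event": "$pageview", "distinct_id": "u1",
         "timestamp": "2025-01-01T10:00:00Z",
         "properties": {"$current_url": "https://example.com/"}},
        {"uuid": "b2", "event": "signup", "distinct_id": "u2",
         "timestamp": "2025-01-01T11:00:00Z"},
    ]
    batches: list[list[Row]] = []
    print(run_pipeline(lambda: sample, batches.append, batch_size=1))  # prints 2
```

Passing `extract` and `load` as callables keeps the transform step testable without network access. Batching bounds memory use when a date range covers many events.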

Requirements

  • Python 3.13+
  • Poetry

Setup

  1. Clone the repository
  2. Copy the environment file and fill in your values:
   cp .env.example .env
  3. Add your GCP service account JSON to credentials/gcp-service-account.json
  4. Install dependencies:
   make install

Running

# Development
make run

# Production
make run-prod

ETL

The ETL runs automatically every day at 02:00 UTC. To trigger it manually:

make etl
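This README does not say how the daily schedule is implemented. As a hedged illustration, the cron expression `0 2 * * *` (evaluated in UTC) describes the same daily run. The helper below computes when the next scheduled run falls. `next_run` and `RUN_AT` are hypothetical names, not part of the repository.

```python
from datetime import datetime, time, timedelta, timezone

# Daily run time stated in this README: 02:00 UTC.
RUN_AT = time(hour=2, minute=0, tzinfo=timezone.utc)


def next_run(now: datetime) -> datetime:
    """Return the next scheduled run strictly after `now` (a timezone-aware datetime)."""
    now_utc = now.astimezone(timezone.utc)
    # combine() keeps RUN_AT's tzinfo, so the candidate is an aware UTC datetime.
    candidate = datetime.combine(now_utc.date(), RUN_AT)
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_run(datetime.now(timezone.utc)).isoformat())
```

Converting to UTC first matters for operators in other time zones. For example, 22:30 on 19 March at UTC-3 is already 01:30 UTC on 20 March, so the next run is 02:00 UTC that same day.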

To run it for a specific date range, call the API directly (the server must be running, e.g. via make run):

curl -X POST http://localhost:8000/api/v1/posthog/run-etl \
  -H "Content-Type: application/json" \
  -d '{"since": "2025-01-01", "until": "2025-03-20"}'
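The same manual trigger can be sent from Python with only the standard library. This is a sketch that mirrors the curl call, using the same URL, header and JSON body. The following are assumptions, not documented behaviour:

  • the function names
  • the local rejection of a `since` date that falls after `until`
  • the 30-second timeout
  • returning the raw response body

Whether the range bounds are inclusive is not stated here.

```python
import json
import urllib.request
from datetime import date

# Endpoint from the curl example; adjust host and port for your deployment.
ETL_URL = "http://localhost:8000/api/v1/posthog/run-etl"


def build_etl_request(since: str, until: str, url: str = ETL_URL) -> urllib.request.Request:
    """Build the POST request for a manual ETL run between `since` and `until` (YYYY-MM-DD)."""
    # Assumed client-side check: fail fast on malformed dates or an inverted range.
    if date.fromisoformat(since) > date.fromisoformat(until):
        raise ValueError(f"since ({since}) must not be after until ({until})")
    body = json.dumps({"since": since, "until": until}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_etl(since: str, until: str, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body (its format is not documented here)."""
    with urllib.request.urlopen(build_etl_request(since, until), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `trigger_etl("2025-01-01", "2025-03-20")` sends the same request as the curl example. If a long range makes the endpoint respond slowly, raise `timeout` accordingly.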

Environment Variables

See .env.example for all required variables.
