dbt and PostgreSQL for Analytics

This project is meant for experimenting with dbt and the dbt-postgres adapter for analytics, using the NYC TLC Trip Record dataset as the data source and the Kimball dimensional modeling technique.

NOTE: This is NOT meant for production at scale, but for educational purposes. For production workloads, consider BigQuery, Snowflake, StarRocks, ClickHouse, Databricks, or Amazon Redshift instead.

Getting Started

1. Install dependencies from pyproject.toml and activate the created virtualenv:

uv sync && source .venv/bin/activate

2. (Optional) Install pre-commit:

brew install pre-commit

# From root folder where `.pre-commit-config.yaml` is located, run:
pre-commit install

3. Set up the dbt profiles.yml accordingly (use profiles.tmpl.yml as a template)

3.1. By default, dbt reads profiles from the user's '$HOME/.dbt/' directory:

mkdir -p ~/.dbt/
cat profiles.tmpl.yml >> ~/.dbt/profiles.yml
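The template maps the connection settings to environment variables via dbt's env_var() function. A minimal sketch of what the resulting ~/.dbt/profiles.yml could look like (the profile name dbt_postgres and the thread count are illustrative, check profiles.tmpl.yml and dbt_project.yml for the actual values):

```yaml
dbt_postgres:            # profile name: must match `profile:` in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: postgres
      host: "{{ env_var('DBT_POSTGRES_HOST') }}"
      port: "{{ env_var('DBT_POSTGRES_PORT') | as_number }}"
      dbname: "{{ env_var('DBT_POSTGRES_DATABASE') }}"
      schema: "{{ env_var('DBT_POSTGRES_TARGET_SCHEMA') }}"
      user: "{{ env_var('DBT_POSTGRES_USER') }}"
      password: "{{ env_var('DBT_POSTGRES_PASSWORD') }}"
      threads: 4
```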

3.2. Set the environment variables for dbt-postgres:

export DBT_POSTGRES_HOST=localhost
export DBT_POSTGRES_PORT=5432
export DBT_POSTGRES_DATABASE=nyc_taxi
export DBT_POSTGRES_SOURCE_SCHEMA=public
export DBT_POSTGRES_TARGET_SCHEMA=nyc_tlc_record_data
export DBT_POSTGRES_USER=postgres
export DBT_POSTGRES_PASSWORD=postgres
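Not part of the project, but a quick way to sanity-check the exports: this small Python sketch resolves the same variables (with the defaults above as fallbacks) into the connection string dbt-postgres will effectively use.

```python
import os

# Resolve the dbt-postgres connection settings from the environment,
# falling back to the same defaults as the exports above.
settings = {
    "host": os.getenv("DBT_POSTGRES_HOST", "localhost"),
    "port": int(os.getenv("DBT_POSTGRES_PORT", "5432")),
    "dbname": os.getenv("DBT_POSTGRES_DATABASE", "nyc_taxi"),
    "user": os.getenv("DBT_POSTGRES_USER", "postgres"),
    "password": os.getenv("DBT_POSTGRES_PASSWORD", "postgres"),
}

# Assemble the DSN dbt-postgres will effectively connect with.
dsn = "postgresql://{user}:{password}@{host}:{port}/{dbname}".format(**settings)
print(dsn)
```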

4. Install dbt dependencies and trigger the pipeline

4.1. Run dbt deps to install the dbt packages

dbt deps

4.2. Run dbt seed to load the .csv seed files as tables into the target schema

dbt seed

4.3. Run dbt build to run and test the dbt models

dbt build --target [prod|dev]

# Alternatively, you can run only a subset of the models:

## +models/staging: runs the upstream models that 'models/staging' depends on
## first, and then the selected models themselves
dbt [build|run] --select +models/staging --target [prod|dev]

## models/staging+: runs the selected models first, and then every downstream
## model that depends on them
dbt [build|run] --select models/staging+ --target [prod|dev]

5. Generate the Docs and the Data Lineage graph with:

dbt docs generate
dbt docs serve

Access the generated docs at:

open http://localhost:8080

Containerization

1. Build the Docker Image with:

docker build --no-cache -t dbt-postgres:latest .

2. Start a container with it:

docker run -d --rm \
  -e DBT_POSTGRES_HOST=host.docker.internal \
  -e DBT_POSTGRES_DATABASE=nyc_taxi \
  -e DBT_POSTGRES_SOURCE_SCHEMA=public \
  -e DBT_POSTGRES_TARGET_SCHEMA=nyc_tlc_record_data \
  -e DBT_POSTGRES_USER=postgres \
  -e DBT_POSTGRES_PASSWORD=postgres \
  --name dbt-postgres \
  dbt-postgres

TODOs:

  • PEP-517: Packaging and dependency management with uv
  • Bootstrap dbt with PostgreSQL Adapter (dbt-postgres)
  • Generate and serve docs and Data Lineage Graphs locally
  • Add dbt macro to configure target schemas dynamically
  • Run dbt-core in Docker