This project is meant for experimenting with dbt
and the dbt-postgres
adapter for Analytics,
using NYC TLC Trip Record dataset as the datasource, with Kimball dimensional modeling technique.
NOTE: This is NOT meant for production at scale, but rather for educational purposes. Consider using BigQuery
, Snowflake
, StarRocks
, ClickHouse
, Databricks
or RedShift
instead.
1. Install dependencies from pyproject.toml and activate the created virtualenv:
uv sync && source .venv/bin/activate
2. (Optional) Install pre-commit:
brew install pre-commit
# From root folder where `.pre-commit-config.yaml` is located, run:
pre-commit install
3. Setup dbt profiles.yaml accordingly (use the profiles.tmpl.yaml
as template)
3.1. By default, the profiles_dir is the user '$HOME/.dbt/'
mkdir -p ~/.dbt/
cat profiles.tmpl.yml >> ~/.dbt/profiles.yml
3.2. Set the environment variables for dbt-postgres
:
export DBT_POSTGRES_HOST=localhost
export DBT_POSTGRES_PORT=5432
export DBT_POSTGRES_DATABASE=nyc_taxi
export DBT_POSTGRES_SOURCE_SCHEMA=public
export DBT_POSTGRES_TARGET_SCHEMA=nyc_tlc_record_data
export DBT_POSTGRES_USER=postgres
export DBT_POSTGRES_PASSWORD=postgres
4. Install dbt dependencies and trigger the pipeline
4.1. Run dbt deps
to install dbt plugins
dbt deps
4.2. Run dbt seed
to push/create the tables from the .csv seed files to the target schema
dbt seed
4.3. Run dbt run to trigger the dbt models to run
dbt build --target [prod|dev]
# Alternatively you can run only a subset of the models with:
## +models/staging: Runs the dependencies/preceding models first that lead
## to 'models/staging', and then the target models
dbt [build|run] --select +models/staging --target [prod|dev]
## models/staging+: Runs the target models first, and then all models that depend on it
dbt [build|run] --select models/staging+ --target [prod|dev]
5. Generate the Docs and the Data Lineage graph with:
dbt docs generate
dbt docs serve
Access the generated docs at:
open http://localhost:8080
1. Build the Docker Image with:
docker build -t dbt-postgres:latest . --no-cache
2. Start a container with it:
docker run -d --rm \
-e DBT_POSTGRES_HOST=host.docker.internal \
-e DBT_POSTGRES_DATABASE=nyc_taxi \
-e DBT_POSTGRES_SOURCE_SCHEMA=public \
-e DBT_POSTGRES_TARGET_SCHEMA=nyc_tlc_record_data \
-e DBT_POSTGRES_USER=postgres \
-e DBT_POSTGRES_PASSWORD=postgres \
--name dbt-postgres \
dbt-postgres
- PEP-517: Packaging and dependency management with
uv
- Bootstrap dbt with PostgreSQL Adapter (dbt-postgres)
- Generate and serve docs and Data Lineage Graphs locally
- Add dbt macro to configure target schemas dinamically
- Run
dbt-core
in Docker