DBT Data Warehouse Transformations for Mezo

Set up the dbt project locally

Prerequisites

Clone the Repository

   git clone https://github.com/thesis/mezo-dbt
   cd mezo-dbt

Install Dependencies

  • Install uv
  • Install Python dependencies (with uv):
   uv sync
   source .venv/bin/activate  # activate the venv

Configure dbt profiles.yml locally

  • Create a .dbt folder in your home directory if it doesn’t exist:
   mkdir -p ~/.dbt
   touch ~/.dbt/profiles.yml
   code ~/.dbt/profiles.yml  ## https://code.visualstudio.com/docs/configure/command-line#_launching-from-command-line
   ## Or open it with vim, if you know how to close it:
   vim ~/.dbt/profiles.yml
  • Edit the profiles.yml file with your BigQuery configuration (note that target belongs under the profile name, alongside outputs):
   mezo:
     target: dev
     outputs:
       dev:
         type: bigquery
         method: oauth
         project: <your-gcp-project-id>
         dataset: dbt_yourname
         location: EU
         threads: 4
  • Authenticate with gcloud (creates local credentials JSON automatically):
   gcloud auth application-default login

Test your setup

   dbt debug

Install DBT Dependencies

   dbt deps
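dbt deps installs the packages pinned in packages.yml. Since the Goldsky section below relies on the dbt-external-tables package, the file will contain an entry along these lines (the version range shown here is an assumption; check the actual packages.yml):

```yaml
# packages.yml (illustrative -- the pinned version range is an assumption)
packages:
  - package: dbt-labs/dbt_external_tables
    version: [">=0.9.0", "<1.0.0"]
```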

For other dbt commands check: https://docs.getdbt.com/reference/dbt-commands

This project uses pre-commit

To run checks locally use:

   pre-commit run --all-files --config .pre-commit-config_local.yaml

Use the following hook to run checks before commit:

#!/usr/bin/env bash
# Point INSTALL_PYTHON at this repo's virtualenv interpreter. (The original
# template hardcoded a machine-specific absolute path; git runs hooks from the
# repository root, so a relative path works.)
INSTALL_PYTHON="$PWD/.venv/bin/python3"
ARGS=(hook-impl --config=.pre-commit-config_local.yaml --hook-type=pre-commit)

# end templated

HERE="$(cd "$(dirname "$0")" && pwd)"
ARGS+=(--hook-dir "$HERE" -- "$@")

if [ -x "$INSTALL_PYTHON" ]; then
    exec "$INSTALL_PYTHON" -mpre_commit "${ARGS[@]}"
elif command -v pre-commit > /dev/null; then
    exec pre-commit "${ARGS[@]}"
else
    echo '`pre-commit` not found.  Did you forget to activate your virtualenv?' 1>&2
    exit 1
fi
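A minimal way to wire the hook in is sketched below; it writes a simplified one-liner version of the script above straight into git's standard hook location (run from the repository root):

```shell
# Sketch: install a simplified version of the hook above into .git/hooks.
# Run from the repository root; .git/hooks is git's standard hook directory.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/usr/bin/env bash
exec pre-commit hook-impl --config=.pre-commit-config_local.yaml \
    --hook-type=pre-commit --hook-dir "$(cd "$(dirname "$0")" && pwd)" -- "$@"
EOF
chmod +x .git/hooks/pre-commit
```

In practice, running `pre-commit install --config .pre-commit-config_local.yaml` should generate an equivalent hook for you.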

How to Set Up a Goldsky Table

To set up a new table using Goldsky data in BigQuery:

Contact Goldsky Support: Email Goldsky to request the setup of a new table to be imported into the mezo-prod-dp-dwh-lnd-goldsky-cs-0 Google Cloud Storage (GCS) bucket. As of this writing, the Goldsky documentation is limited, and self-service setup is not available—you must contact support to establish the connection.

Organize Data in GCS

  • For each import, create a separate folder in the GCS bucket.
    • The folder structure should follow this pattern: event_type=<event_type>/event_date=<YYYY-MM-DD>/ (e.g., event_type=donated/event_date=2025-05-22/).
    • This structure enables Hive partitioning of the table. For more details, see the BigLake partitioned data documentation.
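The folder convention above can be sketched as a small helper; the function name is hypothetical, only the event_type=<event_type>/event_date=<YYYY-MM-DD>/ layout comes from the convention itself:

```python
from datetime import date


def partition_prefix(event_type: str, event_date: date) -> str:
    """Build the Hive-style partition prefix for one day of one event type."""
    return f"event_type={event_type}/event_date={event_date.isoformat()}/"


# Example from the text: "donated" events that landed on 2025-05-22.
prefix = partition_prefix("donated", date(2025, 5, 22))
print(prefix)  # event_type=donated/event_date=2025-05-22/
```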

Update dbt Source Configuration

  • Edit the models/00_sources/goldsky.yml file to add the new table definition.
  • Use the existing configurations in the file as a template for your new entry.
  • Ensure all relevant metadata, columns, and partitioning information are included.
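A hedged sketch of what such an entry can look like with dbt-external-tables is shown below; the table name, column list, file format, and exact bucket path are assumptions, so follow the real entries in models/00_sources/goldsky.yml:

```yaml
# Illustrative entry for models/00_sources/goldsky.yml (names and paths assumed)
sources:
  - name: goldsky
    tables:
      - name: donated  # hypothetical table name
        description: "Donated events imported by Goldsky"
        external:
          location: "gs://mezo-prod-dp-dwh-lnd-goldsky-cs-0/event_type=donated/*"
          options:
            format: parquet  # assumed file format
            hive_partition_uri_prefix: "gs://mezo-prod-dp-dwh-lnd-goldsky-cs-0/event_type=donated"
        columns:
          - name: event_date
            data_type: date
          - name: amount
            data_type: numeric
```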

Register the Table in BigQuery

  • The table will be created in BigQuery using the dbt-external-tables package.

  • After updating the YAML file, run the following dbt command to create the external tables:

    dbt run-operation stage_external_sources
  • This command registers the external tables in BigQuery based on your configuration; it also runs automatically during deployment and the CI process.

Update the External Table in BigQuery

If the source file (e.g., Google Sheet) changes structure:

  • Edit the corresponding YAML file in models/00_sources/

  • Adjust schema, columns, or partitioning as needed.

  • Re-stage the external table:

    dbt run-operation stage_external_sources
  • These steps are also run automatically via GitHub Actions, but for local testing, you must run them manually.

📖 Documentation

This project includes autogenerated dbt documentation, hosted with GitHub Pages. 👉 View the dbt docs. The documentation site is automatically updated via GitHub Actions when changes are merged into the repository.
