DBT Data Warehouse Transformations for Mezo

Set up the dbt project locally

Prerequisites

Clone the Repository

   git clone https://github.com/thesis/mezo-dbt
   cd mezo-dbt

Install Dependencies

  • Install uv
  • Install Python dependencies (with uv):
   uv sync
   source .venv/bin/activate  # activate the venv

Configure dbt profiles.yml locally

  • Create a .dbt folder in your home directory if it doesn’t exist:
   mkdir -p ~/.dbt
   touch ~/.dbt/profiles.yml
   code ~/.dbt/profiles.yml  ## https://code.visualstudio.com/docs/configure/command-line#_launching-from-command-line
   ## Or open it with vim, if you know how to close it:
   vim ~/.dbt/profiles.yml
  • Edit the profiles.yml file with your BigQuery configuration (note that target belongs under the profile name, alongside outputs):
   mezo:
     target: dev
     outputs:
       dev:
         type: bigquery
         method: oauth
         project: <your-gcp-project-id>
         dataset: dbt_yourname
         location: EU
         threads: 4
  • Authenticate with gcloud (creates local credentials JSON automatically):
   gcloud auth application-default login

Test your setup

   dbt debug

Install DBT Dependencies

   dbt deps
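dbt deps installs the packages pinned in packages.yml. Since the Goldsky section below relies on the dbt-external-tables package, the file will contain an entry along these lines (the version range shown here is an assumption; check the actual packages.yml):

```yaml
# packages.yml (illustrative -- the pinned version range is an assumption)
packages:
  - package: dbt-labs/dbt_external_tables
    version: [">=0.9.0", "<1.0.0"]
```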

For other dbt commands check: https://docs.getdbt.com/reference/dbt-commands

This project uses pre-commit

To run checks locally use:

   pre-commit run --all-files --config .pre-commit-config_local.yaml

Use the following hook to run checks before commit:

#!/usr/bin/env bash
# Point INSTALL_PYTHON at this repo's virtualenv interpreter. (The original
# template hardcoded a machine-specific absolute path; git runs hooks from the
# repository root, so a relative path works.)
INSTALL_PYTHON="$PWD/.venv/bin/python3"
ARGS=(hook-impl --config=.pre-commit-config_local.yaml --hook-type=pre-commit)

# end templated

HERE="$(cd "$(dirname "$0")" && pwd)"
ARGS+=(--hook-dir "$HERE" -- "$@")

if [ -x "$INSTALL_PYTHON" ]; then
    exec "$INSTALL_PYTHON" -mpre_commit "${ARGS[@]}"
elif command -v pre-commit > /dev/null; then
    exec pre-commit "${ARGS[@]}"
else
    echo '`pre-commit` not found.  Did you forget to activate your virtualenv?' 1>&2
    exit 1
fi
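A minimal way to wire the hook in is sketched below; it writes a simplified one-liner version of the script above straight into git's standard hook location (run from the repository root):

```shell
# Sketch: install a simplified version of the hook above into .git/hooks.
# Run from the repository root; .git/hooks is git's standard hook directory.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/usr/bin/env bash
exec pre-commit hook-impl --config=.pre-commit-config_local.yaml \
    --hook-type=pre-commit --hook-dir "$(cd "$(dirname "$0")" && pwd)" -- "$@"
EOF
chmod +x .git/hooks/pre-commit
```

In practice, running `pre-commit install --config .pre-commit-config_local.yaml` should generate an equivalent hook for you.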

How to Set Up a Goldsky Table

To set up a new table using Goldsky data in BigQuery:

Contact Goldsky Support: Email Goldsky to request the setup of a new table to be imported into the mezo-prod-dp-dwh-lnd-goldsky-cs-0 Google Cloud Storage (GCS) bucket. As of this writing, the Goldsky documentation is limited, and self-service setup is not available—you must contact support to establish the connection.

Organize Data in GCS

  • For each import, create a separate folder in the GCS bucket.
    • The folder structure should follow this pattern: event_type=<event_type>/event_date=<YYYY-MM-DD>/ (e.g., event_type=donated/event_date=2025-05-22/).
    • This structure enables Hive partitioning of the table. For more details, see the BigLake partitioned data documentation.
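The folder convention above can be sketched as a small helper; the function name is hypothetical, only the event_type=<event_type>/event_date=<YYYY-MM-DD>/ layout comes from the convention itself:

```python
from datetime import date


def partition_prefix(event_type: str, event_date: date) -> str:
    """Build the Hive-style partition prefix for one day of one event type."""
    return f"event_type={event_type}/event_date={event_date.isoformat()}/"


# Example from the text: "donated" events that landed on 2025-05-22.
prefix = partition_prefix("donated", date(2025, 5, 22))
print(prefix)  # event_type=donated/event_date=2025-05-22/
```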

Update dbt Source Configuration

  • Edit the models/00_sources/goldsky.yml file to add the new table definition.
  • Use the existing configurations in the file as a template for your new entry.
  • Ensure all relevant metadata, columns, and partitioning information are included.
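A hedged sketch of what such an entry can look like with dbt-external-tables is shown below; the table name, column list, file format, and exact bucket path are assumptions, so follow the real entries in models/00_sources/goldsky.yml:

```yaml
# Illustrative entry for models/00_sources/goldsky.yml (names and paths assumed)
sources:
  - name: goldsky
    tables:
      - name: donated  # hypothetical table name
        description: "Donated events imported by Goldsky"
        external:
          location: "gs://mezo-prod-dp-dwh-lnd-goldsky-cs-0/event_type=donated/*"
          options:
            format: parquet  # assumed file format
            hive_partition_uri_prefix: "gs://mezo-prod-dp-dwh-lnd-goldsky-cs-0/event_type=donated"
        columns:
          - name: event_date
            data_type: date
          - name: amount
            data_type: numeric
```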

Register the Table in BigQuery

  • The table will be created in BigQuery using the dbt-external-tables package.

  • After updating the YAML file, run the following dbt command to create the external tables:

    dbt run-operation stage_external_sources
  • This command registers the external tables in BigQuery based on your configuration; it also runs automatically during deployment and the CI process.

Update the External Table in BigQuery

If the source file (e.g., Google Sheet) changes structure:

  • Edit the corresponding YAML file in models/00_sources/

  • Adjust schema, columns, or partitioning as needed.

  • Re-stage the external table:

    dbt run-operation stage_external_sources
  • These steps are also run automatically via GitHub Actions, but for local testing, you must run them manually.

📖 Documentation

This project includes autogenerated dbt documentation, hosted with GitHub Pages. 👉 View the dbt docs. The documentation site is automatically updated via GitHub Actions when changes are merged into the repository.
