- Install gcloud
- VSCode or any other code editor
- Optional: the dbt Power User extension for VS Code
git clone https://github.com/thesis/mezo-dbt
cd mezo-dbt
- Install uv
- Install Python dependencies (with uv):
uv sync
source .venv/bin/activate  # activate the virtual environment
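To confirm the environment is ready (assuming dbt and the BigQuery adapter are declared in the project's dependencies), check the installed version:
dbt --version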
Configure dbt profiles.yml locally
- Create a .dbt folder in your home directory if it doesn’t exist:
mkdir -p ~/.dbt
touch ~/.dbt/profiles.yml
code ~/.dbt/profiles.yml  # see https://code.visualstudio.com/docs/configure/command-line#_launching-from-command-line
# Or open it with vim, if you know how to close it:
vim ~/.dbt/profiles.yml
- Add your BigQuery configuration to profiles.yml:
mezo:
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: <your-gcp-project-id>
      dataset: dbt_yourname
      location: EU
      threads: 4
  target: dev
- Authenticate with gcloud (creates local credentials JSON automatically):
gcloud auth application-default login
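Optionally, point gcloud at the same project you used in profiles.yml (the project id below is a placeholder):
gcloud config set project <your-gcp-project-id>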
dbt debug  # verify your profile and BigQuery connection
dbt deps   # install dbt package dependencies
For other dbt commands, see https://docs.getdbt.com/reference/dbt-commands
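A few dbt commands you will likely use day to day (the model name below is a placeholder, not a real model in this repo):
dbt run --select my_model   # build a single model
dbt test                    # run schema and data tests
dbt build                   # run and test models in dependency order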
This project uses pre-commit.
To run checks locally use:
pre-commit run --all-files --config .pre-commit-config_local.yaml
To run checks automatically before each commit, save the following hook as .git/hooks/pre-commit and make it executable:
#!/usr/bin/env bash
# Adjust INSTALL_PYTHON to point at this repository's virtualenv
INSTALL_PYTHON=/path/to/mezo-dbt/.venv/bin/python3
ARGS=(hook-impl --config=.pre-commit-config_local.yaml --hook-type=pre-commit)
# end templated
HERE="$(cd "$(dirname "$0")" && pwd)"
ARGS+=(--hook-dir "$HERE" -- "$@")
if [ -x "$INSTALL_PYTHON" ]; then
exec "$INSTALL_PYTHON" -mpre_commit "${ARGS[@]}"
elif command -v pre-commit > /dev/null; then
exec pre-commit "${ARGS[@]}"
else
echo '`pre-commit` not found. Did you forget to activate your virtualenv?' 1>&2
exit 1
fi
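Alternatively, pre-commit can generate an equivalent hook for you by passing the local config at install time (check pre-commit install --help for your version):
pre-commit install --config .pre-commit-config_local.yaml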
To set up a new table using Goldsky data in BigQuery:
- Contact Goldsky support: email Goldsky to request that the new table be exported into the mezo-prod-dp-dwh-lnd-goldsky-cs-0 Google Cloud Storage (GCS) bucket. As of this writing, the Goldsky documentation is limited and self-service setup is not available; you must contact support to establish the connection.
- For each import, create a separate folder in the GCS bucket.
- The folder structure should follow this pattern:
  event_type=<event_type>/event_date=<YYYY-MM-DD>/
  (e.g., event_type=donated/event_date=2025-05-22/)
- This structure enables Hive partitioning of the table. For more details, see the BigLake partitioned data documentation.
- Edit the models/00_sources/goldsky.yml file to add the new table definition.
- Use the existing configurations in the file as a template for your new entry (see the illustrative sketch after this list).
- Ensure all relevant metadata, columns, and partitioning information are included.
- The table will be created in BigQuery using the dbt-external-tables package.
- After updating the YAML file, run the following dbt command to create the external tables:
dbt run-operation stage_external_sources
- This command registers the external tables in BigQuery based on your configuration. It also runs automatically during deployment and the CI process.
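For orientation only, a new entry in goldsky.yml for a Hive-partitioned GCS path looks roughly like the sketch below; the table name, columns, and file format are made up, so copy the real structure from the existing entries in the file:
sources:
  - name: goldsky
    tables:
      - name: donated_events            # hypothetical table name
        external:
          location: "gs://mezo-prod-dp-dwh-lnd-goldsky-cs-0/event_type=donated/*"
          options:
            format: parquet             # assumed; match the files Goldsky actually delivers
            hive_partition_uri_prefix: "gs://mezo-prod-dp-dwh-lnd-goldsky-cs-0/event_type=donated"
        columns:
          - name: event_date
            data_type: date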
If the source file (e.g., a Google Sheet) changes structure:
- Edit the corresponding YAML file in models/00_sources/
- Adjust the schema, columns, or partitioning as needed.
- Re-stage the external table:
dbt run-operation stage_external_sources
- These steps also run automatically via GitHub Actions, but for local testing you must run them manually.
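If columns were removed or renamed, the existing external table may need to be dropped and rebuilt; the dbt-external-tables package exposes a full-refresh variable for this (verify against the package docs for the version pinned in packages.yml):
dbt run-operation stage_external_sources --vars "ext_full_refresh: true"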
This project includes autogenerated dbt documentation, hosted with GitHub Pages. 👉 View the dbt docs. The documentation site is updated automatically via GitHub Actions when changes are merged into the repository.
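To build and browse the documentation locally:
dbt docs generate   # compile the project and write the docs artifacts
dbt docs serve      # serve the generated site locally (default http://localhost:8080)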