upstream-prod is a dbt package for easily using production data in a development environment. It's like an alternative to the defer flag - only without the need to find and download a production manifest.
It is inspired by (but unrelated to) similar work by Monzo.
When developing locally, I would often get errors because my dev database had outdated or missing data. Rebuilding the models was often time-consuming and it wasn't easy to tell how many layers deep I needed to go.
upstream-prod fixes this by overriding the {{ ref() }} macro, redirecting refs in selected models to their equivalent table in production (unless the parent model was also selected).
For example, imagine a simple DAG like:
model_1 -> model_2 -> model_3
With a profiles.yml similar to:
jaffle_shop:
target: dev
outputs:
dev:
...
prod:
...If I run dbt build -s model_2+, the following happens:
model_2is built indevwith data from theprodversion ofmodel_1.model_3is also built in mydevbut it refers to thedevversion ofmodel_2.- Tests are run against the
devversions ofmodel_2andmodel_3.
In short, your selected models are available in your dev environment with all that lovely prod quality!
If desired, the package can return a dev model when the prod version can't be found. This is useful when adding several new models at once.
Let's assume only model_1 is in prod and I want to create model_2 and model_3. First I create model_2 in dev with dbt run -s model_2, then start working on model_3. By default, dbt run -s model_3 would fail because model_2 doesn't exist in prod. With fallback mode enabled, the package would use the dev model and the command would be successful.
After installing the package you'll need to add some variables to dbt_project.yml.
Set either of upstream_prod_schema or upstream_prod_database to tell the package where to find production data. Open profiles.yml and set the relevant variables for your setup:
- If dev and prod are in different schemas in the same database, set
upstream_prod_schemato theschemavalue of your prod target. - If dev and prod are in different logical databases, set
upstream_prod_databaseto thedatabasevalue of your prod target.
upstream_prod_enabled: Disables the package when False. Defaults to True.upstream_prod_disabled_targets: List of targets where the package should be disabled.upstream_prod_fallback: Whether to fall back to the default target when a model can't be found in prod. Defaults to False.
Example
I use Snowflake and each developer has a separate database with identically-named schemas. Here's how I have configured my project:
# dbt_project.yml
vars:
upstream_prod_database: <prod_db>
upstream_prod_fallack: True
upstream_prod_disabled_targets:
- ci
- prodNext, you need to tell dbt to use this package's version of {{ ref() }} instead of the builtin macro. The recommended approach is to create a thin wrapper around upstream-prod. In your macros directory, create a file called ref.sql with the following contents:
{% macro ref(
parent_model,
current_model=this.name,
prod_database=var("upstream_prod_database", None),
prod_schema=var("upstream_prod_schema", None),
enabled=var("upstream_prod_enabled", True),
fallback=var("upstream_prod_fallback", False)
) %}
{% do return(upstream_prod.ref(parent_model, current_model, prod_database, prod_schema, enabled, fallback)) %}
{% endmacro %}Alternatively, you can find any instances of {{ ref() }} in your project and replace them with {{ upstream_prod.ref() }}
upstream-prod has been designed and tested on Snowflake only. It should work with other officially supported adapters but I can't be sure. If it doesn't, PRs are welcome!