Commit 720fc9c

Merge pull request #157 from DataRecce/recceintroblog
Added new blog about data change management
2 parents 760811d + 08841cf commit 720fc9c

9 files changed, +147 -0 lines changed

@@ -0,0 +1,147 @@
1+
---
2+
title: Recce - Your data change management toolkit
3+
date: 2025-02-17
4+
slug: recce-data-change-management-toolkit
5+
description: >
6+
With Recce you’re able to validate your data modeling changes against a known-good baseline, comparing datasets before and after your modifications, in a risk-free environment. And there’s a diff for every occasion.
7+
categories:
8+
- Data Change Management
9+
- Impact assessment
10+
- Data Validation
11+
tags:
12+
- data validation
13+
- dbt
14+
- best practices
15+
- code review
16+
- impact assessment
17+
---
18+
19+
# Recce: Your data change management toolkit
20+
21+
Whether you’re the author of a pull request or the one reviewing it, you’ve got a tough job: figuring out what changed, verifying that the PR does what it’s supposed to, and making sure nothing breaks in production. In large or business-critical dbt projects, this can be a slow, frustrating process. That’s why we built Recce - an open-source toolkit that’s here to make your data modeling validation and pull request (PR) reviews a breeze.
22+
23+
24+
<figure markdown="span">
25+
![Build the ultimate PR comment to validate your dbt data modeling changes](../assets/images/data-change-management-toolkit-2025-02-17/self-serve-review.png)
26+
<figcaption>Build the ultimate PR comment to validate your dbt data modeling changes</figcaption>
27+
</figure>
28+
29+
30+
31+
## What is Recce?
32+
33+
Recce (pronounced “reh-kee”, short for “reconnaissance”) is a suite of change management tools designed to help you compare dbt environments, assess data impacts, and streamline your PR reviews. Recce gives you visibility into the effects of your data modeling changes *before* they hit production. With Recce, you can **take two dbt environments, such as dev and prod, and compare them using the suite of diff tools.**
34+
35+
<!-- more -->
36+
37+
## Your diffing toolkit
38+
39+
With Recce you’re able to validate your data modeling changes against a known-good baseline, comparing datasets before and after your modifications, in a risk-free environment. And there’s a diff for every occasion.
40+
41+
### Lineage DAG diff
42+
43+
Start from the zone of impact of your changes and see which models have been modified, added, or removed. Unlike the dbt docs lineage DAG, which only shows the current state of the DAG, Recce shows how the DAG differs before and after your changes.
44+
45+
46+
47+
<figure markdown="span">
48+
![Lineage Diff in Recce](../assets/images/data-change-management-toolkit-2025-02-17/recce-lineage-diff.gif)
49+
<figcaption>See modified, added, and removed dbt models with breaking change analysis</figcaption>
50+
</figure>
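
To make the idea of a lineage diff concrete, here is a minimal Python sketch. It is not how Recce implements it: the `lineage_diff` function and the name-to-checksum dictionaries are hypothetical stand-ins for whatever metadata each dbt environment exposes. It only shows the classification into added, removed, and modified models.

```python
def lineage_diff(base: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify models by comparing name -> checksum maps from two environments."""
    shared = base.keys() & current.keys()
    return {
        # Present only in the current (dev) environment
        "added": sorted(current.keys() - base.keys()),
        # Present only in the base (prod) environment
        "removed": sorted(base.keys() - current.keys()),
        # Present in both, but the checksum of the model code differs
        "modified": sorted(name for name in shared if base[name] != current[name]),
    }


# Hypothetical model checksums for illustration only
base = {"stg_orders": "a1", "customers": "b2", "legacy_report": "c3"}
current = {"stg_orders": "a1", "customers": "b9", "customer_segments": "d4"}

result = lineage_diff(base, current)
print(result)
```

Unchanged models such as `stg_orders` drop out of the result, which is what lets you start from the zone of impact instead of the whole DAG.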
51+
52+
53+
### Data profile diff and value diff
54+
55+
Perform holistic checks by diffing the data profile stats between your development branch and the baseline, then check the percentage of matching values for each column in a model.
56+
57+
<figure markdown="span">
58+
![Perform holistic checks by diffing data profile stats](../assets/images/data-change-management-toolkit-2025-02-17/profile-diff.png)
59+
<figcaption>Perform holistic checks by diffing data profile stats</figcaption>
60+
</figure>
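
The "percentage of values matching for each column" part can be sketched in a few lines of Python. This is an illustration under assumptions, not Recce's implementation: `value_match_rates` is a hypothetical helper, rows are plain dictionaries, both environments are assumed to share the same columns, and `customer_id` is assumed to be the primary key.

```python
def value_match_rates(base_rows, current_rows, key):
    """Return {column: percent of shared rows whose values match}."""
    base_by_key = {row[key]: row for row in base_rows}
    current_by_key = {row[key]: row for row in current_rows}
    # Only rows present in both environments can be compared value by value
    shared_keys = base_by_key.keys() & current_by_key.keys()
    if not shared_keys:
        return {}
    columns = [col for col in base_rows[0] if col != key]
    rates = {}
    for col in columns:
        matches = sum(
            1 for k in shared_keys if base_by_key[k][col] == current_by_key[k][col]
        )
        rates[col] = 100.0 * matches / len(shared_keys)
    return rates


# Hypothetical sample data for illustration only
prod_rows = [
    {"customer_id": 1, "first_name": "Ana", "clv": 99.0},
    {"customer_id": 2, "first_name": "Ben", "clv": 50.0},
    {"customer_id": 3, "first_name": "Cy", "clv": 20.0},
    {"customer_id": 4, "first_name": "Di", "clv": 10.0},
]
dev_rows = [
    {"customer_id": 1, "first_name": "Ana", "clv": 99.0},
    {"customer_id": 2, "first_name": "Ben", "clv": 35.0},
    {"customer_id": 3, "first_name": "Cy", "clv": 20.0},
    {"customer_id": 4, "first_name": "Di", "clv": 0.0},
]

rates = value_match_rates(prod_rows, dev_rows, key="customer_id")
print(rates)
```

A column you did not intend to touch should stay at 100%; anything lower is a prompt to dig deeper.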
61+
62+
63+
64+
### Query diff
65+
66+
If something needs further investigation, drill down and query the data. One query will run on both environments, and you’ll be able to see the difference on a row-by-row basis. Enable change-only view to see just what’s changed.
67+
68+
<figure markdown="span">
69+
![Drill down and query the data](../assets/images/data-change-management-toolkit-2025-02-17/query-diff.png)
70+
<figcaption>Drill down and query the data</figcaption>
71+
</figure>
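
As a rough sketch of the idea (one query, two environments, row-by-row comparison), the Python below uses two in-memory SQLite databases as stand-ins for prod and dev. The `make_env` and `query_diff` helpers, the table layout, and the `changed_only` flag are assumptions made for illustration; they are not Recce's API.

```python
import sqlite3


def make_env(rows):
    """Create an in-memory database standing in for one dbt environment."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, clv REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


def query_diff(base_conn, current_conn, sql, changed_only=False):
    """Run the same SQL on both environments and pair rows by their first column."""
    base = {row[0]: row for row in base_conn.execute(sql)}
    current = {row[0]: row for row in current_conn.execute(sql)}
    diff = []
    for key in sorted(base.keys() | current.keys()):
        before, after = base.get(key), current.get(key)
        if changed_only and before == after:
            continue  # hide rows that are identical in both environments
        diff.append((key, before, after))
    return diff


# Hypothetical sample data for illustration only
prod = make_env([(1, 99.0), (2, 50.0), (3, 20.0)])
dev = make_env([(1, 99.0), (2, 35.0), (4, 10.0)])

sql = "SELECT customer_id, clv FROM customers ORDER BY customer_id"
changes = query_diff(prod, dev, sql, changed_only=True)
for row in changes:
    print(row)
```

A `None` on either side of a pair means the row exists in only one environment, which is the kind of detail a summary statistic can hide.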
72+
73+
74+
### Schema and row count
75+
76+
In addition to the above diffs, you can also check the schema and row count to be sure you didn’t lose any data or an important column.
77+
78+
<figure markdown="span">
79+
![Ensure data integrity with schema and row count checks](../assets/images/data-change-management-toolkit-2025-02-17/schema-row.png)
80+
<figcaption>Ensure data integrity with schema and row count checks</figcaption>
81+
</figure>
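
A schema and row count check boils down to comparing column lists and counts between two environments. The sketch below does that with in-memory SQLite databases as stand-ins for prod and dev; the `describe` helper and the sample tables are hypothetical and only illustrate the comparison, not how Recce performs it.

```python
import sqlite3


def describe(conn, table):
    """Return ({column: declared type}, row count) for a table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    columns = {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return columns, count


# Hypothetical base environment: three columns, two rows
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE customers (customer_id INTEGER, first_name TEXT, clv REAL)")
prod.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)", [(1, "Ana", 99.0), (2, "Ben", 50.0)]
)

# Hypothetical development environment: a column and a row went missing
dev = sqlite3.connect(":memory:")
dev.execute("CREATE TABLE customers (customer_id INTEGER, clv REAL)")
dev.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 99.0)])

base_cols, base_count = describe(prod, "customers")
cur_cols, cur_count = describe(dev, "customers")

removed_columns = sorted(base_cols.keys() - cur_cols.keys())
added_columns = sorted(cur_cols.keys() - base_cols.keys())
row_delta = cur_count - base_count

print("removed:", removed_columns, "added:", added_columns, "row delta:", row_delta)
```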
82+
83+
84+
## You’ve been hard at work, time to show it
85+
86+
As you create validations in Recce, you can add them to your curated checklist with notes about what you found, and re-run checks if the data changes.
87+
88+
89+
<figure markdown="span">
90+
![Curate a data validation checklist](../assets/images/data-change-management-toolkit-2025-02-17/checklist.png)
91+
<figcaption>Curate a data validation checklist</figcaption>
92+
</figure>
93+
94+
Once you’ve validated your changes, it’s time to share your work. Recce lets you export your checks directly into your [PR comment template](https://medium.com/inthepipeline/use-this-updated-pull-request-comment-template-for-your-dbt-data-projects-de06f12fc38d), so you can provide clear, proof-of-correctness evidence. You can copy key notes, grab a screenshot of the validation results, and include only the relevant details, keeping your PR comment **all-signal, no noise**.
95+
96+
For reviewers, this means they can quickly see the queries and results of your data spot-checks, making it easy to assess the impact of your changes. With all the context at hand, they can either ask for further investigation or confidently approve the PR.
97+
98+
99+
<figure markdown="span">
100+
![Level up your PR comments](../assets/images/data-change-management-toolkit-2025-02-17/recce-pr-comment-example.png)
101+
<figcaption>Level up your PR comments</figcaption>
102+
</figure>
103+
104+
## Getting started with Recce
105+
106+
Ready to revolutionize your data review process? Recce is open-source and easy to integrate into your workflow.
107+
108+
Recce OSS is available on GitHub. Follow the instructions in our Getting Started guide to start using Recce to validate your data modeling changes.
109+
110+
- **GitHub**: [DataRecce/Recce](https://github.com/datarecce/recce)
111+
- **Docs**: [DataRecce.io/docs](https://datarecce.io/docs)
112+
- **Discord**: [Recce Community](https://discord.gg/bP2Yfk9KEA)
113+
114+
### Try Recce Online
115+
116+
If you want to try Recce out without having to install it, check out our demo instance.
117+
118+
### Demo
119+
120+
The [demo PR](https://github.com/DataRecce/jaffle_shop_duckdb/pull/1) makes a simple change to dbt’s Jaffle Shop project: it changes how `customer_lifetime_value` (CLV) is calculated so that only *completed* orders are evaluated.
121+
122+
<figure markdown="span">
123+
![Code diff in Recce](../assets/images/data-change-management-toolkit-2025-02-17/code-diff.png)
124+
<figcaption>Code Diff in Recce</figcaption>
125+
</figure>
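
To get a feel for why restricting the calculation to completed orders matters, here is a small Python sketch using an in-memory SQLite table. The `orders` table, its columns, and both queries are simplified assumptions for illustration; the actual SQL lives in the demo PR and is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, status TEXT, amount REAL)"
)
# Hypothetical sample orders for illustration only
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, "completed", 30.0),
        (2, 1, "returned", 20.0),
        (3, 2, "completed", 15.0),
    ],
)

# Before: every order counts toward lifetime value
BEFORE = """
    SELECT customer_id, SUM(amount)
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
"""

# After: only completed orders count
AFTER = """
    SELECT customer_id, SUM(amount)
    FROM orders
    WHERE status = 'completed'
    GROUP BY customer_id
    ORDER BY customer_id
"""

clv_before = dict(conn.execute(BEFORE).fetchall())
clv_after = dict(conn.execute(AFTER).fetchall())
print(clv_before, clv_after)
```

In this toy data, the customer with a returned order sees a lower CLV while the other customer is unaffected.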
126+
127+
128+
129+
The expectation from this change is that CLV will be reduced overall, and that this will also impact the downstream customer segments model. With that in mind, see if you can determine whether the PR has any issues by checking the data in Recce:
130+
131+
- **The PR:** [https://github.com/DataRecce/jaffle_shop_duckdb/pull/1](https://github.com/DataRecce/jaffle_shop_duckdb/pull/1)
132+
- **Recce Demo instance**: [https://pr1.cloud.datarecce.io/](https://pr1.cloud.datarecce.io/)
133+
134+
*Hint: Run a Profile Diff, then a Query Diff, on the customers model. Then check for downstream impact.*
135+
136+
For even more details on using Recce to perform data impact assessment, check out our [hands-on guide](https://medium.com/inthepipeline/hands-on-data-impact-analysis-with-recce-80ea4156c6ec).
137+
138+
### Start shipping data models with confidence with Recce
139+
140+
Data modeling changes shouldn’t feel like a gamble. Whether you’re the one writing the PR or the one reviewing it, you need confidence that what’s changing is actually what was intended, without breaking production. Recce gives you the tools to compare environments, validate your data, and surface meaningful insights, all while keeping PR comments focused and actionable.
141+
142+
If you’re tired of [slow QA cycles](https://medium.com/inthepipeline/dbt-best-practices-are-in-but-merge-times-are-up-49f72a792680), silent data errors, and [bloated CI pipelines](https://medium.com/inthepipeline/so-you-think-youve-got-dbt-test-bloat-37491fb330d5), it’s time to give Recce a shot.
143+
144+
145+
146+
147+
