-
Notifications
You must be signed in to change notification settings - Fork 302
Feat: Check freshness for mixed external & sqlmesh models #5499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
# filters: | ||
# branches: | ||
# only: | ||
# - main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Reinstate before merging
upstream_parent_snapshots = {p for p in parent_snapshots if not p.is_external} | ||
external_parents = snapshot.node.depends_on - {p.name for p in upstream_parent_snapshots} | ||
|
||
if context.parent_intervals: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should exclude external models from this check. There could be intervals for external models to, for example, run audits, but it doesn't mean that the table is actually fresh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I guess I can exclude them at the time of parent_intervals
construction, since that's a list so we've lost the snapshot mapping of each Interval
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented in 9a8af2e, external snapshots are excluded from parent_intervals
.
sqlmesh/core/scheduler.py
Outdated
if parent.snapshot_id not in snapshot_intervals: | ||
continue | ||
_, p_intervals = snapshot_intervals[parent.snapshot_id] | ||
parent_intervals.append(p_intervals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right place to construct this. The missing intervals will change as we apply signal checks to snapshots. If you set them once here, the parent signals will not be reflected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call.
I guess our best choice would be to check snapshot_batches
then (same loop) which applies the signal to filter out the unready intervals.
So, for each snapshot in the DAG, its ExecutionContext::parent_intervals
list will be extended by the snapshot_batches[parent]
, provided the parent
snapshot is NOT external.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this sounds correct, assuming I understood the idea correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented in 9a8af2e
46715f5
to
9a8af2e
Compare
Follow up PR on model freshness (Previous PR)
This PR enables mixed SQLMesh & external (source) model freshness, e.g:
Model
C
will be evaluated if its upstream models are fresh, which is determined by:C
'slast_altered_ts
compared to external model's (A
)INFORMATION_SCHEMA
metadataB
), signalling if any model will be evaluated as part of this runA new freshness-specific test file is added to break up the original test in various scenarios.