-
Notifications
You must be signed in to change notification settings - Fork 303
Feat: Check freshness for mixed external & sqlmesh models #5499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
# filters: | ||
# branches: | ||
# only: | ||
# - main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Reinstate before merging
sqlmesh/core/scheduler.py
Outdated
if parent.snapshot_id not in snapshot_intervals: | ||
continue | ||
_, p_intervals = snapshot_intervals[parent.snapshot_id] | ||
parent_intervals.append(p_intervals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right place to construct this. The missing intervals will change as we apply signal checks to snapshots. If you set them once here, the parent signals will not be reflected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call.
I guess our best choice would be to check snapshot_batches
then (same loop) which applies the signal to filter out the unready intervals.
So, for each snapshot in the DAG, its ExecutionContext::parent_intervals
list will be extended by the snapshot_batches[parent]
, provided the parent
snapshot is NOT external.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this sounds correct, assuming I understood the idea correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented in 9a8af2e
46715f5
to
9a8af2e
Compare
|
||
parent_intervals: Intervals = [] | ||
for parent_id in snapshot.parents: | ||
parent_snapshot, _ = snapshot_intervals.get(parent_id, (None, [])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that snapshot_intervals
reflects updated intervals after signals were applied to parents. If I'm reading the code correctly, snapshot_intervals
remain unchanged and only snapshot_batches
reflect signal outcomes.
# The mismatch can happen if e.g an external model is not registered in the project | ||
|
||
upstream_parent_snapshots = {p for p in parent_snapshots if not p.is_external} | ||
external_parents = snapshot.node.depends_on - {p.name for p in upstream_parent_snapshots} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry man, I'm still confused. external_parents
contains external tables that were not registered as external models. Both upstream_parent_snapshots
and context.parent_intervals
exclude registered external models. So at which point do we check the last modified timestamp for tables that are registered external models?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe external_models
should include both:
snapshot.node.depends_on - {p.name for p in upstream_parent_snapshots}
AND
{p.name for p in parent_snapshots if p.is_external}
am I missing something?
Follow up PR on model freshness (Previous PR)
This PR enables mixed SQLMesh & external (source) model freshness, e.g:
Model
C
will be evaluated if its upstream models are fresh, which is determined by:C
'slast_altered_ts
compared to external model's (A
)INFORMATION_SCHEMA
metadataB
), signalling if any model will be evaluated as part of this runA new freshness-specific test file is added to break up the original test in various scenarios.