feat(swordfish): plan caching with ActivePlansRegistry #6278
Open
colin-ho wants to merge 3 commits into colin/pipeline-message
Add a fingerprint() method that computes a structural hash of the plan tree for plan caching. Two plans with identical structure (same operators, expressions, schemas) produce the same fingerprint, enabling pipeline reuse across multiple executions of the same logical plan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
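To illustrate what a structural fingerprint means here, the following is a minimal sketch and not the code from this PR. The `Plan` enum, its variants, and the `sample_plan` helper are hypothetical stand-ins for the real plan tree; only the idea (hash the operator kind, its parameters, and its children recursively) comes from the commit message.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical, simplified plan tree. The real plan type has many more operators.
#[derive(Debug)]
enum Plan {
    Scan { schema: Vec<String> },
    Filter { predicate: String, input: Box<Plan> },
    Limit { n: u64, input: Box<Plan> },
}

impl Plan {
    // Feed the operator kind, its parameters, and its children into the hasher.
    fn hash_structure<H: Hasher>(&self, hasher: &mut H) {
        // The discriminant separates operators that happen to carry similar fields.
        std::mem::discriminant(self).hash(hasher);
        match self {
            Plan::Scan { schema } => schema.hash(hasher),
            Plan::Filter { predicate, input } => {
                predicate.hash(hasher);
                input.hash_structure(hasher);
            }
            Plan::Limit { n, input } => {
                n.hash(hasher);
                input.hash_structure(hasher);
            }
        }
    }

    // Structural hash of the whole tree (assumed here to be a u64).
    fn fingerprint(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.hash_structure(&mut hasher);
        hasher.finish()
    }
}

// Hypothetical helper: Limit(10) over Filter(predicate) over Scan(x, y).
fn sample_plan(predicate: &str) -> Plan {
    Plan::Limit {
        n: 10,
        input: Box::new(Plan::Filter {
            predicate: predicate.to_string(),
            input: Box::new(Plan::Scan {
                schema: vec!["x".to_string(), "y".to_string()],
            }),
        }),
    }
}
```

Two calls to `sample_plan` with the same predicate build separate trees that hash identically, while changing the predicate changes the fingerprint. That is the property the cache relies on.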
Wire plan caching by fingerprint in NativeExecutor, enabling pipeline reuse across multiple executions of the same logical plan. Multiple input_ids route through a single shared pipeline via MessageRouter.

Key changes:
- ActivePlansRegistry caches plan pipelines by fingerprint+query_id
- PlanState tracks active input_ids and manages pipeline lifecycle
- MessageRouter routes PipelineMessage outputs to per-input-id channels
- try_finish() API for callers to signal input completion
- Python integration: native_executor.py and flotilla.py call try_finish()
- Fingerprint tests for plan reuse and isolation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
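The registry lifecycle described in this commit can be sketched roughly as follows. The type names echo the commit message, but every field, signature, and return value below is an assumption made for illustration; the real `ActivePlansRegistry` also owns the pipeline handles and statistics, which are omitted here.

```rust
use std::collections::{HashMap, HashSet};

// Assumed key and id types, chosen only for this sketch.
type Fingerprint = u64;
type QueryId = String;
type InputId = u32;

// Tracks which inputs are still flowing through one cached pipeline.
#[derive(Default)]
struct PlanState {
    active_input_ids: HashSet<InputId>,
}

// Caches plan state by (fingerprint, query_id).
#[derive(Default)]
struct ActivePlansRegistry {
    plans: HashMap<(Fingerprint, QueryId), PlanState>,
}

impl ActivePlansRegistry {
    // Registers an input on a plan. Returns true if no cached entry existed,
    // i.e. the caller would have to build a new pipeline.
    fn get_or_create(&mut self, fp: Fingerprint, query_id: &str, input_id: InputId) -> bool {
        let key = (fp, query_id.to_string());
        let created = !self.plans.contains_key(&key);
        self.plans
            .entry(key)
            .or_default()
            .active_input_ids
            .insert(input_id);
        created
    }

    // Marks an input as finished. Returns true if it was the last active
    // input, in which case the plan is removed from the registry.
    fn try_finish(&mut self, fp: Fingerprint, query_id: &str, input_id: InputId) -> bool {
        let key = (fp, query_id.to_string());
        let Some(state) = self.plans.get_mut(&key) else {
            return false;
        };
        state.active_input_ids.remove(&input_id);
        if state.active_input_ids.is_empty() {
            self.plans.remove(&key);
            true
        } else {
            false
        }
    }
}
```

In this sketch, a second `get_or_create` with the same fingerprint and query id reuses the entry, and only the `try_finish` call for the last remaining input id tears it down.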
Greptile Summary
This PR implements plan caching in the native executor, allowing multiple executions of the same logical plan to share a single pipeline via fingerprint-based caching. The implementation includes:
Issue found: the `expr` field of `UDFProject` is destructured but never hashed in the fingerprint.
Confidence Score: 4/5
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Python: executor.run] -->|1. compute fingerprint| B[NativeExecutor::run]
B -->|2. check registry| C{Plan exists<br/>for fingerprint?}
C -->|No| D[Create new pipeline]
D -->|spawn task| E[Pipeline execution task]
C -->|Yes| F[Reuse existing pipeline]
F -->|get sender| G[EnqueueInputMessage]
D -->|get sender| G
G -->|3. send inputs| E
E -->|4. route outputs| H[MessageRouter]
H -->|by input_id| I[Per-input channel]
I -->|5. stream results| J[Python iterator]
J -->|6. try_finish| K{Last input_id<br/>for plan?}
K -->|Yes| L[Remove from registry<br/>await final stats]
K -->|No| M[Keep plan active<br/>return snapshot]
L -->|stats| N[Return ExecutionEngineFinalResult]
M -->|stats| N
style E fill:#e1f5ff
style H fill:#e1f5ff
style C fill:#fff4e1
style K fill:#fff4e1
Last reviewed commit: fb180dc
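Steps 4 and 5 of the flowchart (route outputs by input_id into per-input channels) could look roughly like the sketch below. It uses synchronous `std::sync::mpsc` channels purely for illustration; the `PipelineMessage` fields, the `register`/`route`/`finish` methods, and the channel type are all assumptions and not the API added in this PR.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

type InputId = u32;

// Hypothetical stand-in for PipelineMessage: an output tagged with its input id.
#[derive(Debug, PartialEq)]
struct PipelineMessage {
    input_id: InputId,
    payload: String,
}

// Fans the single shared pipeline output out to one channel per input id.
#[derive(Default)]
struct MessageRouter {
    senders: HashMap<InputId, Sender<PipelineMessage>>,
}

impl MessageRouter {
    // Creates the per-input channel and hands the receiving end to the caller.
    fn register(&mut self, input_id: InputId) -> Receiver<PipelineMessage> {
        let (tx, rx) = channel();
        self.senders.insert(input_id, tx);
        rx
    }

    // Delivers a message to the channel for its input id.
    // Returns false if that input is unknown or its receiver is gone.
    fn route(&self, msg: PipelineMessage) -> bool {
        match self.senders.get(&msg.input_id) {
            Some(tx) => tx.send(msg).is_ok(),
            None => false,
        }
    }

    // Drops the sender, which closes the channel for that input.
    fn finish(&mut self, input_id: InputId) {
        self.senders.remove(&input_id);
    }
}
```

Each caller only ever sees messages tagged with its own input id, even though all of them come out of one pipeline.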
Comment on lines +170 to +192

    Self::UDFProject(UDFProject {
        expr,
        udf_properties,
        passthrough_columns,
        schema,
        ..
    }) => {
        // Hash UDF properties (excluding any RuntimePyObject)
        udf_properties.name.hash(hasher);
        udf_properties.resource_request.hash(hasher);
        udf_properties.batch_size.hash(hasher);
        udf_properties.concurrency.hash(hasher);
        udf_properties.use_process.hash(hasher);
        udf_properties.max_retries.hash(hasher);
        udf_properties.builtin_name.hash(hasher);
        udf_properties.is_async.hash(hasher);
        udf_properties.is_scalar.hash(hasher);
        udf_properties.on_error.hash(hasher);
        for expr in passthrough_columns {
            expr.hash(hasher);
        }
        schema.hash(hasher);
    }
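Missing hash for the `expr` field in `UDFProject` fingerprinting: `expr` is destructured but never hashed. Only the UDF properties, passthrough columns, and schema are hashed, so two plans that differ only in the UDF expression would produce the same fingerprint.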
Suggested change
    Self::UDFProject(UDFProject {
        expr,
        udf_properties,
        passthrough_columns,
        schema,
        ..
    }) => {
        // Hash the expression
        expr.hash(hasher);
        // Hash UDF properties (excluding any RuntimePyObject)
        udf_properties.name.hash(hasher);
        udf_properties.resource_request.hash(hasher);
        udf_properties.batch_size.hash(hasher);
        udf_properties.concurrency.hash(hasher);
        udf_properties.use_process.hash(hasher);
        udf_properties.max_retries.hash(hasher);
        udf_properties.builtin_name.hash(hasher);
        udf_properties.is_async.hash(hasher);
        udf_properties.is_scalar.hash(hasher);
        udf_properties.on_error.hash(hasher);
        for expr in passthrough_columns {
            expr.hash(hasher);
        }
        schema.hash(hasher);
    }
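Why the omission matters for caching can be shown with a small, self-contained sketch. `UdfProject`, its fields, the `include_expr` flag, and the `node` helper are hypothetical simplifications, not the real `UDFProject`; the point is only that skipping a field makes nodes that differ in that field collide.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical, trimmed-down stand-in for UDFProject.
struct UdfProject {
    expr: String,
    udf_name: String,
    passthrough_columns: Vec<String>,
}

// `include_expr = false` mimics the reviewed code, `true` mimics the suggestion.
fn fingerprint(node: &UdfProject, include_expr: bool) -> u64 {
    let mut hasher = DefaultHasher::new();
    if include_expr {
        node.expr.hash(&mut hasher);
    }
    node.udf_name.hash(&mut hasher);
    for col in &node.passthrough_columns {
        col.hash(&mut hasher);
    }
    hasher.finish()
}

// Hypothetical helper: same UDF name and passthrough columns, different expression.
fn node(expr: &str) -> UdfProject {
    UdfProject {
        expr: expr.to_string(),
        udf_name: "my_udf".to_string(),
        passthrough_columns: vec!["id".to_string()],
    }
}
```

With `include_expr = false`, `node("my_udf(x)")` and `node("my_udf(y)")` hash to the same value, so under the caching scheme described in this PR a second query could be routed through a pipeline built for a different expression. Hashing `expr` separates them.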
Summary
This is the final PR in the series that wires everything together. Multiple executions of the same logical plan now share a single pipeline via fingerprint-based caching.
Depends on: #6276 (Plan Fingerprinting) + #6277 (PipelineMessage)
Test plan
🤖 Generated with Claude Code