EmptyScan was a special-case node for producing empty partitions. Replace all usages with InMemoryScan (with size_bytes=0), which has the same semantics and simplifies the plan node hierarchy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
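As a rough illustration of the replacement described above, the sketch below models the plan nodes as a small enum. It is not the project's actual type: `PlanNode`, `num_partitions`, `scan_tasks`, `empty_in_memory`, and `has_no_input` are hypothetical names chosen for this example; only `size_bytes`, `InMemoryScan`, and `PhysicalScan` come from the PR text.

```rust
// Hypothetical, simplified stand-ins for the plan nodes discussed in this PR.
// Every name except `InMemoryScan`, `PhysicalScan`, and `size_bytes` is an
// assumption made for illustration.
#[derive(Debug, Clone, PartialEq)]
enum PlanNode {
    // Scan over partitions already held in memory.
    InMemoryScan { num_partitions: usize, size_bytes: usize },
    // Scan driven by a list of scan tasks (represented here as file paths).
    PhysicalScan { scan_tasks: Vec<String> },
}

impl PlanNode {
    // What used to need a dedicated EmptyScan variant becomes an ordinary
    // InMemoryScan that carries no data.
    fn empty_in_memory() -> Self {
        PlanNode::InMemoryScan { num_partitions: 0, size_bytes: 0 }
    }

    // True when the node has no input and so can only yield an empty result.
    fn has_no_input(&self) -> bool {
        match self {
            PlanNode::InMemoryScan { num_partitions, .. } => *num_partitions == 0,
            PlanNode::PhysicalScan { scan_tasks } => scan_tasks.is_empty(),
        }
    }
}
```

With this shape, "empty" is a property of an existing variant rather than a variant of its own, so any code that matches on the enum has one fewer arm to handle.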
Greptile Summary
This PR removes the
Both replacement approaches are correct:
Confidence Score: 5/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Empty Partition Scenarios] --> B[limit.rs: All rows skipped]
A --> C[scan_source.rs: No scan tasks]
B -->|Before| D[EmptyScan plan node]
B -->|After| E[InMemoryScan with size_bytes=0]
C -->|Before| F[EmptyScan plan node]
C -->|After| G[PhysicalScan with empty scan task list]
D --> H[EmptyScanSource]
E --> I[InMemorySource]
G --> J[ScanTaskSource]
H -->|Produces| K[Empty MicroPartition]
I -->|Produces| K
J -->|Produces| K
style D fill:#ffcccc
style F fill:#ffcccc
style H fill:#ffcccc
style E fill:#ccffcc
style G fill:#ccffcc
Last reviewed commit: f9af1f4
Additional Comments (2)
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6275      +/-   ##
==========================================
+ Coverage   73.44%   73.45%   +0.01%
==========================================
  Files        1003     1002       -1
  Lines      133436   133118     -318
==========================================
- Hits        98005    97788     -217
+ Misses      35431    35330     -101
@greptileai limit and scan source now use |
Exactly! The refactoring elegantly delegates empty partition handling to the existing pipeline sources.
How it works:
What happens downstream: Looking at
    if partitions.is_empty() {
        let empty = Arc::new(MicroPartition::empty(Some(schema)));
        let _ = sender.send(empty).await;
    }
So instead of needing a dedicated
The distributed layer just needs to pass empty vectors, and the local execution layer handles producing the appropriate empty results.
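The fallback quoted above can be sketched in a self-contained form. This is only an approximation under stated assumptions: `MicroPartition` here is a toy struct, the channel is the synchronous `std::sync::mpsc` one rather than the async sender in the snippet, and `run_source` and `collect_output` are hypothetical helper names.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;

// Hypothetical stand-in for the real MicroPartition: a schema plus rows.
#[derive(Debug, PartialEq)]
struct MicroPartition {
    schema: Vec<String>,
    rows: Vec<Vec<i64>>,
}

impl MicroPartition {
    fn empty(schema: Vec<String>) -> Self {
        MicroPartition { schema, rows: Vec::new() }
    }
}

// Forward every input partition; if there are none, send a single empty
// partition so downstream operators still receive the schema.
fn run_source(
    partitions: Vec<Arc<MicroPartition>>,
    schema: Vec<String>,
    sender: &Sender<Arc<MicroPartition>>,
) {
    if partitions.is_empty() {
        let empty = Arc::new(MicroPartition::empty(schema));
        // The receiver may already be gone; the send error is ignored,
        // mirroring the quoted snippet.
        let _ = sender.send(empty);
        return;
    }
    for partition in partitions {
        let _ = sender.send(partition);
    }
}

// Hypothetical convenience helper: run the source and collect its output.
fn collect_output(
    partitions: Vec<Arc<MicroPartition>>,
    schema: Vec<String>,
) -> Vec<Arc<MicroPartition>> {
    let (tx, rx) = channel();
    run_source(partitions, schema, &tx);
    drop(tx); // close the channel so the iterator below terminates
    rx.iter().collect()
}
```

Calling `collect_output(vec![], schema)` yields exactly one partition with zero rows, which is the behavior a dedicated empty-scan source would otherwise have to provide.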
@greptileai re-review |
srilman left a comment
The PR looks good, but I'm a bit confused about the purpose. Why would this help with plan reuse?
e.g. if there are existing tasks post-limit, we can reuse them.
on second thought, u might be right, if there were existing tasks post-limit, why would we need to emit this task to produce an empty mp anyway?
Ah no ok i remember the REAL reason for this now. Back in the streaming sources PR we made ScanTaskSource pull from a receiver<(scantask, inputid)> (likewise for inmemory and glob), but not empty scan. I didn't think it was necessary to introduce the receiver and inputid plumbing into empty scan, so i opted to just remove it.
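To make the receiver plumbing mentioned above concrete, here is a minimal sketch of one long-lived source draining `(scan task, input id)` pairs. It is an assumption-laden toy: `ScanTask` is just a string, `InputId` is a `u32`, the channel is `std::sync::mpsc` rather than whatever the project uses, and `drain_source` and `run_inputs` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{channel, Receiver};

// Hypothetical stand-ins: a scan task is just a file path here.
type ScanTask = String;
type InputId = u32;

// A long-lived source: it drains the receiver until every sender is dropped,
// grouping the scan tasks it sees by the input that submitted them.
fn drain_source(rx: Receiver<(ScanTask, InputId)>) -> BTreeMap<InputId, Vec<ScanTask>> {
    let mut by_input: BTreeMap<InputId, Vec<ScanTask>> = BTreeMap::new();
    for (task, input_id) in rx {
        by_input.entry(input_id).or_default().push(task);
    }
    by_input
}

// Feed tasks from several inputs through one channel and one source.
fn run_inputs(inputs: Vec<(InputId, Vec<ScanTask>)>) -> BTreeMap<InputId, Vec<ScanTask>> {
    let (tx, rx) = channel();
    for (input_id, tasks) in inputs {
        for task in tasks {
            tx.send((task, input_id)).expect("receiver is still alive");
        }
    }
    drop(tx); // closing the channel ends the source loop
    drain_source(rx)
}
```

In this toy version an input with no tasks sends nothing and never appears in the result; the thread does not say how the real sources signal that case, so the sketch leaves it out rather than guess.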
Summary
Remove the EmptyScan plan node variant in favor of InMemoryScan or PhysicalScan with 0 inputs. This lets us reuse a streaming swordfish task.