feat: Implement shuffle map side spill support (not flight shuffle)#6133
feat: Implement shuffle map side spill support (not flight shuffle)#6133caican00 wants to merge 1 commit intoEventual-Inc:mainfrom
Conversation
Greptile OverviewGreptile SummaryThis PR implements map-side spill support for shuffle operations in the Daft execution engine. The key change is enabling partitions to contain multiple Ray ObjectRefs instead of a single ref, allowing the shuffle operator to incrementally emit chunks when memory thresholds are exceeded. Major Changes:
Implementation Quality:
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Client
participant RepartitionNode
participant State as RepartitionState
participant TaskSet as OrderingAwareJoinSet
participant Ray as Ray ObjectStore
Client->>RepartitionNode: Input MicroPartitions
RepartitionNode->>State: new(num_partitions)
loop For each input batch
RepartitionNode->>TaskSet: spawn partition task
TaskSet->>TaskSet: partition_by_hash/random/range
TaskSet-->>RepartitionNode: partitioned Vec<MicroPartition>
RepartitionNode->>State: push(partitioned)
State->>State: current_size_bytes += part.size_bytes()
alt current_size_bytes >= shuffle_spill_threshold
State->>State: flush_state()
State->>State: MicroPartition::concat per partition
State-->>RepartitionNode: Vec<Arc<MicroPartition>>
RepartitionNode->>Ray: send outputs (as list[ObjectRef])
State->>State: clear() & reset size_bytes
end
end
RepartitionNode->>State: Final flush if not empty
State->>State: MicroPartition::concat per partition
State-->>RepartitionNode: Vec<Arc<MicroPartition>>
RepartitionNode->>Ray: send final outputs (as list[ObjectRef])
Ray-->>Client: RayMaterializedResult(list[ObjectRef])
Last reviewed commit: 5f6e8f8 |
1bc2920 to
949b78d
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #6133 +/- ##
==========================================
- Coverage 73.32% 73.13% -0.20%
==========================================
Files 994 994
Lines 129287 129827 +540
==========================================
+ Hits 94799 94947 +148
- Misses 34488 34880 +392
🚀 New features to boost your workflow:
|
22af7f9 to
0fca246
Compare
|
Hi @srilman could you help review this pr?Thank you |
1fa45d2 to
8a2f978
Compare
9c6f278 to
bcb2cda
Compare
bcb2cda to
5f6e8f8
Compare
|
@greptile |
42e4afe to
496eae1
Compare
496eae1 to
fbda57b
Compare
|
hey @caican00! this looks like a really good change that I do want to get in soon, but can I ask you to put a pause on this for the time being? I ask because we're restructuring a lot of shuffle infra for Flight shuffle support, which will make it easier to implement PRs like this in the future. Once that's ready, can you start looking into this again? |
Changes Made
Related Issues