Skip to content

execution: fix split scheduler preservation for late subscribers (P2300)#7165

Open
arpittkhandelwal wants to merge 6 commits intoTheHPXProject:masterfrom
arpittkhandelwal:fix/split-scheduler-preservation
Open

execution: fix split scheduler preservation for late subscribers (P2300)#7165
arpittkhandelwal wants to merge 6 commits intoTheHPXProject:masterfrom
arpittkhandelwal:fix/split-scheduler-preservation

Conversation

@arpittkhandelwal
Copy link
Copy Markdown
Contributor

Problem Statement

This PR addresses a P2300 correctness violation in the split algorithm. Previously, the split sender failed to preserve the associated scheduler when a receiver connected after the predecessor had already completed (the "late subscriber" scenario). In these cases, the completion signal was fired inline, bypassing the execution context guaranteed by the sender's completion scheduler.

Proposed Changes

  • Virtualized Completion Hooks: Introduced a virtual void schedule_completion(continuation_type&&) method to the shared_state base class. This allows subclasses to reroute completion signals through the appropriate execution context.

  • Scheduler-Aware shared_state: Implemented shared_state_scheduler, a new subclass that captures the attached scheduler. Overrode schedule_completion to dispatch stored continuations via hpx::execution::experimental::schedule(sched).

  • Safe Asynchronous Management: Implemented a self-owning schedule_op_holder to manage the lifetime of the schedule() operation state.

  • Memory Safety: Adopted the standard HPX allocator pattern, rebinding the shared_state allocator to handle internal task metadata without raw new/delete.

  • Race Prevention: Added an intrusive_ptr owner guard before calling start() to prevent use-after-free if a scheduler executes synchronously.

  • CPO & Dispatch Refactoring: Updated split_t overloads to support generic schedulers, enabling both automatic scheduler discovery and explicit injection.

  • Cleaned up constructor SFINAE in split_sender to handle no_scheduler, run_loop_scheduler, and generic Scheduler types without ambiguity.

Verification Results
New Test Suite: Added algorithm_split_scheduler.cpp which specifically targets the late-subscriber race condition.
Regression Testing: Verified that the legacy no_scheduler and run_loop paths remain unaffected.
Performance: Used HPX_NO_UNIQUE_ADDRESS and intrusive pointers to keep metadata overhead at an absolute minimum.

@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented Apr 11, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

When split(scheduler, sender) is used and the predecessor has already
completed (predecessor_done == true) by the time add_continuation() is
called by a new subscriber, the downstream completion signal was fired
inline on whatever thread called add_continuation(). This violated the
P2300 get_completion_scheduler<CPO> contract, which requires that
completions on the set_value and set_stopped signals are dispatched by
the scheduler passed to split.

The code itself acknowledged this with a TODO comment:
  // TODO: Should this preserve the scheduler? It does not
  // if we call set_* inline.

Fix:

* Add a virtual schedule_completion(continuation_type&&) method to
  shared_state with a default implementation that fires inline (preserving
  existing behaviour for the scheduler-free case).

* Replace the two "fire inline" paths inside add_continuation (the
  predecessor_done fast-path and the lock-then-done path) with calls to
  schedule_completion, so all completion dispatch goes through a single
  overridable hook.

* Add shared_state_scheduler<Sched> — a new subclass of shared_state —
  that overrides schedule_completion to post the continuation through
  schedule(sched). The operation state is kept alive via a self-owning
  intrusive_ptr-based holder (mirroring the pattern in start_detached.hpp),
  so the async lifetime is correct regardless of how quickly the thread
  pool processes the work item.

* Add a second constructor overload to split_sender for generic
  (non-run_loop) schedulers that allocates shared_state_scheduler instead
  of plain shared_state.

* Add algorithm_split_scheduler unit test that covers:
  - Basic split with no scheduler (regression guard)
  - split with thread_pool_scheduler: late subscriber receives value on
    the pool, not inline
  - Multiple concurrent late subscribers all receive the value
  - ensure_started (eager submission) is unaffected

No behavioural change for the scheduler-free split or the run_loop split;
only the generic-scheduler path gains the new subclass.

Signed-off-by: arpittkhandelwal <arpitkhandelwal810@gmail.com>
@arpittkhandelwal arpittkhandelwal force-pushed the fix/split-scheduler-preservation branch from 5d7786a to 4afbe1b Compare April 11, 2026 12:28
@StellarBot
Copy link
Copy Markdown
Collaborator

Performance test report

HPX Performance

Comparison

BENCHMARKFORK_JOIN_EXECUTORPARALLEL_EXECUTORSCHEDULER_EXECUTOR
For Each(=)(=)(=)

Info

PropertyBeforeAfter
HPX Datetime2026-03-09T14:08:29+00:002026-04-11T12:55:48+00:00
HPX Commit0eeca869deb045
Envfile
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Datetime2026-03-09T09:15:24.034803-05:002026-04-11T08:05:19.438544-05:00
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Clusternamerostamrostam

Comparison

BENCHMARKNO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch++

Info

PropertyBeforeAfter
HPX Datetime2026-03-09T14:08:29+00:002026-04-11T12:55:48+00:00
HPX Commit0eeca869deb045
Envfile
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Datetime2026-03-09T09:17:15.638328-05:002026-04-11T08:07:02.860839-05:00
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Clusternamerostamrostam

Comparison

BENCHMARKFORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATORPARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATORSCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add(=)---
Stream Benchmark - Scale(=)----
Stream Benchmark - Triad(=)---
Stream Benchmark - Copy=+++++

Info

PropertyBeforeAfter
HPX Datetime2026-03-09T18:50:37+00:002026-04-11T12:55:48+00:00
HPX Commitba89f5d9deb045
Envfile
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Datetime2026-03-09T17:49:10.837937-05:002026-04-11T08:07:23.108751-05:00
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Clusternamerostamrostam

Explanation of Symbols

SymbolMEANING
=No performance change (confidence interval within ±1%)
(=)Probably no performance change (confidence interval within ±2%)
(+)/(-)Very small performance improvement/degradation (≤1%)
+/-Small performance improvement/degradation (≤5%)
++/--Large performance improvement/degradation (≤10%)
+++/---Very large performance improvement/degradation (>10%)
?Probably no change, but quite large uncertainty (confidence interval with ±5%)
??Unclear result, very large uncertainty (±10%)
???Something unexpected…

@hkaiser hkaiser added category: senders/receivers Implementations of the p0443r14 / p2300 + p1897 proposals type: enhancement type: compatibility issue labels Apr 11, 2026
@hkaiser
Copy link
Copy Markdown
Contributor

hkaiser commented Apr 11, 2026

@arpittkhandelwal What's the difference to #6911?

@arpittkhandelwal
Copy link
Copy Markdown
Contributor Author

@arpittkhandelwal What's the difference to #6911?

I think the main difference between this PR and #6911 is all about where the 'source of truth' for the scheduler lives. PR #6911 takes a receiver-centric approach—it essentially looks downstream at the moment of connection to see if the receiver has a preferred scheduler. This is a nice safety net, but it doesn't quite address the core problem: the split(scheduler, sender) call itself is currently ignoring the scheduler we specifically gave it whenever a subscriber arrives late.
In this PR, I’ve gone with a sender-centric approach to make sure we actually follow the algorithm's contract. By virtualizing the completion dispatch through a schedule_completion hook in the shared state, we get a few big wins:

Better Compliance: The split_sender now strictly follows the P2300 contract by completing on the provided scheduler, even if the receiver doesn't have an environment or a specific scheduler of its own.
Proper Discovery: It allows the sender to correctly report its scheduler through get_completion_scheduler.
Efficiency: Instead of using start_detached for every late subscriber (which adds extra allocations), we use a specialized shared_state_scheduler. This fits much better into the existing HPX shared-state patterns and keeps things efficient.
Basically, while #6911 is a good generic fix, this implementation feels like the architectural step needed to make HPX's split truly P2300 compliant.

@hkaiser
Copy link
Copy Markdown
Contributor

hkaiser commented Apr 11, 2026

@isidorostsa could you have a look, please?

Copilot AI review requested due to automatic review settings April 29, 2026 19:24
@arpittkhandelwal
Copy link
Copy Markdown
Contributor Author

@isidorostsa ping

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a P2300 correctness issue in execution::split where late subscribers (connecting after the predecessor has completed) could receive completion inline instead of via the sender’s completion scheduler.

Changes:

  • Introduces shared_state::schedule_completion and routes late-subscriber completions through it.
  • Adds shared_state_scheduler<Sched> to dispatch late completions via schedule(sched) for generic schedulers.
  • Adds a new unit test target algorithm_split_scheduler.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
libs/core/execution/include/hpx/execution/algorithms/split.hpp Adds scheduler-aware completion routing for late subscribers and new generic-scheduler split support.
libs/core/execution/tests/unit/algorithm_split_scheduler.cpp Adds regression coverage intended to exercise the late-subscriber scheduler-preservation behavior.
libs/core/execution/tests/unit/CMakeLists.txt Registers the new unit test.

Comment thread libs/core/execution/include/hpx/execution/algorithms/split.hpp Outdated
Comment thread libs/core/execution/tests/unit/algorithm_split_scheduler.cpp
Comment thread libs/core/execution/tests/unit/algorithm_split_scheduler.cpp Outdated
Copilot AI review requested due to automatic review settings April 30, 2026 01:11
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment on lines +551 to +555
, op_state(hpx::execution::experimental::connect(
hpx::execution::experimental::schedule(s),
schedule_receiver{
hpx::intrusive_ptr<schedule_op_holder>(this)}))
{
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schedule_op_holder constructs schedule_receiver with intrusive_ptr<schedule_op_holder>(this) while ref_count starts at 0. If schedule(s) or connect(...) throws during op_state construction, the temporary intrusive_ptr can decrement the count back to 0 and call intrusive_ptr_release on a partially-constructed object, which is undefined behaviour. Consider establishing an owning ref before creating the receiver (e.g., set ref_count to 1 and pass intrusive_ptr(this, /add_ref=/false), then create the external owner with add_ref=false), or restructure so op_state is created after the owner intrusive_ptr exists.

Copilot uses AI. Check for mistakes.
Comment on lines +597 to +616
continuation_type&& continuation) override
{
using holder_alloc_type =
typename schedule_op_holder::holder_alloc_type;
using holder_alloc_traits =
std::allocator_traits<holder_alloc_type>;
using holder_unique_ptr =
std::unique_ptr<schedule_op_holder,
util::allocator_deleter<holder_alloc_type>>;

holder_alloc_type holder_alloc(this->alloc);
holder_unique_ptr p(
holder_alloc_traits::allocate(holder_alloc, 1),
hpx::util::allocator_deleter<holder_alloc_type>{
holder_alloc});
holder_alloc_traits::construct(holder_alloc, p.get(),
HPX_MOVE(continuation), sched, holder_alloc);

hpx::intrusive_ptr<schedule_op_holder> owner(p.release());
hpx::execution::experimental::start(owner->op_state);
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shared_state_scheduler::schedule_completion is invoked from split_sender::operation_state::start (which is noexcept). However this implementation performs allocations and calls schedule/connect, any of which may throw; an exception escaping here will trigger std::terminate and can make this path much less robust than the previous inline completion. Consider making schedule_completion noexcept and handling failures explicitly (e.g., catch and either invoke the continuation inline as a fallback, or terminate with a clear message).

Suggested change
continuation_type&& continuation) override
{
using holder_alloc_type =
typename schedule_op_holder::holder_alloc_type;
using holder_alloc_traits =
std::allocator_traits<holder_alloc_type>;
using holder_unique_ptr =
std::unique_ptr<schedule_op_holder,
util::allocator_deleter<holder_alloc_type>>;
holder_alloc_type holder_alloc(this->alloc);
holder_unique_ptr p(
holder_alloc_traits::allocate(holder_alloc, 1),
hpx::util::allocator_deleter<holder_alloc_type>{
holder_alloc});
holder_alloc_traits::construct(holder_alloc, p.get(),
HPX_MOVE(continuation), sched, holder_alloc);
hpx::intrusive_ptr<schedule_op_holder> owner(p.release());
hpx::execution::experimental::start(owner->op_state);
continuation_type&& continuation) noexcept override
{
try
{
using holder_alloc_type =
typename schedule_op_holder::holder_alloc_type;
using holder_alloc_traits =
std::allocator_traits<holder_alloc_type>;
using holder_unique_ptr =
std::unique_ptr<schedule_op_holder,
util::allocator_deleter<holder_alloc_type>>;
holder_alloc_type holder_alloc(this->alloc);
holder_unique_ptr p(
holder_alloc_traits::allocate(holder_alloc, 1),
hpx::util::allocator_deleter<holder_alloc_type>{
holder_alloc});
holder_alloc_traits::construct(holder_alloc, p.get(),
HPX_MOVE(continuation), sched, holder_alloc);
hpx::intrusive_ptr<schedule_op_holder> owner(
p.release());
hpx::execution::experimental::start(owner->op_state);
}
catch (...)
{
try
{
HPX_MOVE(continuation)();
}
catch (...)
{
std::terminate();
}
}

Copilot uses AI. Check for mistakes.
@isidorostsa
Copy link
Copy Markdown
Contributor

Hey @arpittkhandelwal thanks for the PR!
As you may have seen, we are in the process of retiring our internal stdexec implementation (#7123).
Is there a reason to single out split and keep our own implementation active?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category: senders/receivers Implementations of the p0443r14 / p2300 + p1897 proposals type: compatibility issue type: enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants