fix(id-allocation): Schedule a flush when submitting ID Allocation op before replaying pending states #24683

markfields · 2025-05-21T23:11:34Z

Description

In ContainerRuntime, we submit to the Outbox directly from replayPendingStates, so we should also call scheduleFlush to ensure that the op doesn't get stuck in the Outbox. Being "stuck in the Outbox" is a problem because it's assumed that all ops in a batch are submitted during the same JS turn and have the same reference sequence number, and a stuck op can violate that invariant.

That one-line change is accompanied by some refactoring so we don't have to consider Immediate mode at that callsite (scheduleFlush handles it now).

In-Depth

In the typical case, calling PendingStateManager.replayPendingStates triggers at least one other op to be submitted, which schedules the flush. But here's a counterexample:

Create a new DataStore and use a compressed ID in it. Reconnect before attaching it or making any other changes.

This will result in the replay flow, and we will submit an ID Allocation op, but there's nothing else to resubmit. This op will be "stuck" in the Outbox.
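To make the stuck-op scenario concrete, here is a minimal TypeScript sketch. It is a toy model and not the actual ContainerRuntime or Outbox code: ToyOutbox, ToyRuntime, the defer callback (standing in for the end of the JS turn), and the callScheduleFlush parameter are all hypothetical names invented for illustration.

```typescript
// Toy model only: every name here is hypothetical, not the real Fluid API.
type Op = { type: string; refSeq: number };

class ToyOutbox {
  private pending: Op[] = [];
  public readonly sentBatches: Op[][] = [];

  submit(op: Op): void {
    this.pending.push(op);
  }

  flush(): void {
    if (this.pending.length === 0) return;
    this.sentBatches.push(this.pending);
    this.pending = [];
  }

  get isEmpty(): boolean {
    return this.pending.length === 0;
  }
}

class ToyRuntime {
  public readonly outbox = new ToyOutbox();
  private flushPending = false;
  // Stand-in for "run at the end of the current JS turn".
  private readonly defer: (task: () => void) => void;

  constructor(defer: (task: () => void) => void) {
    this.defer = defer;
  }

  scheduleFlush(): void {
    if (this.flushPending) return; // a flush is already queued
    this.flushPending = true;
    this.defer(() => {
      this.flushPending = false;
      this.outbox.flush();
    });
  }

  // Replay where the ID Allocation op is the only thing to submit.
  replayPendingStates(refSeq: number, callScheduleFlush: boolean): void {
    this.outbox.submit({ type: "idAllocation", refSeq });
    if (callScheduleFlush) {
      this.scheduleFlush();
    }
    // Nothing else is resubmitted, so nothing else schedules a flush.
  }
}

// Manual task queue so the example runs synchronously.
const tasks: Array<() => void> = [];
const runTurnEnd = (): void => {
  while (tasks.length > 0) tasks.shift()!();
};

const withoutFix = new ToyRuntime((task) => tasks.push(task));
withoutFix.replayPendingStates(10, false);
runTurnEnd();
console.log(withoutFix.outbox.isEmpty); // false: the op is stuck

const withFix = new ToyRuntime((task) => tasks.push(task));
withFix.replayPendingStates(10, true);
runTurnEnd();
console.log(withFix.outbox.isEmpty); // true: the op was flushed
```

In this sketch the only difference between the two runs is whether the replay path calls scheduleFlush after submitting.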

In the current code, this is kind of ok, because ID Allocation ops don't make the container dirty, and it's ok to drop that op. BUT, as mentioned above, it can easily violate the invariant that all ops in a batch have the same reference sequence number, because that "old" op will be included in the next batch, which could be some time later after new ops have been processed (which advances the refSeq). See the unit test added here.
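The sequence of events that breaks the invariant can be sketched as follows. This is a simplified illustration under stated assumptions, not the real Outbox check: flushBatch, SeqOp, the starting refSeq of 10, and the error text are all made up here, and the real coherency check may run at a different point than this toy does.

```typescript
// Toy model: names and the error text are hypothetical, not the real Outbox code.
type SeqOp = { contents: string; refSeq: number };

// Rejects a batch whose ops were created against different reference sequence numbers.
function flushBatch(batch: SeqOp[]): SeqOp[] {
  const [first, ...rest] = batch;
  if (first === undefined) {
    return batch;
  }
  for (const op of rest) {
    if (op.refSeq !== first.refSeq) {
      throw new Error(`Batch mixes refSeq ${first.refSeq} and ${op.refSeq}`);
    }
  }
  return batch;
}

let currentRefSeq = 10;
const stuckOutbox: SeqOp[] = [];

// 1. Reconnect: replay submits an ID Allocation op, but no flush is scheduled.
stuckOutbox.push({ contents: "idAllocation", refSeq: currentRefSeq });

// 2. An op from another client is processed, which advances the refSeq.
currentRefSeq += 1;

// 3. A later local change lands in the same batch as the stale op.
stuckOutbox.push({ contents: "localEdit", refSeq: currentRefSeq });

let flushError: string | undefined;
try {
  flushBatch(stuckOutbox);
} catch (error) {
  flushError = (error as Error).message;
}
console.log(flushError); // Batch mixes refSeq 10 and 11
```

Had the first op been flushed in its own turn, each batch would contain ops with a single refSeq and the check would pass.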

Copilot

Pull Request Overview

This PR ensures that a flush is scheduled after submitting an ID Allocation operation, which prevents ops from stalling in the Outbox. It also refactors the flush scheduling logic to use a unified flushPending flag and a doPendingFlush helper.

Call scheduleFlush immediately after submitting an ID Allocation op before replaying pending states.
Replace flushTaskExists with flushPending and introduce a doPendingFlush method.
Update scheduleFlush to use a switch on flushMode, removing the old currentlyBatching helper and assertions.
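The shape of the refactoring summarized above might look roughly like the sketch below. It is not the code from containerRuntime.ts: only the names scheduleFlush, flushPending, doPendingFlush, and flushMode come from this PR, while ToyFlushScheduler, ToyFlushMode, its two mode values, flushCount, and the defer callback are assumptions made for illustration.

```typescript
// Hypothetical sketch; the real ContainerRuntime differs in detail.
type ToyFlushMode = "Immediate" | "TurnBased";

class ToyFlushScheduler {
  public flushCount = 0;
  private flushPending = false;
  private readonly flushMode: ToyFlushMode;
  // Stand-in for deferring work to the end of the JS turn.
  private readonly defer: (task: () => void) => void;

  constructor(flushMode: ToyFlushMode, defer: (task: () => void) => void) {
    this.flushMode = flushMode;
    this.defer = defer;
  }

  scheduleFlush(): void {
    if (this.flushPending) {
      return; // a flush is already scheduled
    }
    this.flushPending = true;
    switch (this.flushMode) {
      case "Immediate": {
        // Callers no longer special-case Immediate mode themselves.
        this.doPendingFlush();
        break;
      }
      case "TurnBased": {
        this.defer(() => this.doPendingFlush());
        break;
      }
    }
  }

  private doPendingFlush(): void {
    this.flushPending = false;
    this.flushCount += 1; // a real runtime would flush its Outbox here
  }
}

const deferred: Array<() => void> = [];
const runDeferred = (): void => {
  while (deferred.length > 0) deferred.shift()!();
};

const immediate = new ToyFlushScheduler("Immediate", (task) => deferred.push(task));
immediate.scheduleFlush();
immediate.scheduleFlush();
console.log(immediate.flushCount); // 2

const turnBased = new ToyFlushScheduler("TurnBased", (task) => deferred.push(task));
turnBased.scheduleFlush();
turnBased.scheduleFlush();
runDeferred();
console.log(turnBased.flushCount); // 1
```

Putting the mode switch inside scheduleFlush means a callsite such as the replay path can call it unconditionally.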

packages/runtime/container-runtime/src/containerRuntime.ts

Prompt:

> Write a test that generates a compressed ID, then reconnects (triggering replayPendingStates), then processes an op from another client, then generates another ID and submits a simple op using that ID. It should hit the "outboxSequenceNumberCoherencyCheck" error on flush.

I had to do surprisingly little modification (besides extra refactoring I wanted to).

markfields added 2 commits May 21, 2025 23:07

Refactor scheduleFlush to also cover ImmediateMode case

474b463

Call scheduleFlush after submitting ID Allocation op in replay flow

8482eb2

Copilot AI review requested due to automatic review settings May 21, 2025 23:11

github-actions bot added the "area: runtime" (Runtime related issues) and "base: main" (PRs targeted against main branch) labels May 21, 2025

Copilot AI reviewed May 21, 2025

packages/runtime/container-runtime/src/containerRuntime.ts

Add missing break

b52cd55

anthony-murphy reviewed May 22, 2025

packages/runtime/container-runtime/src/containerRuntime.ts (outdated)

anthony-murphy added the Feature_StagingMode label May 22, 2025

markfields added 2 commits May 23, 2025 06:51

Simplify

2903236

anthony-murphy approved these changes May 23, 2025

markfields merged commit b4e1fd1 into microsoft:main May 23, 2025
37 checks passed

markfields deleted the idc/replay-fix branch May 23, 2025 18:55
