refactor(matching): coalesce zero-delay matching timers by gregorydemay · Pull Request #181 · dfinity/oisy-trade

gregorydemay · 2026-06-25T13:21:07Z

Summary

A burst of N back-to-back add_limit_order calls previously queued N zero-delay matching timers plus the self-reschedule chain, each costing a self-call message that mostly did no useful work.

This tracks whether a matching timer is already pending (matching_timer_scheduled in State) and only schedules one when none is in flight, collapsing a burst into a single drive loop. The chunked self-reschedule path that drains backlogs across messages is unchanged.

Because order matching is synchronous and stays so, the ProcessPendingOrders TimerGuard was provably inert — its AlreadyRunning outcome is unreachable. It is removed along with the now-unused Task enum, active_tasks, and the AlreadyRunning variant. The per-(caller, token) UserOpGuard for deposits/withdrawals is untouched.

Acceptance

N back-to-back add_limit_order calls schedule O(1) matching timers.
The chunked self-reschedule drain path stays intact.
Unit coverage for the burst-coalescing behaviour.

Notes

Removed the integration test should_stop_matching_on_halted_pair_only. It placed crossable orders on a pair during a brief resume window and assumed they would not match before the next explicit tick — an assumption that only holds for a particular timer-firing schedule, since the matching timer fires between ingress messages on its own. Coalescing shifts that schedule, so an open pair's crossing orders can now match before a subsequent halt (the halt guarantee itself is intact). The real invariant — a per-pair halt skips only the halted book, which fills once resumed — is covered deterministically by the unit test execute::tests::should_skip_halted_book_while_matching_others.

Track whether a matching timer is already scheduled and only arm one when none is pending, collapsing a burst of add_limit_order kickoffs into a single drive loop. The chunked self-reschedule drain path is unchanged. Since order matching is synchronous, the ProcessPendingOrders TimerGuard was provably inert (AlreadyRunning unreachable); drop it along with the now-unused Task enum, active_tasks, and the AlreadyRunning variant. DEFI-2823 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gregorydemay · 2026-06-25T13:25:16Z

🧐 Review — VERDICT: CHANGES_REQUESTED (CI not yet green). Severity tally: 0 🔴 / 0 🟠 / 1 🔵.

(Posting as a comment: GitHub blocks a formal request-changes review on one's own PR.)

The review substance is clean; the only blocker is CI. gh pr checks 181 shows lint, reproducible-build, unit-tests, and benchmark still pending. Still-pending CI is a hard block on a READY verdict, independent of the review. Re-check once CI is green.

Substance: The flag-based approach is the minimal, maintainable solution — I brainstormed the same matching_timer_scheduled boolean before reading the diff and found no better trade-off. Removing the TimerGuard/Task/active_tasks/AlreadyRunning machinery is well-justified: matching is synchronous, so AlreadyRunning was provably unreachable dead code, and the cleanup is complete (no dangling references). I ran cargo test -p oisy_trade_canister: all 371 pass. I established coverage by evidence — mutating try_mark_matching_timer_scheduled to always return true fails should_schedule_a_single_timer_until_it_fires (reverted immediately).

Acceptance criteria:

O(1) timers per burst — satisfied via the flag; gating logic verified by mutation.
Chunked self-reschedule intact — drive_matching still re-arms on MoreWork; clearing the flag before processing is correct under the single-threaded synchronous model (no interleaving kickoff can race the drive loop).
Unit coverage — present and effective at the State level.

Maintainability:

Duplication: none found. Removal is net-negative LOC; no copy-pasted test/setup blocks introduced.
Unused derives: none found. No new types added (a bool field).
Primitive-obsession parameters: cleared. The bool flag has no corresponding domain newtype for a "timer scheduled" concept; try_mark/clear encapsulate it. bool is appropriate here.
Divergent invariant handling: none found. The flag is set/cleared at exactly the two paired sites (schedule_matching_timer / drive_matching); reset-on-upgrade handled consistently in snapshot from_state and into_state, matching the existing in_flight_user_ops treatment.
Silent fallbacks: none found. No unwrap_or_default/Result::ok/NaN paths introduced.

One 🔵 nit posted inline on drive_matching re: the clear-then-process-then-rearm ordering being untested (pre-existing structural gap — set_timer isn't unit-testable; worth a follow-up PocketIC integration test). Non-blocking.

github-actions · 2026-06-25T13:26:25Z

`canbench` 🏋 (dir: canister) `2466fd6` 2026-06-26 08:21:02 UTC

✅ canister/canbench_results.yml is up to date
📦 canbench_results_benchmark.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   No significant changes 👍
    counts:   [total 14 | regressed 0 | improved 0 | new 0 | unchanged 14]
    change:   [max +330.41K | p75 +191.99K | median +11.46K | p25 +125 | min -2]
    change %: [max +1.43% | p75 +0.02% | median +0.02% | p25 0.00% | min -0.02%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 14 | regressed 0 | improved 0 | new 0 | unchanged 14]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 14 | regressed 0 | improved 0 | new 0 | unchanged 14]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------
CSV results saved to canbench_results.csv

Copilot

Pull request overview

Refactors the canister’s matching trigger path to coalesce bursts of zero-delay matching timers into a single scheduled timer, and removes the previously used (but now deemed unreachable) “already running” matching guard/task machinery.

Changes:

Introduces matching_timer_scheduled in State with helpers to coalesce zero-delay timer scheduling.
Adds schedule_matching_timer() and updates entry points (add_limit_order, resume_trading, and MoreWork reschedule) to use it.
Removes TimerGuard, Task, active_tasks, and ExecutionStatus::AlreadyRunning, updating snapshot behavior/tests accordingly.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
canister/src/tests.rs	Replaces the old guard-based test with a coalescing-focused unit test.
canister/src/state/snapshot/tests.rs	Updates snapshot roundtrip test to reflect new transient runtime state.
canister/src/state/snapshot/mod.rs	Excludes `matching_timer_scheduled` from snapshots and resets it on restore.
canister/src/state/mod.rs	Replaces `active_tasks` with `matching_timer_scheduled` and adds helper methods.
canister/src/main.rs	Switches immediate matching kickoffs to `schedule_matching_timer()`.
canister/src/lib.rs	Adds `schedule_matching_timer()` and updates `drive_matching()` reschedule behavior.
canister/src/guard/mod.rs	Removes the now-unused `TimerGuard`.
canister/src/execute/mod.rs	Removes the unreachable `AlreadyRunning` execution status.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

`should_stop_matching_on_halted_pair_only` placed crossable orders on a pair during a brief resume window and asserted they would not match before the next explicit tick. That holds only for a particular timer-firing schedule: the matching timer fires between ingress messages on its own, so an open pair's crossing orders can legitimately match before a subsequent halt. Coalescing the kickoff timers shifts that schedule and the assertion no longer holds — not because the halt guarantee broke, but because the scenario is inherently racy at the integration level. The real invariant — a per-pair halt skips only the halted book while others keep matching, and the book fills once resumed — is covered deterministically by the unit test `execute::tests::should_skip_halted_book_while_matching_others`. DEFI-2823 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e-zero-delay-matching-timers # Conflicts: # integration_tests/tests/tests.rs

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

gregorydemay

🧐 Review passed — no blockers/mediums, CI green. Ready for human approval.

Scope reviewed: the DEFI-2823 coalescing change (the new matching_timer_scheduled flag + schedule_matching_timer/drive_matching rewiring), the removal of the provably-inert ProcessPendingOrders TimerGuard/Task/active_tasks/AlreadyRunning, and the removal of the timing-fragile integration test. The merged-in main commits were not reviewed.

Verdict: READY. Severity tally: 0 🔴, 0 🟠, 0 🔵.

Correctness:

The set/clear of the flag is symmetric and race-free on the IC: schedule_matching_timer marks then set_timer with no intervening await (single message), and drive_matching clears the flag on entry before process_pending_orders, so Complete still clears it — no stranded flag, no permanent matching stall.
The periodic MATCHING_INTERVAL timer (re-armed in setup_timers on init and post_upgrade) calls drive_matching directly and is the safety net; resetting the flag to false in the snapshot on upgrade is correct because IC timers are cleared on upgrade and the interval timer re-arms.
Coverage established by evidence, not inspection: mutating try_mark_matching_timer_scheduled to always return true makes matching_timer_coalescing::should_schedule_a_single_timer_until_it_fires fail (reverted immediately) — the new unit test genuinely pins the coalescing behaviour.

Test removal justified: should_stop_matching_on_halted_pair_only asserted a property that only held for a specific timer-firing schedule, which coalescing shifts; the real per-pair-halt invariant (halted book skipped, fills on resume) is covered deterministically by the unit test execute::tests::should_skip_halted_book_while_matching_others (verified: book A stays Pending while book B fills, then fills on resume).

Tests: unit cargo test -p oisy_trade_canister 391 passed; integration --test-threads 2 46 passed.

Maintainability:

Duplication: none found. The new unit test is a single focused #[test]; no near-duplicate test bodies or helper families introduced.
Unused derives: none. Task and its Ord/PartialOrd/Hash-style derives are removed wholesale; bool carries no derives.
Primitive-obsession parameters: cleared. matching_timer_scheduled: bool is a single-bit latch (genuinely boolean: pending or not), not a quantity that aliases an existing newtype like OrderId/OrderSeq — a newtype would add nothing.
Divergent invariant handling: cleared. Removing the always-Some TimerGuard collapses the matching path to a single way of scheduling; UserOpGuard (deposits/withdrawals) is untouched and unaffected.
Silent fallbacks: none. No new unwrap_or_default/Result::ok/NaN/let _ = on a failure path; the flag is an optimization with the periodic timer as the safety net.
Test-only code in production / redundant-or-derivable params / caller-owned decisions: none introduced.

Docs: the doc comments on schedule_matching_timer, drive_matching, the two State methods, and the updated snapshot test docstring match the implementation; no JIRA tickets or requirement-ID tags leaked into code; the stale DEFI-2823 TODOs were correctly removed.

Leaving as draft per process — not approving or merging.

Move `clear_matching_timer_scheduled` out of `drive_matching` and into the zero-delay timer's own callback. The flag is now owned solely by the scheduled-timer lifecycle, so `drive_matching` carries no flag bookkeeping and the periodic interval timer (which calls `drive_matching` directly) can no longer clear the flag while a kickoff timer is still pending — which previously could let a later kickoff schedule a second timer. Addresses Copilot review feedback on the periodic-timer interaction. DEFI-2823 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gregorydemay

🧐 Review passed — no blockers/mediums, CI green. Ready for human approval.

Re-review after 0c0e060 addressed the Copilot comment (moving clear_matching_timer_scheduled() out of drive_matching and into the scheduled-timer callback). I focused on the new clear-in-callback placement and the coalescing invariant.

Focus area — clear-in-callback / coalescing invariant (no blocker):

The flag is set only in try_mark_matching_timer_scheduled, immediately followed by set_timer(ZERO, ..) (which always schedules — no Result), so flag set ⟹ timer armed. The callback clears the flag as its first synchronous action (the whole drive chain has no .await), so the flag cannot be stranded — no stall path.
drive_matching no longer touches the flag, so the periodic interval timer (which calls drive_matching directly, lifecycle/mod.rs:112) can no longer clear a still-pending kickoff timer. That is the exact deviation Copilot flagged; the fix closes it. "At most one pending timer" holds across all three paths (kickoff, the MoreWork self-reschedule inside the callback, periodic interval).

Acceptance criteria: (a) burst → O(1) timers: met via the flag, covered by matching_timer_coalescing::should_schedule_a_single_timer_until_it_fires; (b) chunked self-reschedule drain intact: drive_matching still re-arms on MoreWork; (c) unit coverage present. Coverage confirmed by EVIDENCE — mutating try_mark_matching_timer_scheduled to always return true fails the test (reverted immediately).

Removed integration test: should_stop_matching_on_halted_pair_only was timing-fragile (assumed a specific timer-firing schedule that coalescing changes). The real invariant — per-pair halt skips only the halted book and fills once resumed — is covered deterministically by execute::tests::should_skip_halted_book_while_matching_others (book A stays Pending, book B fills, then A fills on resume). Justified removal.

Dead-code removal: ProcessPendingOrders TimerGuard/Task/active_tasks/AlreadyRunning removed; no orphan references remain (grep clean), BTreeSet import still used by in_flight_user_ops. The per-(caller,token) UserOpGuard is untouched.

Maintainability:

Duplication: none found (the new unit test is a single sequential state-machine check, not a copy-pasted data axis; no test-group duplication).
Unused derives: none — the removed Task enum eliminated Ord/PartialOrd that only existed to key the BTreeSet; bool flag needs none.
Primitive-obsession parameters: none — matching_timer_scheduled: bool is a genuine boolean lifecycle flag, not a quantity that wants a newtype.
Divergent invariant handling: none — the flag is set/cleared at exactly the two intended sites.
Silent fallbacks: none — no unwrap_or_default/Result::ok/NaN paths introduced.

CI: all green on 0c0e060. Unit (391 passed) and integration (46 passed, incl. upgrade-replay) green locally. All three prior threads resolved.

gregorydemay · 2026-06-26T08:30:03Z

🤖 This PR is ready for your review.

The automated reviewer returned VERDICT: READY (0 blockers / 0 mediums / 0 nits), CI is green on 0c0e060, and all review threads are resolved. Left as a draft — marking it ready-for-review, final approval, and merge are yours.

Summary of the build loop:

Coalesced the zero-delay matching kickoffs behind a single matching_timer_scheduled flag (O(1) timers per burst; the chunked self-reschedule drain path is unchanged).
Removed the now-inert ProcessPendingOrders TimerGuard/Task/active_tasks/AlreadyRunning (matching is synchronous, so the guard was unreachable).
Dropped the timing-fragile integration test should_stop_matching_on_halted_pair_only; the per-pair-halt invariant is covered deterministically by the unit test execute::tests::should_skip_halted_book_while_matching_others.
Addressed Copilot's periodic-timer feedback by moving the flag-clear into the scheduled timer's callback, keeping drive_matching free of flag bookkeeping.

Copilot AI review requested due to automatic review settings June 25, 2026 13:21

Copilot started reviewing on behalf of gregorydemay June 25, 2026 13:21 View session

gregorydemay commented Jun 25, 2026

View reviewed changes

Comment thread canister/src/lib.rs Outdated

Copilot AI reviewed Jun 25, 2026

View reviewed changes

Comment thread canister/src/lib.rs

gregorydemay and others added 3 commits June 25, 2026 16:20

ci: re-trigger CI (workflow missed the prior synchronize event)

4ef293a

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into dex_DEFI-2823_coalesc…

c0e4442

…e-zero-delay-matching-timers # Conflicts: # integration_tests/tests/tests.rs

Copilot AI review requested due to automatic review settings June 26, 2026 07:59

Copilot started reviewing on behalf of gregorydemay June 26, 2026 08:00 View session

Copilot AI reviewed Jun 26, 2026

View reviewed changes

Comment thread canister/src/lib.rs

gregorydemay commented Jun 26, 2026

View reviewed changes

gregorydemay mentioned this pull request Jun 26, 2026

feat(order): persist per-fill records in stable memory (3/5) #179

Merged

gregorydemay commented Jun 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(matching): coalesce zero-delay matching timers#181

refactor(matching): coalesce zero-delay matching timers#181
gregorydemay wants to merge 5 commits into
mainfrom
dex_DEFI-2823_coalesce-zero-delay-matching-timers

gregorydemay commented Jun 25, 2026 •

edited

Loading

Uh oh!

Uh oh!

gregorydemay commented Jun 25, 2026

Uh oh!

github-actions Bot commented Jun 25, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

gregorydemay left a comment

Uh oh!

gregorydemay left a comment

Uh oh!

gregorydemay commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

gregorydemay commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Acceptance

Notes

Uh oh!

Uh oh!

gregorydemay commented Jun 25, 2026

Uh oh!

github-actions Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

canbench 🏋 (dir: canister) 2466fd6 2026-06-26 08:21:02 UTC

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

gregorydemay left a comment

Choose a reason for hiding this comment

Uh oh!

gregorydemay left a comment

Choose a reason for hiding this comment

Uh oh!

gregorydemay commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gregorydemay commented Jun 25, 2026 •

edited

Loading

github-actions Bot commented Jun 25, 2026 •

edited

Loading

`canbench` 🏋 (dir: canister) `2466fd6` 2026-06-26 08:21:02 UTC