Skip to content

refactor(matching): coalesce zero-delay matching timers#181

Draft
gregorydemay wants to merge 5 commits into
mainfrom
dex_DEFI-2823_coalesce-zero-delay-matching-timers
Draft

refactor(matching): coalesce zero-delay matching timers#181
gregorydemay wants to merge 5 commits into
mainfrom
dex_DEFI-2823_coalesce-zero-delay-matching-timers

Conversation

@gregorydemay

@gregorydemay gregorydemay commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary

A burst of N back-to-back add_limit_order calls previously queued N zero-delay matching timers plus the self-reschedule chain, each costing a self-call message that mostly did no useful work.

This tracks whether a matching timer is already pending (matching_timer_scheduled in State) and only schedules one when none is in flight, collapsing a burst into a single drive loop. The chunked self-reschedule path that drains backlogs across messages is unchanged.

Because order matching is synchronous and stays so, the ProcessPendingOrders TimerGuard was provably inert — its AlreadyRunning outcome is unreachable. It is removed along with the now-unused Task enum, active_tasks, and the AlreadyRunning variant. The per-(caller, token) UserOpGuard for deposits/withdrawals is untouched.

Acceptance

  • N back-to-back add_limit_order calls schedule O(1) matching timers.
  • The chunked self-reschedule drain path stays intact.
  • Unit coverage for the burst-coalescing behaviour.

Notes

Removed the integration test should_stop_matching_on_halted_pair_only. It placed crossable orders on a pair during a brief resume window and assumed they would not match before the next explicit tick — an assumption that only holds for a particular timer-firing schedule, since the matching timer fires between ingress messages on its own. Coalescing shifts that schedule, so an open pair's crossing orders can now match before a subsequent halt (the halt guarantee itself is intact). The real invariant — a per-pair halt skips only the halted book, which fills once resumed — is covered deterministically by the unit test execute::tests::should_skip_halted_book_while_matching_others.

Track whether a matching timer is already scheduled and only arm one when
none is pending, collapsing a burst of add_limit_order kickoffs into a
single drive loop. The chunked self-reschedule drain path is unchanged.

Since order matching is synchronous, the ProcessPendingOrders TimerGuard
was provably inert (AlreadyRunning unreachable); drop it along with the
now-unused Task enum, active_tasks, and the AlreadyRunning variant.

DEFI-2823

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 25, 2026 13:21
Comment thread canister/src/lib.rs Outdated
@gregorydemay

Copy link
Copy Markdown
Contributor Author

🧐 Review — VERDICT: CHANGES_REQUESTED (CI not yet green). Severity tally: 0 🔴 / 0 🟠 / 1 🔵.

(Posting as a comment: GitHub blocks a formal request-changes review on one's own PR.)

The review substance is clean; the only blocker is CI. gh pr checks 181 shows lint, reproducible-build, unit-tests, and benchmark still pending. Still-pending CI is a hard block on a READY verdict, independent of the review. Re-check once CI is green.

Substance: The flag-based approach is the minimal, maintainable solution — I brainstormed the same matching_timer_scheduled boolean before reading the diff and found no better trade-off. Removing the TimerGuard/Task/active_tasks/AlreadyRunning machinery is well-justified: matching is synchronous, so AlreadyRunning was provably unreachable dead code, and the cleanup is complete (no dangling references). I ran cargo test -p oisy_trade_canister: all 371 pass. I established coverage by evidence — mutating try_mark_matching_timer_scheduled to always return true fails should_schedule_a_single_timer_until_it_fires (reverted immediately).

Acceptance criteria:

  1. O(1) timers per burst — satisfied via the flag; gating logic verified by mutation.
  2. Chunked self-reschedule intact — drive_matching still re-arms on MoreWork; clearing the flag before processing is correct under the single-threaded synchronous model (no interleaving kickoff can race the drive loop).
  3. Unit coverage — present and effective at the State level.

Maintainability:

  • Duplication: none found. Removal is net-negative LOC; no copy-pasted test/setup blocks introduced.
  • Unused derives: none found. No new types added (a bool field).
  • Primitive-obsession parameters: cleared. The bool flag has no corresponding domain newtype for a "timer scheduled" concept; try_mark/clear encapsulate it. bool is appropriate here.
  • Divergent invariant handling: none found. The flag is set/cleared at exactly the two paired sites (schedule_matching_timer / drive_matching); reset-on-upgrade handled consistently in snapshot from_state and into_state, matching the existing in_flight_user_ops treatment.
  • Silent fallbacks: none found. No unwrap_or_default/Result::ok/NaN paths introduced.

One 🔵 nit posted inline on drive_matching re: the clear-then-process-then-rearm ordering being untested (pre-existing structural gap — set_timer isn't unit-testable; worth a follow-up PocketIC integration test). Non-blocking.

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

canbench 🏋 (dir: canister) 2466fd6 2026-06-26 08:21:02 UTC

canister/canbench_results.yml is up to date
📦 canbench_results_benchmark.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   No significant changes 👍
    counts:   [total 14 | regressed 0 | improved 0 | new 0 | unchanged 14]
    change:   [max +330.41K | p75 +191.99K | median +11.46K | p25 +125 | min -2]
    change %: [max +1.43% | p75 +0.02% | median +0.02% | p25 0.00% | min -0.02%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 14 | regressed 0 | improved 0 | new 0 | unchanged 14]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 14 | regressed 0 | improved 0 | new 0 | unchanged 14]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------
CSV results saved to canbench_results.csv

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the canister’s matching trigger path to coalesce bursts of zero-delay matching timers into a single scheduled timer, and removes the previously used (but now deemed unreachable) “already running” matching guard/task machinery.

Changes:

  • Introduces matching_timer_scheduled in State with helpers to coalesce zero-delay timer scheduling.
  • Adds schedule_matching_timer() and updates entry points (add_limit_order, resume_trading, and MoreWork reschedule) to use it.
  • Removes TimerGuard, Task, active_tasks, and ExecutionStatus::AlreadyRunning, updating snapshot behavior/tests accordingly.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
canister/src/tests.rs Replaces the old guard-based test with a coalescing-focused unit test.
canister/src/state/snapshot/tests.rs Updates snapshot roundtrip test to reflect new transient runtime state.
canister/src/state/snapshot/mod.rs Excludes matching_timer_scheduled from snapshots and resets it on restore.
canister/src/state/mod.rs Replaces active_tasks with matching_timer_scheduled and adds helper methods.
canister/src/main.rs Switches immediate matching kickoffs to schedule_matching_timer().
canister/src/lib.rs Adds schedule_matching_timer() and updates drive_matching() reschedule behavior.
canister/src/guard/mod.rs Removes the now-unused TimerGuard.
canister/src/execute/mod.rs Removes the unreachable AlreadyRunning execution status.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread canister/src/lib.rs
gregorydemay and others added 3 commits June 25, 2026 16:20
`should_stop_matching_on_halted_pair_only` placed crossable orders on a
pair during a brief resume window and asserted they would not match before
the next explicit tick. That holds only for a particular timer-firing
schedule: the matching timer fires between ingress messages on its own, so
an open pair's crossing orders can legitimately match before a subsequent
halt. Coalescing the kickoff timers shifts that schedule and the assertion
no longer holds — not because the halt guarantee broke, but because the
scenario is inherently racy at the integration level.

The real invariant — a per-pair halt skips only the halted book while
others keep matching, and the book fills once resumed — is covered
deterministically by the unit test
`execute::tests::should_skip_halted_book_while_matching_others`.

DEFI-2823

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e-zero-delay-matching-timers

# Conflicts:
#	integration_tests/tests/tests.rs
Copilot AI review requested due to automatic review settings June 26, 2026 07:59

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment thread canister/src/lib.rs

@gregorydemay gregorydemay left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧐 Review passed — no blockers/mediums, CI green. Ready for human approval.

Scope reviewed: the DEFI-2823 coalescing change (the new matching_timer_scheduled flag + schedule_matching_timer/drive_matching rewiring), the removal of the provably-inert ProcessPendingOrders TimerGuard/Task/active_tasks/AlreadyRunning, and the removal of the timing-fragile integration test. The merged-in main commits were not reviewed.

Verdict: READY. Severity tally: 0 🔴, 0 🟠, 0 🔵.

Correctness:

  • The set/clear of the flag is symmetric and race-free on the IC: schedule_matching_timer marks then set_timer with no intervening await (single message), and drive_matching clears the flag on entry before process_pending_orders, so Complete still clears it — no stranded flag, no permanent matching stall.
  • The periodic MATCHING_INTERVAL timer (re-armed in setup_timers on init and post_upgrade) calls drive_matching directly and is the safety net; resetting the flag to false in the snapshot on upgrade is correct because IC timers are cleared on upgrade and the interval timer re-arms.
  • Coverage established by evidence, not inspection: mutating try_mark_matching_timer_scheduled to always return true makes matching_timer_coalescing::should_schedule_a_single_timer_until_it_fires fail (reverted immediately) — the new unit test genuinely pins the coalescing behaviour.

Test removal justified: should_stop_matching_on_halted_pair_only asserted a property that only held for a specific timer-firing schedule, which coalescing shifts; the real per-pair-halt invariant (halted book skipped, fills on resume) is covered deterministically by the unit test execute::tests::should_skip_halted_book_while_matching_others (verified: book A stays Pending while book B fills, then fills on resume).

Tests: unit cargo test -p oisy_trade_canister 391 passed; integration --test-threads 2 46 passed.

Maintainability:

  • Duplication: none found. The new unit test is a single focused #[test]; no near-duplicate test bodies or helper families introduced.
  • Unused derives: none. Task and its Ord/PartialOrd/Hash-style derives are removed wholesale; bool carries no derives.
  • Primitive-obsession parameters: cleared. matching_timer_scheduled: bool is a single-bit latch (genuinely boolean: pending or not), not a quantity that aliases an existing newtype like OrderId/OrderSeq — a newtype would add nothing.
  • Divergent invariant handling: cleared. Removing the always-Some TimerGuard collapses the matching path to a single way of scheduling; UserOpGuard (deposits/withdrawals) is untouched and unaffected.
  • Silent fallbacks: none. No new unwrap_or_default/Result::ok/NaN/let _ = on a failure path; the flag is an optimization with the periodic timer as the safety net.
  • Test-only code in production / redundant-or-derivable params / caller-owned decisions: none introduced.

Docs: the doc comments on schedule_matching_timer, drive_matching, the two State methods, and the updated snapshot test docstring match the implementation; no JIRA tickets or requirement-ID tags leaked into code; the stale DEFI-2823 TODOs were correctly removed.

Leaving as draft per process — not approving or merging.

Move `clear_matching_timer_scheduled` out of `drive_matching` and into the
zero-delay timer's own callback. The flag is now owned solely by the
scheduled-timer lifecycle, so `drive_matching` carries no flag bookkeeping
and the periodic interval timer (which calls `drive_matching` directly)
can no longer clear the flag while a kickoff timer is still pending — which
previously could let a later kickoff schedule a second timer.

Addresses Copilot review feedback on the periodic-timer interaction.

DEFI-2823

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@gregorydemay gregorydemay left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧐 Review passed — no blockers/mediums, CI green. Ready for human approval.

Re-review after 0c0e060 addressed the Copilot comment (moving clear_matching_timer_scheduled() out of drive_matching and into the scheduled-timer callback). I focused on the new clear-in-callback placement and the coalescing invariant.

Focus area — clear-in-callback / coalescing invariant (no blocker):

  • The flag is set only in try_mark_matching_timer_scheduled, immediately followed by set_timer(ZERO, ..) (which always schedules — no Result), so flag set ⟹ timer armed. The callback clears the flag as its first synchronous action (the whole drive chain has no .await), so the flag cannot be stranded — no stall path.
  • drive_matching no longer touches the flag, so the periodic interval timer (which calls drive_matching directly, lifecycle/mod.rs:112) can no longer clear a still-pending kickoff timer. That is the exact deviation Copilot flagged; the fix closes it. "At most one pending timer" holds across all three paths (kickoff, the MoreWork self-reschedule inside the callback, periodic interval).

Acceptance criteria: (a) burst → O(1) timers: met via the flag, covered by matching_timer_coalescing::should_schedule_a_single_timer_until_it_fires; (b) chunked self-reschedule drain intact: drive_matching still re-arms on MoreWork; (c) unit coverage present. Coverage confirmed by EVIDENCE — mutating try_mark_matching_timer_scheduled to always return true fails the test (reverted immediately).

Removed integration test: should_stop_matching_on_halted_pair_only was timing-fragile (assumed a specific timer-firing schedule that coalescing changes). The real invariant — per-pair halt skips only the halted book and fills once resumed — is covered deterministically by execute::tests::should_skip_halted_book_while_matching_others (book A stays Pending, book B fills, then A fills on resume). Justified removal.

Dead-code removal: ProcessPendingOrders TimerGuard/Task/active_tasks/AlreadyRunning removed; no orphan references remain (grep clean), BTreeSet import still used by in_flight_user_ops. The per-(caller,token) UserOpGuard is untouched.

Maintainability:

  • Duplication: none found (the new unit test is a single sequential state-machine check, not a copy-pasted data axis; no test-group duplication).
  • Unused derives: none — the removed Task enum eliminated Ord/PartialOrd that only existed to key the BTreeSet; bool flag needs none.
  • Primitive-obsession parameters: none — matching_timer_scheduled: bool is a genuine boolean lifecycle flag, not a quantity that wants a newtype.
  • Divergent invariant handling: none — the flag is set/cleared at exactly the two intended sites.
  • Silent fallbacks: none — no unwrap_or_default/Result::ok/NaN paths introduced.

CI: all green on 0c0e060. Unit (391 passed) and integration (46 passed, incl. upgrade-replay) green locally. All three prior threads resolved.

@gregorydemay

Copy link
Copy Markdown
Contributor Author

🤖 This PR is ready for your review.

The automated reviewer returned VERDICT: READY (0 blockers / 0 mediums / 0 nits), CI is green on 0c0e060, and all review threads are resolved. Left as a draft — marking it ready-for-review, final approval, and merge are yours.

Summary of the build loop:

  • Coalesced the zero-delay matching kickoffs behind a single matching_timer_scheduled flag (O(1) timers per burst; the chunked self-reschedule drain path is unchanged).
  • Removed the now-inert ProcessPendingOrders TimerGuard/Task/active_tasks/AlreadyRunning (matching is synchronous, so the guard was unreachable).
  • Dropped the timing-fragile integration test should_stop_matching_on_halted_pair_only; the per-pair-halt invariant is covered deterministically by the unit test execute::tests::should_skip_halted_book_while_matching_others.
  • Addressed Copilot's periodic-timer feedback by moving the flag-clear into the scheduled timer's callback, keeping drive_matching free of flag bookkeeping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants