refactor(matching): coalesce zero-delay matching timers#181
refactor(matching): coalesce zero-delay matching timers#181gregorydemay wants to merge 5 commits into
Conversation
Track whether a matching timer is already scheduled and only arm one when none is pending, collapsing a burst of add_limit_order kickoffs into a single drive loop. The chunked self-reschedule drain path is unchanged. Since order matching is synchronous, the ProcessPendingOrders TimerGuard was provably inert (AlreadyRunning unreachable); drop it along with the now-unused Task enum, active_tasks, and the AlreadyRunning variant. DEFI-2823 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
🧐 Review — VERDICT: CHANGES_REQUESTED (CI not yet green). Severity tally: 0 🔴 / 0 🟠 / 1 🔵. (Posting as a comment: GitHub blocks a formal request-changes review on one's own PR.) The review substance is clean; the only blocker is CI. Substance: The flag-based approach is the minimal, maintainable solution — I brainstormed the same Acceptance criteria:
Maintainability:
One 🔵 nit posted inline on |
|
There was a problem hiding this comment.
Pull request overview
Refactors the canister’s matching trigger path to coalesce bursts of zero-delay matching timers into a single scheduled timer, and removes the previously used (but now deemed unreachable) “already running” matching guard/task machinery.
Changes:
- Introduces
matching_timer_scheduledinStatewith helpers to coalesce zero-delay timer scheduling. - Adds
schedule_matching_timer()and updates entry points (add_limit_order,resume_trading, andMoreWorkreschedule) to use it. - Removes
TimerGuard,Task,active_tasks, andExecutionStatus::AlreadyRunning, updating snapshot behavior/tests accordingly.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| canister/src/tests.rs | Replaces the old guard-based test with a coalescing-focused unit test. |
| canister/src/state/snapshot/tests.rs | Updates snapshot roundtrip test to reflect new transient runtime state. |
| canister/src/state/snapshot/mod.rs | Excludes matching_timer_scheduled from snapshots and resets it on restore. |
| canister/src/state/mod.rs | Replaces active_tasks with matching_timer_scheduled and adds helper methods. |
| canister/src/main.rs | Switches immediate matching kickoffs to schedule_matching_timer(). |
| canister/src/lib.rs | Adds schedule_matching_timer() and updates drive_matching() reschedule behavior. |
| canister/src/guard/mod.rs | Removes the now-unused TimerGuard. |
| canister/src/execute/mod.rs | Removes the unreachable AlreadyRunning execution status. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
`should_stop_matching_on_halted_pair_only` placed crossable orders on a pair during a brief resume window and asserted they would not match before the next explicit tick. That holds only for a particular timer-firing schedule: the matching timer fires between ingress messages on its own, so an open pair's crossing orders can legitimately match before a subsequent halt. Coalescing the kickoff timers shifts that schedule and the assertion no longer holds — not because the halt guarantee broke, but because the scenario is inherently racy at the integration level. The real invariant — a per-pair halt skips only the halted book while others keep matching, and the book fills once resumed — is covered deterministically by the unit test `execute::tests::should_skip_halted_book_while_matching_others`. DEFI-2823 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e-zero-delay-matching-timers # Conflicts: # integration_tests/tests/tests.rs
gregorydemay
left a comment
There was a problem hiding this comment.
🧐 Review passed — no blockers/mediums, CI green. Ready for human approval.
Scope reviewed: the DEFI-2823 coalescing change (the new matching_timer_scheduled flag + schedule_matching_timer/drive_matching rewiring), the removal of the provably-inert ProcessPendingOrders TimerGuard/Task/active_tasks/AlreadyRunning, and the removal of the timing-fragile integration test. The merged-in main commits were not reviewed.
Verdict: READY. Severity tally: 0 🔴, 0 🟠, 0 🔵.
Correctness:
- The set/clear of the flag is symmetric and race-free on the IC:
schedule_matching_timermarks thenset_timerwith no intervening await (single message), anddrive_matchingclears the flag on entry beforeprocess_pending_orders, soCompletestill clears it — no stranded flag, no permanent matching stall. - The periodic
MATCHING_INTERVALtimer (re-armed insetup_timerson init and post_upgrade) callsdrive_matchingdirectly and is the safety net; resetting the flag tofalsein the snapshot on upgrade is correct because IC timers are cleared on upgrade and the interval timer re-arms. - Coverage established by evidence, not inspection: mutating
try_mark_matching_timer_scheduledto always returntruemakesmatching_timer_coalescing::should_schedule_a_single_timer_until_it_firesfail (reverted immediately) — the new unit test genuinely pins the coalescing behaviour.
Test removal justified: should_stop_matching_on_halted_pair_only asserted a property that only held for a specific timer-firing schedule, which coalescing shifts; the real per-pair-halt invariant (halted book skipped, fills on resume) is covered deterministically by the unit test execute::tests::should_skip_halted_book_while_matching_others (verified: book A stays Pending while book B fills, then fills on resume).
Tests: unit cargo test -p oisy_trade_canister 391 passed; integration --test-threads 2 46 passed.
Maintainability:
- Duplication: none found. The new unit test is a single focused
#[test]; no near-duplicate test bodies or helper families introduced. - Unused derives: none.
Taskand itsOrd/PartialOrd/Hash-style derives are removed wholesale;boolcarries no derives. - Primitive-obsession parameters: cleared.
matching_timer_scheduled: boolis a single-bit latch (genuinely boolean: pending or not), not a quantity that aliases an existing newtype likeOrderId/OrderSeq— a newtype would add nothing. - Divergent invariant handling: cleared. Removing the always-
SomeTimerGuardcollapses the matching path to a single way of scheduling;UserOpGuard(deposits/withdrawals) is untouched and unaffected. - Silent fallbacks: none. No new
unwrap_or_default/Result::ok/NaN/let _ =on a failure path; the flag is an optimization with the periodic timer as the safety net. - Test-only code in production / redundant-or-derivable params / caller-owned decisions: none introduced.
Docs: the doc comments on schedule_matching_timer, drive_matching, the two State methods, and the updated snapshot test docstring match the implementation; no JIRA tickets or requirement-ID tags leaked into code; the stale DEFI-2823 TODOs were correctly removed.
Leaving as draft per process — not approving or merging.
Move `clear_matching_timer_scheduled` out of `drive_matching` and into the zero-delay timer's own callback. The flag is now owned solely by the scheduled-timer lifecycle, so `drive_matching` carries no flag bookkeeping and the periodic interval timer (which calls `drive_matching` directly) can no longer clear the flag while a kickoff timer is still pending — which previously could let a later kickoff schedule a second timer. Addresses Copilot review feedback on the periodic-timer interaction. DEFI-2823 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
gregorydemay
left a comment
There was a problem hiding this comment.
🧐 Review passed — no blockers/mediums, CI green. Ready for human approval.
Re-review after 0c0e060 addressed the Copilot comment (moving clear_matching_timer_scheduled() out of drive_matching and into the scheduled-timer callback). I focused on the new clear-in-callback placement and the coalescing invariant.
Focus area — clear-in-callback / coalescing invariant (no blocker):
- The flag is set only in
try_mark_matching_timer_scheduled, immediately followed byset_timer(ZERO, ..)(which always schedules — noResult), soflag set ⟹ timer armed. The callback clears the flag as its first synchronous action (the whole drive chain has no.await), so the flag cannot be stranded — no stall path. drive_matchingno longer touches the flag, so the periodic interval timer (which callsdrive_matchingdirectly, lifecycle/mod.rs:112) can no longer clear a still-pending kickoff timer. That is the exact deviation Copilot flagged; the fix closes it. "At most one pending timer" holds across all three paths (kickoff, theMoreWorkself-reschedule inside the callback, periodic interval).
Acceptance criteria: (a) burst → O(1) timers: met via the flag, covered by matching_timer_coalescing::should_schedule_a_single_timer_until_it_fires; (b) chunked self-reschedule drain intact: drive_matching still re-arms on MoreWork; (c) unit coverage present. Coverage confirmed by EVIDENCE — mutating try_mark_matching_timer_scheduled to always return true fails the test (reverted immediately).
Removed integration test: should_stop_matching_on_halted_pair_only was timing-fragile (assumed a specific timer-firing schedule that coalescing changes). The real invariant — per-pair halt skips only the halted book and fills once resumed — is covered deterministically by execute::tests::should_skip_halted_book_while_matching_others (book A stays Pending, book B fills, then A fills on resume). Justified removal.
Dead-code removal: ProcessPendingOrders TimerGuard/Task/active_tasks/AlreadyRunning removed; no orphan references remain (grep clean), BTreeSet import still used by in_flight_user_ops. The per-(caller,token) UserOpGuard is untouched.
Maintainability:
- Duplication: none found (the new unit test is a single sequential state-machine check, not a copy-pasted data axis; no test-group duplication).
- Unused derives: none — the removed
Taskenum eliminatedOrd/PartialOrdthat only existed to key theBTreeSet;boolflag needs none. - Primitive-obsession parameters: none —
matching_timer_scheduled: boolis a genuine boolean lifecycle flag, not a quantity that wants a newtype. - Divergent invariant handling: none — the flag is set/cleared at exactly the two intended sites.
- Silent fallbacks: none — no
unwrap_or_default/Result::ok/NaNpaths introduced.
CI: all green on 0c0e060. Unit (391 passed) and integration (46 passed, incl. upgrade-replay) green locally. All three prior threads resolved.
|
🤖 This PR is ready for your review. The automated reviewer returned VERDICT: READY (0 blockers / 0 mediums / 0 nits), CI is green on Summary of the build loop:
|
Summary
A burst of N back-to-back
add_limit_ordercalls previously queued N zero-delay matching timers plus the self-reschedule chain, each costing a self-call message that mostly did no useful work.This tracks whether a matching timer is already pending (
matching_timer_scheduledinState) and only schedules one when none is in flight, collapsing a burst into a single drive loop. The chunked self-reschedule path that drains backlogs across messages is unchanged.Because order matching is synchronous and stays so, the
ProcessPendingOrdersTimerGuardwas provably inert — itsAlreadyRunningoutcome is unreachable. It is removed along with the now-unusedTaskenum,active_tasks, and theAlreadyRunningvariant. The per-(caller, token)UserOpGuardfor deposits/withdrawals is untouched.Acceptance
add_limit_ordercalls schedule O(1) matching timers.Notes
Removed the integration test
should_stop_matching_on_halted_pair_only. It placed crossable orders on a pair during a brief resume window and assumed they would not match before the next explicit tick — an assumption that only holds for a particular timer-firing schedule, since the matching timer fires between ingress messages on its own. Coalescing shifts that schedule, so an open pair's crossing orders can now match before a subsequent halt (the halt guarantee itself is intact). The real invariant — a per-pair halt skips only the halted book, which fills once resumed — is covered deterministically by the unit testexecute::tests::should_skip_halted_book_while_matching_others.