refactor: eager-write lot cache for credit_balances by odesenfans · Pull Request #1122 · aleph-im/pyaleph

odesenfans · 2026-05-12T21:36:19Z

Summary

Replace the credit_balances cache with a per-lot table maintained eagerly by writers, so reads collapse to a SUM and never walk credit_history or write back to the cache.

One row per granting credit_history entry, keyed by (address, credit_ref, credit_index), with amount_remaining, expiration_date, message_timestamp.
Distributions insert one lot per recipient; expenses drain the address's still-valid lots in emission order under FOR UPDATE; transfers drain the sender and insert recipient lots capped at min(source_expiration, requested_expiration).
get_credit_balance becomes SUM(amount_remaining) WHERE expiration_date IS NULL OR expiration_date > now().
repair_node rebuilds the lot cache from credit_history on startup, so the migration can drop the old cache without backfilling.

Emission-first FIFO and the min(source, requested) transfer cap are preserved verbatim. No consensus-level behaviour change. Switching to expiration-first drain is intentionally out of scope and will land in a follow-up PR.

Why not the bucket cache from #1121

That PR couples the cache architecture (good) with a drain-order change from emission-first to expiration-first (consensus-level). This PR splits the two so the operational fix can ship now and the policy change can take its own review cycle.

Scope comparison at the schema layer:

	This PR (lot cache)	#1121 (bucket cache)
Lines changed	~440 / 308	~1058 / 395
New toolkit modules	none	`infinity.py` (PG sentinel adapter)
Test delta	~30 lines	~647 lines (new corner cases)
Behavior change	none	drain order flips

Test plan

hatch run linting:all clean
hatch run testing:test tests/db/test_credit_balances.py -v clean
Full suite green
Manual: start a fresh node, confirm the repair hook rebuilds lots from existing credit_history
Manual: confirm reads no longer touch credit_balances for writes

The previous credit_balances table was a single integer sum per address, recomputed lazily inside the read path and written back from there. Every get_credit_balance call could trigger a full credit_history FIFO walk plus an UPDATE, which made the read path expensive and made transactions take write locks on cache rows they only meant to inspect. This change replaces the table with a per-lot cache: one row per granting credit_history entry, keyed by (address, credit_ref, credit_index), with an amount_remaining column. Writers (distribution, expense, transfer) maintain it eagerly: - distributions insert one lot per recipient; - expenses drain the address's still-valid lots in emission order ((message_timestamp, credit_ref, credit_index) ASC) under FOR UPDATE, decrementing amount_remaining in place; - transfers drain the sender the same way and insert recipient lots with each portion capped at min(source_expiration, requested_expiration). Reads collapse to a SUM over rows whose expiration_date is NULL or in the future, with no FIFO walk and no write-back. The repair_node startup hook rebuilds the lot cache from credit_history by replaying it in emission order, so the migration can drop the old cache without backfilling: the empty new table is populated on the next start. Emission-first FIFO and the min(source, requested) transfer cap are preserved verbatim. No consensus-level behaviour change. The drain-order change to expiration-first is intentionally out of scope; it will land in a follow-up PR.

foxpatch-aleph

This PR replaces the lazy-write credit_balances cache with an eager-write per-lot cache, making reads a simple SUM over still-valid lots instead of a FIFO walk. The logic is carefully preserved: emission-order FIFO, expiration filtering on read, and the min(source_expiration, requested_expiration) transfer cap are all maintained. The migration is correct (drop old table, create new one with FK to credit_history) and the repair hook ensures safe bootstrap. The test updates verify the key behavioral change (lots persist after expiration, read-time filtering handles it). No bugs, security issues, or semantic regressions were found. The code quality is high with clear documentation throughout.

CI surfaced two failure modes: 1. ForeignKeyViolation on the eager-write path. The lot insert ran inside the write functions before _bulk_insert_credit_history committed the referenced row, so the FK from credit_balances to credit_history exploded on every distribution and transfer. Drop the FK. The lot cache is a derived projection of credit_history, not authoritative state. Integrity is preserved by the eager writers (lot + history land in the same transaction) and by _repair_credit_balances on startup; a DB-level FK adds ordering coupling without runtime correctness benefit. 2. Tests that seed credit_history directly via session.add and then read through get_credit_balance / get_credit_balance_with_details saw 0 because the lot cache was never populated. Update the two test helpers (_insert_credit_history_entries in test_credit_balances.py, plus the two direct AlephCreditHistoryDb adds in test_process_stores.py) to call _rebuild_credit_lots_for_address on each touched address after seeding, mirroring what repair_node does on startup.

foxpatch-aleph

Well-structured refactor from lazy single-row cache to eager per-lot cache. The core FIFO logic and expiration semantics are preserved correctly. Main concern: get_credit_balances and count_credit_balances now exclude zero-balance addresses even with min_balance=0 (behaviour change from old code). Minor: missing type annotation, aggregate FILTER() syntax may be novel in this codebase.

src/aleph/db/accessors/balances.py (line 447): .having(balance_expr >= max(min_balance, 1)) — the old code returned all addresses when min_balance=0 (no filter). The new code excludes addresses with balance=0. If any caller relies on listing zero-balance addresses, this is a breaking change. Either preserve the old semantics (drop the max(..., 1) when min_balance=0) or document the change.

src/aleph/db/accessors/balances.py (line 471): Same max(min_balance, 1) issue as get_credit_balances — count_credit_balances with min_balance=0 now excludes zero-balance addresses, changing old behaviour.

src/aleph/db/accessors/balances.py (line 359): Missing type annotation on cutoff parameter. It accepts both dt.datetime and SQLAlchemy column expressions (via func.now()). A Union[dt.datetime, ColumnElement] or similar annotation would help readability.

src/aleph/db/accessors/balances.py (line 417): func.sum().filter(_valid_lot_filter(func.now())) uses SQLAlchemy's aggregate filter() clause. This generates SUM(...) FILTER (WHERE ...), which is PostgreSQL-specific and may not be supported by all SQLAlchemy versions. Consider using a subquery with a standard WHERE clause instead for better portability, or confirm this is tested against the target SQLAlchemy version.

deployment/migrations/versions/0057_a8c3d9f1b2e4_credit_balances_lot_cache.py (line 31): The migration drops and recreates credit_balances. Between migration completion and the repair hook running on next startup, reads against the new table will return 0 for all addresses. The PR description documents this (repair runs on startup), but it's worth ensuring the deploy order guarantees repair_node runs before any balance-dependent processing.

Drop max(min_balance, 1) in get_credit_balances / count_credit_balances so the default min_balance=0 keeps returning addresses with zero remaining credit (matching the pre-refactor behaviour). The public API still gates min_balance >= 1 at the OpenAPI schema layer. Annotate _valid_lot_filter's cutoff so callers can see it accepts both dt.datetime and a SQLAlchemy column expression.

odesenfans · 2026-05-12T22:21:09Z

Thanks for the review. Pushed d4b5c4a and notes on the others below.

1, 2 — min_balance=0 HAVING filter ✅ Fixed in d4b5c4a

Dropped max(min_balance, 1) in both get_credit_balances and count_credit_balances. With the default min_balance=0 they now use having balance_expr >= 0, which keeps addresses whose lots have all been drained to zero. That matches the pre-refactor behaviour where the lazy cache often held rows with balance=0 after a full drain.

External API access is still gated by the min_balance: 1 minimum on the OpenAPI schema, so this only affects internal Python callers using the default.

3 — Missing type annotation on cutoff ✅ Fixed in d4b5c4a

Annotated as Union[dt.datetime, ColumnElement[dt.datetime]].

4 — func.sum().filter() portability — Not changing

pyaleph is PostgreSQL-only across the board (pg_insert, JSONB, COPY-from-STDIN, partial indexes). SUM(...) FILTER (WHERE ...) is standard SQL since 2003 and SQLAlchemy 2.0.49 (what this repo runs) supports aggregate.filter() cleanly. A subquery rewrite would just add a wrapper without buying anything.

5 — Empty-cache window after migration — Acknowledged, no code change

True that between alembic upgrade head and the daemon's repair_node running, the cache is empty. But repair_node runs synchronously during startup before the HTTP listener accepts traffic, so on a single-instance deploy there is no read-serving window with an empty cache. Multi-instance rolling restarts on schema changes are out of scope for this PR (the migration is destructive anyway). The deploy order is the standard one: run migration, restart daemon, traffic resumes once startup completes.

foxpatch-aleph

This PR cleanly replaces the lazy-write credit_balances cache with an eager-write per-lot table. The FIFO consumption order, expiration capping on transfers, and whitelisted-sender bypass are all preserved verbatim. The code is well-documented, the migration is clean (drop-and-recreate with repair-from-history), locking via FOR UPDATE is correct, and ON CONFLICT DO NOTHING provides idempotent replay. The change set is focused and the PR description accurately scopes what's in vs. out (emission-order drain preserved, expiration-order drain deferred to a follow-up). No bugs found. Minor nits: the int(result or 0) redundancy in get_credit_balance, and adding an explicit test for _repair_credit_balances (the coordinator, not just _rebuild_credit_lots_for_address) would be nice-to-have.

src/aleph/db/accessors/balances.py (line 379): int(result or 0) is redundant — func.coalesce(func.sum(...), 0) already guarantees a non-None result. int(result) suffices.

src/aleph/repair.py (line 115): Consider adding a test for the _repair_credit_balances coordinator function (not just the per-address _rebuild_credit_lots_for_address). For example, verify that calling it rebuilds multiple addresses' lots correctly from history.

foxpatch-aleph previously approved these changes May 12, 2026

View reviewed changes

odesenfans dismissed foxpatch-aleph’s stale review via 8a62f94 May 12, 2026 21:56

foxpatch-aleph reviewed May 12, 2026

View reviewed changes

foxpatch-aleph approved these changes May 12, 2026

View reviewed changes

odesenfans merged commit 5fc98b3 into main May 12, 2026
4 checks passed

odesenfans deleted the od/credit-lot-cache branch May 12, 2026 22:27

odesenfans restored the od/credit-lot-cache branch May 12, 2026 22:27

odesenfans deleted the od/credit-lot-cache branch May 12, 2026 22:27

This was referenced May 12, 2026

feat: drain credit lots soonest-expiring first #1123

Open

fix: prevent backdated expense from draining future-dated grant #1124

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: eager-write lot cache for credit_balances#1122

refactor: eager-write lot cache for credit_balances#1122
odesenfans merged 3 commits into
mainfrom
od/credit-lot-cache

odesenfans commented May 12, 2026

Uh oh!

foxpatch-aleph left a comment

Uh oh!

foxpatch-aleph left a comment

Uh oh!

odesenfans commented May 12, 2026

Uh oh!

foxpatch-aleph left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

odesenfans commented May 12, 2026

Summary

Why not the bucket cache from #1121

Test plan

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

odesenfans commented May 12, 2026

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants