Skip to content

htlcswitch/invoices: redesign hodl subscription channel ownership to eliminate structural deadlock risk #10720

@ziggie1984

Description

@ziggie1984

Background

PR #10719 fixes a deadlock in channelLink.Stop() by reordering the
teardown sequence so that htlcManager exits before hodl subscription
state is cleaned up. That fix closes the specific race window and is
safe to ship immediately.

However, the underlying design has a structural fragility that the
ordering fix only papers over. This issue tracks the deeper redesign
needed to make hodl subscriptions robust by construction.

The Structural Problem

The current design couples the lifetime of a ConcurrentQueue
(owned by channelLink) with the routing of registry callbacks
(owned by InvoiceRegistry). Specifically:

  • channelLink creates a hodlQueue *queue.ConcurrentQueue and passes
    hodlQueue.ChanIn() directly into InvoiceRegistry.NotifyExitHopHtlc.
  • The registry stores that raw chan<- interface{} in
    hodlSubscriptions and later writes to it in notifyHodlSubscribers.
  • There is no mechanism in the registry to know whether the channel is
    still live at the time of the write.

This means the registry's correctness depends on an external invariant
(the link's queue must outlive any subscription registered against it)
that is enforced only by teardown ordering — a fragile guarantee that
can be violated by any future refactor of Stop(), by new callers of
NotifyExitHopHtlc, or by code paths that bypass the ordering fix.

Proposed Redesign Directions

Option A — Attach a quit channel to each subscription

Pass a <-chan struct{} (e.g. cg.Done()) alongside ChanIn() when
subscribing. notifyHodlSubscribers selects on both:

select {
case sub <- resolution:
case <-quit:
    // link is gone; drop and clean up
}

This makes every send non-blocking with respect to link lifetime and
lets the registry self-heal without relying on unsubscription ordering.
It is the fix direction suggested in issue #8803.

Option B — Registry-owned delivery, link-owned callback

Instead of exposing a raw channel, channelLink registers a callback
function
with the registry. The registry calls the callback directly;
the link's callback can check cg.Done() internally and discard the
resolution if the link is shutting down. This removes the shared channel
entirely and makes the boundary explicit.

Option C — Move subscription lifecycle into channelLink

Give channelLink full ownership: it subscribes on start, it
unsubscribes on stop, and it owns the queue. htlcManager subscribes
once at startup via a dedicated method that returns a typed subscription
handle (rather than a raw channel). Teardown calls handle.Cancel()
which atomically marks the subscription dead in the registry before
draining.

Why This Matters for SQL Migration

The deadlock risk is amplified by KV channeldb (bbolt). The global
write lock on bbolt increases processRemoteRevokeAndAck latency,
which widens the race window between hodlQueue.Stop() and
cg.WgWait() that the ordering fix closes. Once channeldb is fully
migrated to native SQL (tracked separately), the amplification factor
drops — but the structural fragility remains and should be addressed
regardless of the storage backend.

Acceptance Criteria

  • notifyHodlSubscribers cannot block indefinitely due to a dead
    subscriber channel, regardless of teardown ordering.
  • The hodl subscription lifecycle (subscribe / notify / unsubscribe)
    is encapsulated behind a typed API that makes misuse a compile-time
    or early-runtime error rather than a deadlock.
  • A regression test exercises concurrent channelLink.Stop() and
    notifyHodlSubscribers to prevent re-introduction of the race.
  • The fix is compatible with both KV and SQL channeldb backends.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementImprovements to existing features / behaviour

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions