discovery: add panic recovery for gossip message processing #10470

Roasbeef · 2025-12-30T03:28:39Z

In this commit, we add a centralized panic recovery mechanism for gossip
goroutines. This increases the robustness of message processing in the
gossiper, as now we are able to keep on trucking in the face of logic
errors that may lead to panics.

We ensure that any deps are freed and we log the panic trace to help
catch bugs in the future.

IMO this is a defensive pattern we should adopt in other sub-systems that
implement the p2p facing functionality of the daemon. A lil defensive
programming can go a long way.
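A minimal sketch of the pattern described above, not the code from this PR: a single deferred helper that calls `recover`, logs the stack trace, and releases whatever the job was holding. The `worker` type, its `pending` map, and the `recoverPanic` name are assumptions made for illustration; the helper in the PR is `recoverGossipPanic` and its body is not shown here.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"sync"
)

// worker is a hypothetical stand-in for the gossiper.
type worker struct {
	mu      sync.Mutex
	pending map[int]struct{} // jobs that dependents are still waiting on
	panics  int              // number of panics recovered so far
}

// recoverPanic must be the deferred function itself, because recover only
// works when called directly by a deferred function. It logs the panic with
// its stack trace and frees the job so dependents are not left blocked.
func (w *worker) recoverPanic(source string, jobID *int) {
	r := recover()
	if r == nil {
		return
	}

	fmt.Printf("panic in %s: %v\n%s\n", source, r, debug.Stack())

	w.mu.Lock()
	defer w.mu.Unlock()
	w.panics++
	if jobID != nil {
		delete(w.pending, *jobID)
	}
}

// process simulates handling one message. A nil payload causes a nil pointer
// dereference, the kind of logic error the helper is meant to contain.
func (w *worker) process(jobID int, payload *string) (n int) {
	w.mu.Lock()
	w.pending[jobID] = struct{}{}
	w.mu.Unlock()

	defer w.recoverPanic("process", &jobID)

	n = len(*payload) // panics when payload is nil

	w.mu.Lock()
	delete(w.pending, jobID)
	w.mu.Unlock()

	return n
}

func main() {
	w := &worker{pending: make(map[int]struct{})}

	msg := "hello"
	fmt.Println("first result:", w.process(1, &msg))
	fmt.Println("second result:", w.process(2, nil))
	fmt.Println("recovered:", w.panics, "still pending:", len(w.pending))
}
```

The second call panics, yet the program keeps running and the job is removed from the pending set, which is the "keep on trucking" behaviour the description refers to.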

gemini-code-assist · 2025-12-30T03:28:51Z

Summary of Changes

Hello @Roasbeef, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the fault tolerance of the gossip message processing within the discovery subsystem. By implementing a robust panic recovery mechanism, the system can now gracefully handle unexpected runtime errors in critical goroutines, preventing service interruptions. The recovery process includes logging detailed stack traces and ensuring proper resource cleanup, which will be invaluable for diagnosing and fixing underlying issues. This change represents a proactive step towards a more stable and resilient network operation.

Highlights

Panic Recovery for Gossip Goroutines: Introduced a centralized panic recovery mechanism specifically for gossip goroutines to enhance the robustness of message processing.
Increased System Resilience: The new mechanism allows the system to continue operating even when logic errors lead to panics in gossip message processing, preventing crashes.
Improved Debugging and Resource Management: Upon panic, the system ensures that any dependencies are freed and logs the full panic trace, aiding in future bug identification and resolution.
Defensive Programming Pattern: The author suggests this pattern as a valuable defensive measure for other p2p-facing sub-systems within the daemon.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a valuable panic recovery mechanism for gossip message processing, significantly enhancing the daemon's robustness. The implementation is sound, and the accompanying tests are thorough. My main suggestion is to refactor the logging within the new recoverGossipPanic function to use structured logging, aligning with the repository's style guide. This will improve log observability and consistency. I've provided detailed comments and code suggestions to facilitate this change.

discovery/gossiper.go

gemini-code-assist · 2025-12-30T03:31:58Z

discovery/gossiper_test.go

+				defer ctx.gossiper.recoverGossipPanic(
+					"testing", nMsg, &jobID,
+				)


To enable structured logging, the call to recoverGossipPanic needs to be updated to pass a background context.

Suggested change

-				defer ctx.gossiper.recoverGossipPanic(
-					"testing", nMsg, &jobID,
-				)
+				defer ctx.gossiper.recoverGossipPanic(
+					context.Background(), "testing", nMsg, &jobID,
+				)

References

The style guide (lines 235-253) mandates the use of structured logging (slog), which requires passing a context to logging functions like log.ErrorS.

gemini-code-assist · 2025-12-30T03:31:58Z

discovery/gossiper_test.go

+		defer ctx.gossiper.recoverGossipPanic(
+			"testing", nMsg, &parentJobID,
+		)


To enable structured logging, the call to recoverGossipPanic needs to be updated to pass a background context.

Suggested change

-		defer ctx.gossiper.recoverGossipPanic(
-			"testing", nMsg, &parentJobID,
-		)
+		defer ctx.gossiper.recoverGossipPanic(
+			context.Background(), "testing", nMsg, &parentJobID,
+		)

References

The style guide (lines 235-253) mandates the use of structured logging (slog), which requires passing a context to logging functions like log.ErrorS.

discovery/gossiper_test.go

lightninglabs-deploy · 2026-01-06T04:35:41Z

@Roasbeef, remember to re-request review from reviewers when ready

ziggie1984

Approach looks fine to me, I only had minor comments.

However, we need to be aware that this recovery design has its limits. For example, it will not recover panics that happen in second-level goroutines. Most of our DB insertions happen in the batch function, which runs in a separate goroutine, so we won't catch those panics. For example:

 handleNetworkMessages (Goroutine 1)
    ├─ defer recoverGossipPanic()  ← Can only catch panics in THIS goroutine
    │
    ├─ processNetworkAnnouncement()
    │   └─ handleChanAnnouncement()
    │       └─ d.cfg.Graph.AddEdge(ctx, edge, ops...)
    │           └─ b.addEdge(ctx, edge, op...)
    │               └─ b.cfg.Graph.AddChannelEdge(ctx, edge, op...)
    │                   └─ scheduler.Execute(ctx, request)
    │                       └─ go s.b.trigger(ctx)  ← NEW GOROUTINE! (batch/scheduler.go:85)
    │                           └─ b.run(ctx)       ← NO PANIC RECOVERY!
    │                               └─ req.Do(tx)   ← 💥 If panic here → DAEMON CRASHES!
    │
    └─ recoverGossipPanic CANNOT catch the panic in the batch goroutine!
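The limit shown in the call tree can be reproduced in isolation. The sketch below is illustrative only: `runBatch` is a hypothetical name and does not model the real batch scheduler. It installs a deferred `recover` in the calling goroutine and another in the spawned goroutine, then reports which one actually observed the panic.

```go
package main

import (
	"fmt"
	"sync"
)

// runBatch is a hypothetical stand-in for work that is handed off to a
// second goroutine. It returns the name of the handler that saw the panic.
func runBatch() (caughtBy string) {
	var wg sync.WaitGroup

	// record is always deferred directly, so its call to recover is valid.
	record := func(who string) {
		if r := recover(); r != nil {
			caughtBy = who
		}
	}

	// This handler only guards the stack of the calling goroutine.
	defer record("outer")

	wg.Add(1)
	go func() {
		defer wg.Done()

		// Without this deferred call, the panic below would terminate
		// the whole process, regardless of the outer handler.
		defer record("inner")

		panic("boom in batch goroutine")
	}()

	// wg.Wait orders the write to caughtBy before the read on return.
	wg.Wait()

	return caughtBy
}

func main() {
	fmt.Println("panic caught by:", runBatch()) // prints "panic caught by: inner"
}
```

A panic unwinds only the stack of the goroutine it occurs in, so the outer handler never runs for it. Removing the inner `defer` makes the program crash, which is the failure mode the diagram describes.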

discovery/gossiper.go

discovery/gossiper_test.go

docs/release-notes/release-notes-0.21.0.md

Roasbeef · 2026-01-07T20:05:15Z

> however we need to be aware that this recovery design has its limits, it will for example not recover panics which happen in second level goroutines

That's a good point. I think to cover deeper chains like that, we could consider wrapping our existing wg wrapper with a recover call. Here I'm only after the shallow items within the gossip processing logic. I'll take a look at whether things need to be extended slightly more.
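One way the "wg wrapper with a recover call" idea could look is sketched below. This is an assumption about the shape of such a wrapper, not lnd's existing helper: `safeGroup` and its `Go` and `Wait` methods are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"sync"
	"sync/atomic"
)

// safeGroup is a hypothetical wrapper around sync.WaitGroup that installs a
// recover handler in every goroutine it launches.
type safeGroup struct {
	wg        sync.WaitGroup
	recovered atomic.Int64 // number of panics recovered so far
}

// Go runs f in a new goroutine. A panic inside f is logged with its stack
// trace and counted instead of crashing the process.
func (g *safeGroup) Go(name string, f func()) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		defer func() {
			if r := recover(); r != nil {
				g.recovered.Add(1)
				fmt.Printf("recovered panic in %s: %v\n%s\n",
					name, r, debug.Stack())
			}
		}()

		f()
	}()
}

// Wait blocks until every goroutine started with Go has returned.
func (g *safeGroup) Wait() {
	g.wg.Wait()
}

func main() {
	var (
		g    safeGroup
		done atomic.Int64
	)

	for i := 0; i < 3; i++ {
		i := i
		g.Go(fmt.Sprintf("task-%d", i), func() {
			if i == 1 {
				panic("logic error")
			}
			done.Add(1)
		})
	}
	g.Wait()

	fmt.Println("completed:", done.Load(), "recovered:", g.recovered.Load())
}
```

A wrapper like this only protects goroutines started through it. If `f` spawns a goroutine with a bare `go` statement, a panic there still escapes, so deeper chains are covered only when every level launches through the wrapper.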

In this commit, we add a centralized panic recovery mechanism for gossip goroutines. This increases the robustness of message processing in the gossiper, as now we are able to keep on trucking in the face of logic errors that may lead to panics. We ensure that any deps are freed and we log the panic trace to help catch bugs in the future.

In this commit, we extend the panic recovery mechanism to cover the serial processing path for AnnounceSignatures1 messages. Unlike other gossip messages which are processed in parallel goroutines, announcement signatures are processed serially in the main networkHandler loop. A panic during this serial processing would previously crash the entire gossiper. This change wraps the processing in an anonymous function with a deferred panic recovery, ensuring resilience without changing the serial processing semantics. Since AnnounceSignatures bypass the validation barrier, we pass nil for the jobID parameter.
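The serial-path technique from the commit message above can be sketched as follows. The names `handleSerially` and `process` are hypothetical and the message type is simplified to a string; only the structure (an anonymous function per iteration with a deferred recovery) reflects what the commit message describes.

```go
package main

import "fmt"

// handleSerially processes messages one at a time in the caller's goroutine.
// Each message is handled inside an anonymous function so that a deferred
// recover can contain a panic without ending the loop.
func handleSerially(msgs []string, process func(string)) (ok, recovered int) {
	for _, msg := range msgs {
		func() {
			defer func() {
				if r := recover(); r != nil {
					recovered++
					fmt.Printf("recovered while processing %q: %v\n",
						msg, r)
				}
			}()

			process(msg)
			ok++
		}()
	}

	return ok, recovered
}

func main() {
	process := func(m string) {
		if m == "bad" {
			panic("malformed signature")
		}
	}

	ok, recovered := handleSerially([]string{"a", "bad", "b"}, process)
	fmt.Println("ok:", ok, "recovered:", recovered) // prints "ok: 2 recovered: 1"
}
```

Placing the `defer` directly in the loop body would not work the same way, because deferred calls run only when the enclosing function returns. The anonymous function gives each message its own function boundary while keeping processing strictly in order.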

ziggie1984

LGTM

Roasbeef added the discovery (Peer and route discovery / whisper protocol related issues/PRs) and healthcheck labels Dec 30, 2025

Roasbeef added this to the v0.20.1 milestone Dec 30, 2025

gemini-code-assist bot reviewed Dec 30, 2025

saubyk assigned Roasbeef Jan 4, 2026

saubyk added this to lnd v0.20 Jan 4, 2026

saubyk moved this to In progress in lnd v0.20 Jan 4, 2026

ziggie1984 added the no-changelog and backport-v0.20.x-branch labels Jan 5, 2026 (the latter is used to trigger the creation of a backport PR to the branch `v0.20.x-branch`)

ziggie1984 self-requested a review January 5, 2026 20:08

saubyk requested review from ellemouton and gijswijs and removed request for ellemouton January 6, 2026 17:38

Roasbeef force-pushed the discovery-panic-recovery branch from db8a839 to b8bd1aa Compare January 7, 2026 02:18

ziggie1984 reviewed Jan 7, 2026

Roasbeef added 3 commits January 7, 2026 12:35

docs/release-notes: add release notes

7f54408

Roasbeef force-pushed the discovery-panic-recovery branch from b8bd1aa to 7f54408 Compare January 7, 2026 20:35

Roasbeef requested a review from ziggie1984 January 7, 2026 20:38

ziggie1984 approved these changes Jan 7, 2026

saubyk moved this from In progress to In review in lnd v0.20 Jan 8, 2026
