Skip to content

Conversation

@lixmal
Copy link
Collaborator

@lixmal lixmal commented Nov 15, 2025

Describe your changes

  • Use the ping binary if we cannot create raw sockets
  • Bump gvisor/netstack to fix duplicate echo replies (this also bumps the min Go version)
  • Bump wireguard-go fork to work with the new gvisor
  • Adjust forwarder to work with new netstack code
  • Adjust ice_bind to work with new wireguard-go code
  • Fix a bug where waiting for echo replies would block all other packets

Depends on netbirdio/wireguard-go#12

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • New Features

    • ICMP capability detection with raw-socket probe and a rate-limited ping fallback.
    • New logger variant accepting four formatted arguments.
  • Refactor

    • MTU access made thread-safe.
    • ICMP handling reworked into async socket and ping-based paths with centralized reply injection.
    • UDP handling now reports explicit success/failure.
    • Receiver creation generalized to support non-IPv4 batch readers.
  • Chores

    • Bumped Go toolchain, many dependency updates, CI workflow updates; increased Wasm size limit.
  • Bug Fixes

    • Stabilized error formatting in validation and DNS logging paths.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 15, 2025

Walkthrough

Atomic MTU added to endpoint; Forwarder probes raw‑ICMP access, adds ping fallback and semaphore rate limiting; ICMP echo handling refactored into socket and ping paths with helpers; UDP handler now returns bool; logger gains Warn4; receiver creation generalized; broad dependency and CI Go toolchain upgrades; minor error-formatting fixes.

Changes

Cohort / File(s) Summary
Endpoint thread-safety
client/firewall/uspfilter/forwarder/endpoint.go
mtu changed from uint32atomic.Uint32; MTU() uses Load(). Added Close(), SetLinkAddress(tcpip.LinkAddress), SetMTU(mtu uint32), SetOnCloseAction(func()) (mostly no-ops except atomic store for MTU).
ICMP refactor & forwarder changes
client/firewall/uspfilter/forwarder/forwarder.go, client/firewall/uspfilter/forwarder/icmp.go
Forwarder adds hasRawICMPAccess bool and pingSemaphore chan struct{}; startup probes raw ICMP capability. ICMP handling split into handleICMPEcho, socket path (handleICMPViaSocket/forwardICMPPacket) and ping fallback (handleICMPViaPing/buildPingCommand/synthesizeEchoReply/injectICMPReply); signatures and packet pointer usage adjusted.
UDP control-flow change
client/firewall/uspfilter/forwarder/udp.go
handleUDP now returns bool. Early exits and success paths updated to return booleans. gonet.NewUDPConn call adjusted (removed f.stack arg).
Logging enhancement
client/firewall/uspfilter/log/log.go
Added Warn4(format string, arg1, arg2, arg3, arg4 any) for non-blocking 4-argument warn-level logging.
Receiver creation generalized
client/iface/bind/ice_bind.go
CreateIPv4ReceiverFn replaced by CreateReceiverFn(pc wgConn.BatchReader, ...). If pc is *ipv4.PacketConn use IPv4 path; otherwise fallback receiver reads from UDP (ReadFromUDPAddrPort) and exposes a StdNetEndpoint.
Dependency & CI toolchain updates
go.mod, .github/workflows/*
Bumped Go toolchain to 1.24.x and many module upgrades (golang.org/x/*, gRPC, OpenTelemetry, gvisor, WireGuard, Google cloud libs, etc.). CI workflows updated to use Go 1.24.x across Darwin/Linux/Windows/lint/release/mobile/wasm jobs; Wasm size limit increased.
Minor error-formatting & tests
client/internal/dns/upstream.go, management/server/posture_checks.go, management/server/http/handlers/policies/posture_checks_handler_test.go
Replaced use of an error as a printf format with explicit "%s" formatting and adjusted a mock accordingly. No behavior changes.

Sequence Diagram(s)

sequenceDiagram
    participant Client as ICMP Request
    participant Forwarder as Forwarder.handleICMP()
    participant Echo as handleICMPEcho()
    participant SocketPath as handleICMPViaSocket()
    participant PingPath as handleICMPViaPing()
    participant RawSock as Raw ICMP Socket
    participant PingCmd as Ping Command
    participant Injector as injectICMPReply()

    Client->>Forwarder: ICMP Echo Request
    Forwarder->>Forwarder: generate flowID & emit TypeStart
    Forwarder->>Echo: route to handleICMPEcho()
    alt hasRawICMPAccess
        Echo->>SocketPath: use raw-socket path
        SocketPath->>RawSock: forwardICMPPacket() send
        RawSock-->>SocketPath: receive reply
        SocketPath->>Injector: injectICMPReply()
        Injector-->>Client: Echo Reply
    else fallback
        Echo->>PingPath: use ping fallback (rate-limited via pingSemaphore)
        PingPath->>PingCmd: build & run ping
        PingCmd-->>PingPath: ping response/status
        PingPath->>Injector: synthesize & inject reply
        Injector-->>Client: Echo Reply (synthesized)
    end
    Forwarder->>Forwarder: emit TypeEnd
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing careful review:
    • client/firewall/uspfilter/forwarder/icmp.go (async flows, raw-socket handling, ping command construction, ICMP header synthesis and injection)
    • client/firewall/uspfilter/forwarder/forwarder.go (capability probe, semaphore init, concurrency/race concerns)
    • client/firewall/uspfilter/forwarder/udp.go and endpoint.go (changed return semantics and atomic MTU)
    • go.mod and CI workflow files (dependency bumps and toolchain alignment)

Suggested reviewers

  • crn4
  • mlsmaycon

Poem

🐰 I nibble bytes and count each hop,
Atomic MTU keeps packet drops from top.
Socket hums or ping I send,
Echoes bounce and then descend.
Carrots for CI — build and mend! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding non-root ICMP support to the userspace firewall forwarder.
Description check ✅ Passed The description covers key changes, includes dependency updates, notes a bug fix, and completes the provided template with checklist selections and documentation decision.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch forwarder-ping

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
client/firewall/uspfilter/forwarder/icmp.go (1)

57-65: Consider adding panic recovery to the goroutine.

If handleICMPViaSocket or handleICMPViaPing panic, the goroutine will crash silently. Consider adding a deferred panic recovery handler to log any unexpected panics.

Apply this diff to add panic recovery:

 		go func() {
 			defer func() { <-f.pingSemaphore }()
+			defer func() {
+				if r := recover(); r != nil {
+					f.logger.Error1("forwarder: Panic in ICMP echo handler: %v", r)
+				}
+			}()
 
 			if f.hasRawICMPAccess {
 				f.handleICMPViaSocket(flowID, id, icmpHdr, icmpData, rxBytes)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7d5cdc and 099870d.

📒 Files selected for processing (1)
  • client/firewall/uspfilter/forwarder/icmp.go (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4777
File: client/firewall/iptables/acl_linux.go:439-450
Timestamp: 2025-11-13T15:19:32.799Z
Learning: In the netbird client firewall iptables implementation (client/firewall/iptables/), IPv6 is not currently handled or supported.
🧬 Code graph analysis (1)
client/firewall/uspfilter/forwarder/icmp.go (1)
client/firewall/uspfilter/forwarder/forwarder.go (2)
  • Forwarder (35-49)
  • New (51-140)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (32)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Signal / Unit (amd64)
  • GitHub Check: Relay / Unit (386)
  • GitHub Check: Signal / Unit (386)
  • GitHub Check: Management / Benchmark (API) (amd64, postgres)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Management / Integration (amd64, sqlite)
  • GitHub Check: Management / Benchmark (API) (amd64, sqlite)
  • GitHub Check: Management / Integration (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: test-docker-compose (mysql)
  • GitHub Check: test-docker-compose (postgres)
  • GitHub Check: test-docker-compose (sqlite)
  • GitHub Check: test-getting-started-script
  • GitHub Check: Client / Unit
  • GitHub Check: Windows
  • GitHub Check: Linux
  • GitHub Check: Darwin
  • GitHub Check: release_ui
  • GitHub Check: release_ui_darwin
  • GitHub Check: release
  • GitHub Check: iOS / Build
  • GitHub Check: Client / Unit
  • GitHub Check: JS / Lint
  • GitHub Check: Android / Build
  • GitHub Check: Client / Unit
🔇 Additional comments (8)
client/firewall/uspfilter/forwarder/icmp.go (8)

5-9: LGTM! Necessary imports for ping fallback.

The new imports support the ping binary fallback mechanism and platform-specific command construction.


20-48: LGTM! Clean separation of echo and non-echo ICMP handling.

The refactored logic properly delegates echo requests to asynchronous handling and forwards non-echo ICMP types when raw socket access is available.


73-99: LGTM! Proper resource management.

The function correctly creates a raw ICMP socket, sends the packet, and handles errors appropriately. The caller is responsible for closing the returned connection as documented.


125-141: LGTM! Proper timeout and buffer sizing.

The function correctly sets a read deadline and sizes the response buffer using the atomic MTU value.


233-245: LGTM! Correct ICMP echo reply synthesis.

The function properly creates an echo reply by copying the data, changing the type, and recalculating the checksum following the standard ICMP checksum algorithm.


247-272: LGTM! Proper IP header construction and injection.

The function correctly constructs an IPv4 header, wraps the ICMP payload, and bypasses the netstack to avoid re-processing by the ICMP handler. The TTL value of 64 is appropriate for local responses.


195-196: The Warn4 logger method exists and is properly implemented.

Verification confirms that Warn4 has been added to the Logger type at client/firewall/uspfilter/log/log.go:171 with the correct signature supporting four arguments. The method call at line 195-196 is valid.


212-231: The original review comment's reasoning is fundamentally incorrect.

The review claims -t is TTL on macOS/BSD, but macOS uses -t <timeout> to specify timeout in seconds and FreeBSD uses -t <timeout> for timeout in seconds. The current code is correct for both platforms.

However, there IS a real issue: OpenBSD uses -w <maxwait>, not -t (which is TTL on OpenBSD), and NetBSD uses -w <deadline> for timeout. The current code incorrectly groups all three BSD variants together on line 225, which fails for OpenBSD and NetBSD.

The correct fix requires splitting BSD platforms:

  • FreeBSD: keep -t
  • OpenBSD: change to -w
  • NetBSD: change to -w

The original review's suggested fix is partially correct by accident but based on false premises.

Likely an incorrect or invalid review comment.

@sonarqubecloud
Copy link

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
client/firewall/uspfilter/forwarder/icmp.go (2)

101-123: Send TypeEnd event on error to complete flow tracking.

When forwardICMPPacket fails at line 107, the function returns early without sending a TypeEnd event. This leaves the flow incomplete with only a TypeStart event.

Apply this diff to send an end event on error:

 	conn, err := f.forwardICMPPacket(id, payload, icmpHdr, 5*time.Second)
 	if err != nil {
 		f.logger.Error2("forwarder: Failed to send ICMP packet for %v: %v", epID(id), err)
+		f.sendICMPEvent(nftypes.TypeEnd, flowID, id, uint8(icmpHdr.Type()), uint8(icmpHdr.Code()), uint64(rxBytes), 0)
 		return
 	}

184-210: Send TypeEnd event on error to complete flow tracking.

When the ping binary fails at line 194, the function returns without sending a TypeEnd event, leaving the flow incomplete.

Apply this diff to send an end event on error:

 	pingStart := time.Now()
 	if err := cmd.Run(); err != nil {
 		f.logger.Warn4("forwarder: Ping binary failed for %v type %v code %v: %v", epID(id),
 			icmpHdr.Type(), icmpHdr.Code(), err)
+		f.sendICMPEvent(nftypes.TypeEnd, flowID, id, uint8(icmpHdr.Type()), uint8(icmpHdr.Code()), uint64(rxBytes), 0)
 		return
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 099870d and ca21a98.

📒 Files selected for processing (2)
  • client/firewall/uspfilter/forwarder/icmp.go (4 hunks)
  • client/firewall/uspfilter/forwarder/udp.go (8 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4777
File: client/firewall/iptables/acl_linux.go:439-450
Timestamp: 2025-11-13T15:19:32.799Z
Learning: In the netbird client firewall iptables implementation (client/firewall/iptables/), IPv6 is not currently handled or supported.
🧬 Code graph analysis (2)
client/firewall/uspfilter/forwarder/udp.go (1)
client/firewall/uspfilter/forwarder/forwarder.go (1)
  • Forwarder (35-49)
client/firewall/uspfilter/forwarder/icmp.go (1)
client/firewall/uspfilter/forwarder/forwarder.go (2)
  • Forwarder (35-49)
  • New (51-140)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Relay / Unit (386)
  • GitHub Check: Management / Benchmark (API) (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Management / Benchmark (API) (amd64, sqlite)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Management / Integration (amd64, postgres)
  • GitHub Check: Management / Integration (amd64, sqlite)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Signal / Unit (amd64)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Signal / Unit (386)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: release_ui_darwin
  • GitHub Check: release_ui
  • GitHub Check: release
  • GitHub Check: Client / Unit
  • GitHub Check: Android / Build
  • GitHub Check: test-docker-compose (sqlite)
  • GitHub Check: test-docker-compose (mysql)
  • GitHub Check: test-getting-started-script
  • GitHub Check: test-docker-compose (postgres)
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
  • GitHub Check: Darwin
  • GitHub Check: Linux
  • GitHub Check: Windows
🔇 Additional comments (4)
client/firewall/uspfilter/forwarder/udp.go (2)

352-354: Good addition of io.EOF to closed-error detection.

Treating io.EOF as a terminal condition in isClosedError is appropriate, as EOF signals a closed connection and should not be logged as an unexpected error.


180-180: Signature change verified and correct.

The web search confirms that gonet.NewUDPConn in the upgraded gvisor/netstack has signature func NewUDPConn(wq *waiter.Queue, ep tcpip.Endpoint) *UDPConn. The code at line 180 (gonet.NewUDPConn(&wq, ep)) correctly matches this signature, with the stack parameter properly removed as part of the netstack upgrade.

client/firewall/uspfilter/forwarder/icmp.go (2)

247-272: Good implementation of ICMP reply injection.

The injectICMPReply method correctly:

  • Constructs a proper IPv4 header with appropriate fields
  • Calculates and sets the IP checksum
  • Bypasses netstack to avoid looping through the ICMP handler
  • Returns the injected size for flow tracking

212-231: I need to verify the flags across all platforms mentioned. The web search already revealed important information about macOS that contradicts the review comment. Let me search for the other platforms:

Based on my verification, the review comment contains fundamental inaccuracies regarding the macOS ping flags.

The review comment is incorrect and should be disregarded.

Evidence from official documentation:

  • Linux/Android: -W specifies time to wait for a response in seconds, with -w setting the timeout for the entire program session. The code correctly uses -W for timeout.

  • FreeBSD/OpenBSD/NetBSD: -t specifies a timeout in seconds before ping exits. The code correctly uses -t for timeout on these platforms.

  • Windows: -w sets timeout in milliseconds. The code correctly multiplies timeoutSec by 1000 to convert to milliseconds.

  • macOS/iOS: The review's claim is incorrect. From the earlier web search, -t timeout on macOS is "overall timeout in seconds before ping exits regardless of packets received." The code correctly uses -t with timeout in seconds. The review mistakenly claims -t is TTL and suggests using -W, but on macOS, -W uses milliseconds, not seconds, which would be inconsistent with the code's handling of seconds.

The platform-specific ping flags in the code are correct and consistent with official documentation across all platforms.

Likely an incorrect or invalid review comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants