
[dhcp4relay] Fix bug: DHCP relay broken after "config vrf bind <Vlan> <Vrf>" (socket not rebound to new VRF) #104

Open
Xichen96 wants to merge 1 commit into sonic-net:master from Xichen96:dev/xichenlin/fix-dhcp4relay-vlan-interface-vrf-update

@Xichen96 (Contributor) commented Apr 25, 2026

Why I did it

On a SONiC device with dhcp_relay running on a VLAN, moving that VLAN to a non-default VRF at runtime with config vrf bind <Vlan> <Vrf> silently breaks DHCP: the relay's upstream socket stays bound to the original VRF, so OFFER/ACK messages never reach clients. The failure is silent: no log, no counter, no restart. Restarting dhcp_relay (or running config reload) papers over it because the startup path reads the correct CONFIG_DB field. The bug has been latent since PR #67 / #84.

How I did it

process_vlan_interface_notification() in dhcp4relay_mgr.cpp reads the field "vrf" from the VLAN_INTERFACE update, but the schema field is "vrf_name" (the startup path in the same module uses the correct name). Because of this mismatch msg->vrf is left empty, the consumer in dhcp4relay.cpp short-circuits at if (msg->vrf.empty()) return;, and handle_server_sock(), which would call setsockopt(SO_BINDTODEVICE, vrf), never runs.

Fix: read VRF_NAME_FIELD instead of the mistyped "vrf" literal, and move the five reused CONFIG_DB field-name literals (vrf_name, server_vrf, source_interface, link_selection, state) into named macros in dhcp4relay.h so this class of typo cannot recur silently. The IP-suffix branch is left untouched; the bare-key event drives the rebind.

How to verify it

Run sonic-mgmt tests/dhcp_relay/test_dhcpv4_relay.py::test_dhcp_relay_with_non_default_vrf (4 cases). Without the fix, all 4 fail with PTF "expected 48, got 0". With the fix applied to the dhcp_relay deb on the DUT, all 4 pass, and the test-side restart_dhcp_service workaround can be removed.

@linux-foundation-easycla (Bot) commented Apr 25, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: Copilot / name: Copilot (20f28b1)
  • ✅ login: Xichen96 / name: Xichen96 (20f28b1)

@mssonicbld (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Xichen96 force-pushed the dev/xichenlin/fix-dhcp4relay-vlan-interface-vrf-update branch from bac7808 to 88f23cc on April 25, 2026 09:32

Xichen96 force-pushed the dev/xichenlin/fix-dhcp4relay-vlan-interface-vrf-update branch from 88f23cc to d11235d on April 25, 2026 09:32

Xichen96 force-pushed the dev/xichenlin/fix-dhcp4relay-vlan-interface-vrf-update branch from d11235d to fab233f on April 25, 2026 09:33

Xichen96 changed the title from "[dhcp4relay] Fix VRF update notification; centralize CONFIG_DB field …" to "[dhcp4relay] Fix bug: DHCP relay broken after "config vrf bind <Vlan> <Vrf>" (socket not rebound to new VRF)" on Apr 25, 2026
Xichen96 force-pushed the dev/xichenlin/fix-dhcp4relay-vlan-interface-vrf-update branch from fab233f to 7c9a551 on April 25, 2026 09:36

… <Vrf>" (socket not rebound to new VRF)

process_vlan_interface_notification() reads field "vrf" instead of
the schema's "vrf_name", so the runtime VRF-update message arrives
with an empty vrf, the consumer short-circuits, and the upstream
socket stays bound to the original VRF. Fix the typo and centralize
reused CONFIG_DB field names in dhcp4relay.h.

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Xichen96 <[email protected]>
Xichen96 force-pushed the dev/xichenlin/fix-dhcp4relay-vlan-interface-vrf-update branch from 7c9a551 to 20f28b1 on April 25, 2026 09:38
