
[action] [PR:104] [dhcp4relay] Fix bug: DHCP relay broken after "config vrf bind <Vlan> <Vrf>" (socket not rebound to new VRF)#107

Merged
mssonicbld merged 1 commit into sonic-net:202511 from mssonicbld:cherry/202511/104
Apr 28, 2026



@mssonicbld
Collaborator

Why I did it

On a SONiC device with `dhcp_relay` running on a VLAN, moving that VLAN to a non-default VRF at runtime via `config vrf bind <Vlan> <Vrf>` silently breaks DHCP: the relay's upstream socket stays bound to the original VRF, so OFFER/ACK never reach clients. The failure is silent: nothing is logged, no counter changes, and the service does not restart. Restarting `dhcp_relay` (or running `config reload`) papers over it because the startup path uses the correct CONFIG_DB lookup. The bug has been latent since PR #67 / #84.

How I did it

`process_vlan_interface_notification()` in `dhcp4relay_mgr.cpp` reads the field `"vrf"` from the `VLAN_INTERFACE` update, but the schema field is `"vrf_name"` (the same module's startup path uses the correct name). The mismatch leaves `msg->vrf` empty, so the consumer in `dhcp4relay.cpp` short-circuits at `if (msg->vrf.empty()) return;` and `handle_server_sock()`, which would call `setsockopt(SO_BINDTODEVICE, vrf)`, never runs.
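The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual `dhcp4relay` code: `FieldValues`, `get_field()`, and `would_rebind()` are hypothetical stand-ins for the field/value list carried by a `VLAN_INTERFACE` update and for the consumer's early return.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the field/value pairs of a VLAN_INTERFACE update.
using FieldValues = std::vector<std::pair<std::string, std::string>>;

// Returns the value stored under `field`, or an empty string if it is absent.
std::string get_field(const FieldValues& fvs, const std::string& field) {
    assert(!field.empty());
    for (const auto& fv : fvs) {
        if (fv.first == field) {
            return fv.second;
        }
    }
    return std::string();
}

// Hypothetical consumer: returns true only if a rebind would be attempted.
bool would_rebind(const std::string& vrf) {
    if (vrf.empty()) {
        return false;  // same shape as the early return described above
    }
    return true;  // the real code would go on to rebind the socket here
}
```

With an update that carries only `vrf_name`, `get_field(update, "vrf")` yields an empty string and `would_rebind()` returns false without any error, which is why the mismatch never surfaced in logs.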

Fix: read `VRF_NAME_FIELD` instead of the mistyped literal, and pull the five reused CONFIG_DB field-name literals (`vrf_name`, `server_vrf`, `source_interface`, `link_selection`, `state`) into named macros in `dhcp4relay.h` so that this class of typo cannot recur silently. The IP-suffix branch is left untouched; the bare-key event drives the rebind.
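As an illustration of the fix, the field names can live in one place and the rebind can be a small helper. This is a sketch under assumptions: only `VRF_NAME_FIELD` is named in this description, so the other four macro names and the `bind_socket_to_vrf()` helper are hypothetical, and the real `handle_server_sock()` may differ. `SO_BINDTODEVICE` is Linux-specific and normally needs elevated privileges.

```cpp
#include <string>
#include <sys/socket.h>

// VRF_NAME_FIELD appears in the description; the other macro names are assumed.
#define VRF_NAME_FIELD         "vrf_name"
#define SERVER_VRF_FIELD       "server_vrf"
#define SOURCE_INTERFACE_FIELD "source_interface"
#define LINK_SELECTION_FIELD   "link_selection"
#define STATE_FIELD            "state"

// Hypothetical helper: binds `fd` to the VRF device named `vrf`.
// Returns true on success, false on an empty name or a setsockopt failure.
bool bind_socket_to_vrf(int fd, const std::string& vrf) {
    if (vrf.empty()) {
        return false;
    }
#ifdef SO_BINDTODEVICE
    return setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, vrf.c_str(),
                      static_cast<socklen_t>(vrf.size())) == 0;
#else
    (void)fd;  // platform without SO_BINDTODEVICE: nothing to bind
    return false;
#endif
}
```

A misspelled macro name such as `VRF_FIELD` fails at compile time, whereas a misspelled string literal compiles and simply looks up a field that does not exist.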

How to verify it

sonic-mgmt `tests/dhcp_relay/test_dhcpv4_relay.py::test_dhcp_relay_with_non_default_vrf` (4 cases). Without the fix, all 4 fail with PTF `expected 48, got 0`. With the fix applied to the on-DUT dhcp_relay deb, all 4 pass; the test-side `restart_dhcp_service` workaround can be removed.

Signed-off-by: Sonic Build Admin [email protected]

@mssonicbld
Collaborator Author

Original PR: #104

@mssonicbld
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld mssonicbld merged commit 9a0780c into sonic-net:202511 Apr 28, 2026
9 checks passed