Module graceful shutdown support #4031
@rameshraghupathy Could you please update the PR description with more detail on the code changes being made?
@rameshraghupathy The table being updated from chassisd is CHASSIS_MODULE_TABLE (not CHASSIS_MODULE_INFO_TABLE). Is that implementation also being changed?
Provide support for SmartSwitch DPU module graceful shutdown.
Description:
Single source of truth for transitions
- All components now use the sonic_platform_base.module_base.ModuleBase helpers:
  - set_module_state_transition(db, name, transition_type)
  - clear_module_state_transition(db, name)
  - get_module_state_transition(db, name) -> dict
  - is_module_state_transition_timed_out(db, name, timeout_secs) -> bool
- Eliminates duplicated logic and race-prone direct Redis writes.
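As a rough illustration of the helper surface listed above, the following sketch implements the four functions against a plain in-memory dict standing in for the state DB. It is not the ModuleBase code: the FakeDb class, the key layout, the string-valued flags, and the handling of a missing start stamp are all assumptions. Only the function names, their arguments, the table name, and the field names come from this description.

```python
from datetime import datetime, timedelta, timezone

TABLE = "CHASSIS_MODULE_TABLE"


class FakeDb:
    """Hypothetical stand-in for the Redis state DB: maps key -> field dict."""

    def __init__(self):
        self.store = {}

    def hset(self, key, mapping):
        self.store.setdefault(key, {}).update(mapping)

    def hgetall(self, key):
        return dict(self.store.get(key, {}))


def _key(name):
    # The "|" separator is an assumption made for this sketch.
    return f"{TABLE}|{name}"


def set_module_state_transition(db, name, transition_type):
    # One hset call writes all three fields together.
    db.hset(_key(name), mapping={
        "state_transition_in_progress": "True",
        "transition_type": transition_type,
        "transition_start_time": datetime.now(timezone.utc).isoformat(),
    })


def clear_module_state_transition(db, name):
    db.hset(_key(name), mapping={"state_transition_in_progress": "False"})


def get_module_state_transition(db, name):
    return db.hgetall(_key(name))


def is_module_state_transition_timed_out(db, name, timeout_secs):
    entry = get_module_state_transition(db, name)
    if entry.get("state_transition_in_progress") != "True":
        return False  # nothing in flight, so nothing can be timed out
    start = entry.get("transition_start_time")
    if not start:
        return True  # assumption: in progress without a start stamp is stale
    started = datetime.fromisoformat(start)
    return datetime.now(timezone.utc) - started > timedelta(seconds=timeout_secs)
```

Callers such as the CLI or chassisd would then go through these functions rather than writing the fields themselves, which is the point of the "single source of truth" item.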
Correct table everywhere
- CHASSIS_MODULE_TABLE (replaces CHASSIS_MODULE_INFO_TABLE).
Ownership & lifecycle
- The initiator of an operation (startup/shutdown/reboot) sets:
  - state_transition_in_progress=True
  - transition_type=<op>
  - transition_start_time=<utc-iso8601>
- The platform (set_admin_state()) is responsible for clearing:
  - state_transition_in_progress=False
  - transition_end_time=<epoch> (or similar end stamp).
- CLI pre-clears only when a prior transition is timed out.
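The ownership split above can be pictured as two writers touching the same entry at different times. In this sketch the entry is an ordinary dict, and initiator_mark and platform_set_admin_state are hypothetical names. A real platform set_admin_state() would also perform the power operation and any graceful wait, which is omitted here.

```python
import time
from datetime import datetime, timezone


def initiator_mark(entry, op):
    """Initiator side (for example the CLI or chassisd): mark the transition as started."""
    if op not in ("startup", "shutdown", "reboot"):
        raise ValueError(f"unknown operation: {op}")
    entry.update({
        "state_transition_in_progress": "True",
        "transition_type": op,
        "transition_start_time": datetime.now(timezone.utc).isoformat(),
    })


def platform_set_admin_state(entry, up):
    """Platform side: do the work, then clear the flag and stamp the end time."""
    # ... platform-specific power sequencing would happen here (omitted) ...
    entry.update({
        "state_transition_in_progress": "False",
        "transition_end_time": str(int(time.time())),  # epoch seconds
    })
    return True


entry = {}
initiator_mark(entry, "shutdown")
in_flight = dict(entry)  # snapshot taken while the operation is running
platform_set_admin_state(entry, up=False)
```

After the last call, in_flight still shows the flag set by the initiator, while entry shows it cleared by the platform with an end stamp added.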
Timeouts & policy
- Platform JSON path only: /usr/share/sonic/device/{plat}/platform.json; else constants.
- Typical production values used: startup: 180s, shutdown: 180s (≈ graceful_wait 60s + power 120s), reboot: 120s.
- Graceful wait (e.g., waiting for “Graceful shutdown complete”) is a platform policy and implemented inside platform set_admin_state(), not in ModuleBase.
Boot behavior
- chassisd on start: set_initial_dpu_admin_state(), which marks transitions via ModuleBase before calling platform set_admin_state().
gNOI shutdown daemon
- Listens on CHASSIS_MODULE_TABLE and triggers only when state_transition_in_progress=True and transition_type=shutdown.
- Never clears the flag (ownership stays with the platform).
- Bounded RPC timeouts and robust Redis access (swsssdk/swsscommon).
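A minimal sketch of two of the policies above: looking up a timeout with a fallback to constants, and the condition under which the gNOI shutdown daemon reacts. The platform.json key name module_transition_timeouts is hypothetical, since this description does not name the keys. The defaults are the typical production values quoted above.

```python
import json

# Typical production values from the description, in seconds.
DEFAULT_TIMEOUTS = {"startup": 180, "shutdown": 180, "reboot": 120}


def load_transition_timeouts(platform_json_path):
    """Return per-operation timeouts in seconds, falling back to constants."""
    timeouts = dict(DEFAULT_TIMEOUTS)
    try:
        with open(platform_json_path) as f:
            data = json.load(f)
    except (OSError, ValueError):
        return timeouts  # file missing or not valid JSON: constants only
    # "module_transition_timeouts" is a hypothetical key name.
    overrides = data.get("module_transition_timeouts") if isinstance(data, dict) else None
    if not isinstance(overrides, dict):
        return timeouts
    for op, value in overrides.items():
        if op in timeouts and isinstance(value, (int, float)) and value > 0:
            timeouts[op] = int(value)
    return timeouts


def should_trigger_gnoi_shutdown(entry):
    """True only for an in-progress shutdown transition; never modifies entry."""
    return (
        str(entry.get("state_transition_in_progress", "")).lower() == "true"
        and entry.get("transition_type") == "shutdown"
    )
```

The predicate only reads the entry, in line with the rule that the daemon never clears the flag.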
CLI (config chassis modules …)
- is_module_state_transition_timed_out() → auto-clear then proceed.
- startup/shutdown; platform clears on completion.
Redis robustness
- hset(mapping=...) usage.
Race reduction & consistency
- transition_start_time; clears may add an end stamp.
Change scope
- HLD: #1991 sonic-net/SONiC#1991
- sonic-platform-common: #567 sonic-net/sonic-platform-common#567
- sonic-host-services: sonic-net/sonic-host-services#255
- sonic-platform-daemons: sonic-net/sonic-platform-daemons#667
How to verify it
Issue the "config chassis modules shutdown DPUx" command
Verify that the DPU module shut down gracefully by checking the logs in /var/log/syslog on both the NPU and the DPU.
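For orientation while reading the logs, the decision the CLI makes before issuing the shutdown might look like the sketch below. plan_cli_action is a hypothetical name, and the entry is a plain dict of the CHASSIS_MODULE_TABLE fields. Rejecting a request while a transition is still running and has not timed out is an assumption; this description only states that a timed-out transition is auto-cleared before proceeding.

```python
from datetime import datetime, timezone


def plan_cli_action(entry, timeout_secs, now=None):
    """Return 'proceed', 'clear_then_proceed', or 'reject' for a CLI request.

    Expects transition_start_time as a timezone-aware ISO 8601 string.
    """
    now = now or datetime.now(timezone.utc)
    if entry.get("state_transition_in_progress") != "True":
        return "proceed"
    start = entry.get("transition_start_time")
    if not start:
        return "clear_then_proceed"  # assumption: no start stamp means stale
    elapsed = (now - datetime.fromisoformat(start)).total_seconds()
    if elapsed > timeout_secs:
        return "clear_then_proceed"  # prior transition timed out: auto-clear
    return "reject"  # assumption: a live transition blocks new requests
```

With a 180s shutdown timeout, an entry whose start stamp is ten minutes old would be cleared and the new shutdown would proceed.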