[orchagent] Honor createSwitchTimeout from sai.profile for all platforms #27761

Open
selvipal wants to merge 1 commit into sonic-net:master from selvipal:orchagent-createswitchtimeout-generic

selvipal commented Jun 8, 2026

Why I did it

On platforms with ASICs that have a long SAI create_switch time, create_switch can take longer than orchagent's default 60s create-switch timeout. When that happens, create_switch returns SAI_STATUS_FAILURE, orchagent aborts, and swss crash-loops on boot.

SONiC already supports a per-hwsku createSwitchTimeout knob in sai.profile, but orchagent.sh only consulted it inside a single vendor-specific branch. This change makes the knob platform-agnostic so any vendor whose hardware needs a longer initialization window can opt in, without adding a per-platform check to common code.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Reworked dockers/docker-orchagent/orchagent.sh to read createSwitchTimeout from the hwsku sai.profile for all platforms and pass it to orchagent as -t <seconds>. Removed the vendor-specific block and added a single platform-agnostic block after the per-platform MAC handling.

A vendor opts in purely by defining createSwitchTimeout in its sai.profile; if the key is absent, behavior is unchanged (orchagent default timeout). Any platform that previously defined the key in its sai.profile continues to work exactly as before.
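
The opt-in logic described above can be sketched as follows. This is a minimal illustration, not the actual diff to orchagent.sh: the function names `read_create_switch_timeout` and `append_timeout_arg` are hypothetical, and the parsing rules (last matching line wins, non-numeric values ignored) are assumptions rather than behavior confirmed by this PR.

```shell
# Hypothetical helper: print the createSwitchTimeout value found in a
# sai.profile file, or print nothing if the key is absent or not a
# non-negative integer (assumed validation, not taken from the PR).
read_create_switch_timeout() {
    local profile
    local value
    profile="$1"
    [ -r "$profile" ] || return 0
    # Keep only the value part of "createSwitchTimeout=<value>" lines;
    # if the key appears more than once, the last occurrence is used.
    value=$(sed -n 's/^[[:space:]]*createSwitchTimeout[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$profile" | tail -n 1)
    case "$value" in
        ''|*[!0-9]*) return 0 ;;   # missing or non-numeric: stay on the default
    esac
    printf '%s\n' "$value"
}

# Hypothetical helper: append "-t <seconds>" to an argument string only
# when a timeout was found, so the default path is left untouched.
append_timeout_arg() {
    local args
    local timeout
    args="$1"
    timeout="$2"
    if [ -n "$timeout" ]; then
        args="$args -t $timeout"
    fi
    printf '%s\n' "$args"
}
```

A caller would pass the hwsku sai.profile path to the first helper and feed its output into the second; when the key is absent, the argument string comes back unchanged, which matches the opt-in behavior described above.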

How to verify it

Add createSwitchTimeout=<seconds> to the hwsku sai.profile and restart swss (config reload / systemctl restart swss). Verify:

  • ps -o args= -C orchagent shows ... -t <seconds> ...
  • syslog: orchagent: setRedisExtensionAttribute: set response timeout to <ms> ms

If the key is not present, orchagent starts without -t (default), confirming the change is opt-in and non-intrusive for other platforms.
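
The process-argument check above could be scripted roughly like this. It is a sketch only: `timeout_from_args` is a hypothetical helper, and the sample command line in the usage note is illustrative rather than copied from a real device.

```shell
# Hypothetical helper: print the word that follows "-t" in a command-line
# string and return 0; return 1 if "-t" is not present.
# The body runs in a subshell so that "set -f" (no globbing during word
# splitting) does not leak into the caller.
timeout_from_args() (
    set -f
    prev=""
    for word in $1; do
        if [ "$prev" = "-t" ]; then
            printf '%s\n' "$word"
            exit 0
        fi
        prev="$word"
    done
    exit 1
)
```

On a running system the input would come from `ps -o args= -C orchagent`, for example `timeout_from_args "$(ps -o args= -C orchagent)"`. A non-zero return then corresponds to the opt-out case where orchagent was started without -t.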

Verified on hardware whose SAI create_switch exceeds the default 60s timeout:

| Scenario | createSwitchTimeout | create duration | result |
| --- | --- | --- | --- |
| long create, timeout set | 120 | ~71s | switch created, orchagent OK |
| long create, no timeout | unset (60s default) | >60s | SAI_STATUS_FAILURE -> orchagent abort/crash |
| short create, no timeout | unset (60s default) | ~38s | switch created, orchagent OK |
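
The three scenarios reduce to comparing the create duration against the effective timeout. The toy model below only restates that arithmetic; `create_switch_outcome` is a hypothetical name, 61 stands in for the ">60s" case as an assumed value, and the behavior at exactly the timeout boundary is an assumption.

```shell
# Default create-switch timeout in seconds, as stated in this PR.
DEFAULT_CREATE_SWITCH_TIMEOUT=60

# Toy model: report "OK" when create_switch finishes within the effective
# timeout, otherwise "SAI_STATUS_FAILURE". The second argument is optional
# and falls back to the default when unset or empty.
create_switch_outcome() {
    local duration
    local timeout
    duration="$1"
    timeout="${2:-$DEFAULT_CREATE_SWITCH_TIMEOUT}"
    if [ "$duration" -le "$timeout" ]; then
        echo "OK"
    else
        echo "SAI_STATUS_FAILURE"
    fi
}
```

With the durations above, `create_switch_outcome 71 120` and `create_switch_outcome 38` report OK, while `create_switch_outcome 61` reports SAI_STATUS_FAILURE.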

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Description for the changelog

[orchagent] Honor createSwitchTimeout from sai.profile for all platforms (not just a single vendor branch), so platforms with long SAI create_switch times can extend orchagent's create-switch timeout via sai.profile.

Link to config_db schema for YANG module changes

N/A

A picture of a cute animal (not mandatory but encouraged)

selvipal requested a review from lguohan as a code owner June 8, 2026 22:30
@mssonicbld

Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

orchagent.sh only consulted createSwitchTimeout (passed to orchagent as
-t) inside a single vendor-specific branch. Make it platform-agnostic:
read createSwitchTimeout from the hwsku sai.profile for every platform
and pass it to orchagent when present. A vendor opts in simply by
defining createSwitchTimeout in its sai.profile; when the key is absent,
behavior is unchanged (orchagent default timeout). Any platform that
previously defined the key in its sai.profile continues to work exactly
as before.

This is needed on platforms whose SAI create_switch exceeds orchagent's
default 60s timeout (for example, ASICs that perform HBM DRAM training
during initialization), where the timeout otherwise causes
SAI_STATUS_FAILURE and an orchagent abort (swss crash) on boot.

Signed-off-by: selvipal <selvipal@cisco.com>
selvipal force-pushed the orchagent-createswitchtimeout-generic branch from e44f52e to ce74474 June 8, 2026 23:16
@mssonicbld

Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
