202511: PMON restart fails causing Error "All critical services should be fully started!" while SONIC mgmt run #2254

@AnoopKamath

Description

Problem: The test test_rendered_golden_config_override (and other similar config reboot tests) fails on trixie builds (202511) because the pmon service hits start-limit-hit and does not come back up after config reload -y -f /etc/sonic/golden_config_db.json. The test applies a golden config override, runs a config reload, then waits up to 420 seconds for all critical services. pmon is the only one that reports False; all other services (snmp, lldp, bgp, swss, syncd, database, dhcp_relay, teamd) come up successfully.

Evidence: Live DUT testing on Cisco-8102-C64. The trixie build (systemd v257) fails pmon on the 4th stop/start cycle with start-limit-hit, while the bookworm build (systemd v252, built from the same commit bf0cbb1 as the trixie build) passes 5+ cycles. Both builds have identical service configs (StartLimitBurst=3, StartLimitIntervalSec=1200). The exit code from docker-rs wait is 0 for all services. Only pmon is affected: snmp and lldp survive 10+ cycles with the same StartLimitBurst=3.
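For anyone who wants to reproduce the cycle count on their own DUT, a minimal sketch follows. It assumes that a plain systemctl stop/start approximates the stop/start cycle described above; the function name cycle_unit and the SYSTEMCTL override variable are hypothetical and not part of SONiC.

```bash
#!/usr/bin/env bash
# Cycle a unit through stop/start and report the first cycle where start fails.
# SYSTEMCTL is a hypothetical override (useful for dry runs); default is "sudo systemctl".
cycle_unit() {
    local unit=$1 cycles=$2 i
    local ctl=${SYSTEMCTL:-sudo systemctl}
    for (( i = 1; i <= cycles; i++ )); do
        $ctl stop "$unit"
        if ! $ctl start "$unit"; then
            # Result reads "start-limit-hit" when the rate limit refused the start.
            echo "cycle $i: start failed, Result=$($ctl show "$unit" -p Result --value)"
            return 1
        fi
        echo "cycle $i: start ok"
    done
}

# Example, run on the DUT:
#   cycle_unit pmon 5
```

Running the same function against snmp or lldp gives a like-for-like comparison with pmon.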

Root Cause: On systemd v257 (Trixie), the start rate-limit counter for pmon accumulates across config reload cycles and is not reset. With StartLimitBurst=3 and StartLimitIntervalSec=1200, the 4th start attempt within the 20-minute window is refused. On systemd v252 (Bookworm), the counter resets during config reload, so the issue never occurs. Why only pmon triggers this (and not snmp or lldp, which have the same burst limit) is still under investigation.
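The arithmetic behind the 4th-start refusal can be illustrated with a simplified fixed-window model. This is a sketch for intuition only, not systemd's actual implementation; the function name simulate_start_limit and the two-minute spacing between starts are assumptions chosen for the example.

```bash
#!/usr/bin/env bash
# Simplified fixed-window start rate limiter (illustrative model, not systemd source).
# Usage: simulate_start_limit BURST INTERVAL_SEC T1 T2 ...   (Tn = start time in seconds)
simulate_start_limit() {
    local burst=$1 interval=$2
    shift 2
    local begin=-1 count=0 t
    for t in "$@"; do
        # Open a new window on the first attempt or once the interval has elapsed.
        if (( begin < 0 || t - begin >= interval )); then
            begin=$t
            count=0
        fi
        if (( count < burst )); then
            count=$(( count + 1 ))
            echo "t=${t}s start allowed (${count}/${burst})"
        else
            echo "t=${t}s start refused (start-limit-hit)"
        fi
    done
}

# Four starts two minutes apart, using StartLimitBurst=3 and StartLimitIntervalSec=1200.
simulate_start_limit 3 1200 0 120 240 360
# t=0s start allowed (1/3)
# t=120s start allowed (2/3)
# t=240s start allowed (3/3)
# t=360s start refused (start-limit-hit)
```

In this model, a counter that is never reset between reloads refuses the 4th start, which matches the behavior seen on the trixie build.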

Quick Fix

sudo mkdir -p /etc/systemd/system/pmon.service.d
echo -e "[Unit]\nStartLimitBurst=10" | sudo tee /etc/systemd/system/pmon.service.d/start_limit.conf
sudo systemctl daemon-reload && sudo systemctl reset-failed pmon
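
To confirm that the drop-in took effect, the effective value can be read back with systemctl show. The helper below is a sketch; verify_start_limit and the SYSTEMCTL override variable are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# Compare a unit's effective StartLimitBurst with the expected value.
# SYSTEMCTL is a hypothetical override (useful for dry runs); default is "systemctl".
verify_start_limit() {
    local unit=$1 want=$2 got
    local ctl=${SYSTEMCTL:-systemctl}
    got=$($ctl show "$unit" -p StartLimitBurst --value)
    if [ "$got" = "$want" ]; then
        echo "$unit: StartLimitBurst=$got (override active)"
    else
        echo "$unit: StartLimitBurst=$got, expected $want"
        return 1
    fi
}

# Example, run on the DUT after the quick fix:
#   verify_start_limit pmon 10
```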

Seeking community input on whether this issue is already known, or whether a fix is available, for systemd versions after v255.
