Hi folks,
I recently came across an issue where, when pmcd.service fails with exit status 2, it stalls several systemd units and leaves multi-user.target and other potentially important dependencies stuck.
Simple Reproducer:
Just empty pmcd.conf to simulate a problem with pmcd.service, then reboot the system.
The system boots and all services appear to come up, so everything seems fine, but the following problem will go unnoticed in most cases:
[root@rhel94 ~]# systemctl list-jobs
JOB UNIT TYPE STATE
7756 pmlogger_farm.service start waiting
7669 pmlogger.service start waiting
135 multi-user.target start waiting
272 systemd-update-utmp-runlevel.service start waiting
7504 pmcd.service start running
5 jobs listed.
[root@rhel94 ~]# runlevel
unknown
[root@rhel94 ~]#
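Since this state is easy to miss, a small check could flag it. The following is only a sketch: the function name `waiting_jobs` is made up for illustration, and it assumes the four-column `systemctl list-jobs` layout shown above (JOB, UNIT, TYPE, STATE).

```shell
# Hypothetical helper: read `systemctl list-jobs` output on stdin and
# print every unit whose start job is still queued in the "waiting" state.
# The header and the "N jobs listed." footer do not match and are skipped.
waiting_jobs() {
    awk '$3 == "start" && $4 == "waiting" { print $2 }'
}
```

It could be used as `systemctl list-jobs --no-pager | waiting_jobs`; any output some time after boot suggests that a target such as multi-user.target is still blocked.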
The journal shows the following errors:
-- Boot f535e9965e31455fac39b9fb35c7806b --
Jan 31 10:55:21 rhel94.static systemd[1]: Starting Performance Metrics Collector Daemon...
Jan 31 10:56:27 rhel94.static systemd[1]: pmcd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 31 10:56:27 rhel94.static systemd[1]: pmcd.service: Failed with result 'exit-code'.
Jan 31 10:56:27 rhel94.static systemd[1]: Failed to start Performance Metrics Collector Daemon.
Jan 31 10:56:27 rhel94.static systemd[1]: pmcd.service: Consumed 1.094s CPU time.
Jan 31 10:56:27 rhel94.static systemd[1]: pmcd.service: Scheduled restart job, restart counter is at 1.
Jan 31 10:56:27 rhel94.static systemd[1]: Stopped Performance Metrics Collector Daemon.
Jan 31 10:56:27 rhel94.static systemd[1]: pmcd.service: Consumed 1.094s CPU time.
Jan 31 10:56:27 rhel94.static systemd[1]: Starting Performance Metrics Collector Daemon...
Jan 31 10:57:29 rhel94.static systemd[1]: pmcd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 31 10:57:29 rhel94.static systemd[1]: pmcd.service: Failed with result 'exit-code'.
Jan 31 10:57:29 rhel94.static systemd[1]: Failed to start Performance Metrics Collector Daemon.
Jan 31 10:57:29 rhel94.static systemd[1]: pmcd.service: Consumed 1.031s CPU time.
Jan 31 10:57:29 rhel94.static systemd[1]: pmcd.service: Scheduled restart job, restart counter is at 2.
Jan 31 10:57:29 rhel94.static systemd[1]: Stopped Performance Metrics Collector Daemon.
Jan 31 10:57:29 rhel94.static systemd[1]: pmcd.service: Consumed 1.031s CPU time.
Jan 31 10:57:29 rhel94.static systemd[1]: Starting Performance Metrics Collector Daemon...
Jan 31 10:58:30 rhel94.static root[5223]: pmcd_wait failed in /usr/libexec/pcp/lib/pmcd: exit status: 2
Jan 31 10:58:30 rhel94.static systemd[1]: pmcd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 31 10:58:30 rhel94.static systemd[1]: pmcd.service: Failed with result 'exit-code'.
Jan 31 10:58:30 rhel94.static systemd[1]: Failed to start Performance Metrics Collector Daemon.
Jan 31 10:58:30 rhel94.static systemd[1]: pmcd.service: Scheduled restart job, restart counter is at 3.
Jan 31 10:58:30 rhel94.static systemd[1]: Stopped Performance Metrics Collector Daemon.
Jan 31 10:58:30 rhel94.static systemd[1]: Starting Performance Metrics Collector Daemon...
This restart loop goes on and on (I guess forever), holding up all dependent targets such as multi-user.target.
On a vanilla Red Hat installation this does not look very impactful, but where custom, important services start after multi-user.target, it can have a big impact.
Obviously the reproducer above is just one way to make pmcd fail; it might fail for other reasons as well.
We could also consider other options such as on-abnormal, which only restarts on an unclean signal, a timeout, or a watchdog event, making this issue occur less frequently (but not solving it).
When tested with Restart=no, multi-user.target stalls for a few moments and then, once the service fails to activate, systemd moves ahead with activating the dependencies.
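For anyone who wants to try these settings locally, a drop-in override avoids editing the packaged unit file. This is a rough sketch rather than a proposed fix: the helper name `write_pmcd_restart_dropin` and the file name `restart.conf` are my own choices, and the second argument exists only so the function can be pointed at a scratch directory.

```shell
# Hypothetical helper: write a drop-in that overrides Restart= for pmcd.service.
#   $1 = restart policy to test (e.g. no, on-abnormal)
#   $2 = systemd configuration root (defaults to /etc/systemd/system)
write_pmcd_restart_dropin() {
    policy=$1
    root=${2:-/etc/systemd/system}
    dir="$root/pmcd.service.d"
    mkdir -p "$dir" || return 1
    # A drop-in only needs the section and the setting being overridden.
    printf '[Service]\nRestart=%s\n' "$policy" > "$dir/restart.conf"
}
```

As root, something like `write_pmcd_restart_dropin on-abnormal && systemctl daemon-reload` followed by a reboot should reproduce the comparison described above.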
Edit: This seems to be the bugzilla where we added restart option: https://bugzilla.redhat.com/show_bug.cgi?id=1365658