Use dbus to detect completion of systemd resource start/stop actions #3818

nrwahl2 · 2025-02-05T12:10:18Z

No description provided.

nrwahl2 · 2025-02-05T12:13:02Z

@clumens This is an alternate version of #3805 to show an idea.

I HAVE NOT TESTED THIS AT ALL YET. All I know is that it compiles. I just got it put together and am going to bed. If it works (including with minor tweaks), great.

This PR also implements almost all of my review comments. One note: I dropped the "remove the timer for start/stop" part, but I didn't change the timeout handling at all otherwise. We had some discussion in #3805 about deficiencies in our current timeout handling.

lib/services/systemd.c

nrwahl2 · 2025-02-06T05:33:19Z

@clumens Testing has been very limited so far. I'm marking this ready for review, but bear that in mind.

lib/services/systemd.c

clumens

This looks fine to me, aside from still not having any confidence in how timeouts interact with waiting for systemd to tell us something has happened.

lib/services/systemd.c

clumens · 2025-02-10T18:37:01Z

One paperwork thing... I still come up as the author on the last two commits. That may be true on the subscribe commit, but I think the last commit is now both of us. You can add a Co-authored-by: line to make this a little more obvious (see https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors - I've been doing this on booth with commits that poki started and I modified).

clumens · 2025-02-10T19:17:18Z

lib/services/systemd.c

+        return DBUS_HANDLER_RESULT_NOT_YET_HANDLED;
+    }
+
+    if (!pcmk__str_eq(bus_path, action->opaque->job_path, pcmk__str_none)) {


I don't see how yet, but it's possible to get here with action being invalid - use after free perhaps?

Feb 10 14:08:26.450 rhel9-scratch-1 pacemakerd [150407] (pcmk_child_exit@pcmkd_subdaemons.c:258) error: pacemaker-execd[150410] terminated with signal 11 (Segmentation fault) Feb 10 14:08:26.450 rhel9-scratch-1 pacemakerd [150407] (pcmk__ipc_is_authentic_process_active@ipc_client.c:1576) info: Could not connect to lrmd IPC: Connection refused Feb 10 14:08:26.450 rhel9-scratch-1 pacemakerd [150407] (pcmk_process_exit@pcmkd_subdaemons.c:339) notice: Respawning subdaemon pacemaker-execd after unexpected exit

$ coredumpctl debug ... Stack trace of thread 150410: #0 0x00007f9121e30faf pcmk__str_eq (libcrmservice.so.53 + 0xafaf) #1 0x00007f9121bb32f5 dbus_connection_dispatch (libdbus-1.so.3 + 0x1b2f5) #2 0x00007f9121e2d8a8 dispatch_messages (libcrmservice.so.53 + 0x78a8) #3 0x00007f9121dcc257 mainloop_gio_callback (libcrmcommon.so.68 + 0x38257) #4 0x00007f9121c41f4f g_main_context_dispatch (libglib-2.0.so.0 + 0x54f4f) #5 0x00007f9121c97268 g_main_context_iterate.constprop.0 (libglib-2.0.so.0 + 0xaa268) #6 0x00007f9121c415a3 g_main_loop_run (libglib-2.0.so.0 + 0x545a3) #7 0x000056070b722511 main (pacemaker-execd + 0x4511) #8 0x00007f9121829590 __libc_start_call_main (libc.so.6 + 0x29590) #9 0x00007f9121829640 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29640) #10 0x000056070b7226d5 _start (pacemaker-execd + 0x46d5) ELF object binary architecture: AMD x86-64 ... (gdb) up #1 job_removed_filter (connection=<optimized out>, message=0x56070bb659e0, user_data=0x56070bb61d50) at /usr/src/debug/pacemaker-3.0.0-5068.b39b82278d.git.el9.x86_64/lib/services/systemd.c:877 877 if (!pcmk__str_eq(bus_path, action->opaque->job_path, pcmk__str_none)) { (gdb) p *action $1 = {id = 0x56070bb669b0 "y>", rsc = 0x7f91219face0 <main_arena+96> "൶\v\aV", action = 0x56070bb35f70 "E\361\303k\002V", interval_ms = 0, standard = 0x56070bb2cc90 "L\344\303k\002V", provider = 0x0, agent = 0x56070bb383d0 "\270w\302k\002V", timeout = 100000, params = 0x0, rc = 0, pid = 0, cancel = 0, status = 0, sequence = 3, expected_rc = 0, synchronous = 0, flags = 0, stderr_data = 0x0, stdout_data = 0x0, cb_data = 0x56070bb50540, opaque = 0x90} (gdb)

I think this is likely due to how I created the resource. I explicitly set the start timeout to less than it takes for the resource to be ready, so I think the action is getting finalized (and freed) by pacemaker before systemd gets around to telling us about the resource, so when the callback gets called the action has already been freed but is not NULL so we're still trying to access it.

I appreciate you tracking this down. One solution for this commit would be to have systemd_timeout_callback() remove the D-Bus filter for this action, if there is one. That would finalize the action. We could check whether op->opaque->job_path != NULL to make that decision.

Another commit could improve the timeout scheme. That could go in this PR, or not.

I'd prefer to handle timeouts in this PR as well, just so whoever ends up backporting this has fewer things to keep track of.

I can confirm no more segfaults with this config. My reproducer was to create the resource with pcs resource create TestSimple systemd:test-simple op start timeout=10.

nrwahl2 · 2025-02-19T04:45:50Z

Rebased on current main; no other changes in this push

nrwahl2 · 2025-02-19T04:52:20Z

Added Co-authored-by: lines to commit messages; no other changes

lib/services/systemd.c

nrwahl2 · 2025-02-19T11:22:39Z

Hopefully fixed this segfault: #3818 (comment)

nrwahl2 · 2025-02-19T11:35:47Z

Only thing left (besides testing, obviously) is timeout logic. Thoughts:

Should use the temporary unit override (see systemd_create_override()) to specify TimeoutStartSec, TimeoutStopSec, or JobTimeoutSec.
Can remove the timer right before kicking off the start/stop method call.
Can get time remaining via g_source_get_ready_time(...) - g_source_get_time(...), and use that to set the timeout in the override.
Would be nice to figure out why we add 5000 to the timeout; consider dropping.
It doesn't make sense to use op->timeout as the daemon-reload timeout in systemd_create_override() without subtracting anything from the op timeout.

clumens · 2025-02-19T17:24:58Z

* Should use the temporary unit override (see `systemd_create_override()`) to specify `TimeoutStartSec`, `TimeoutStopSec`, or `JobTimeoutSec`.

* Can remove the timer right before kicking off the start/stop method call.

* Can get time remaining via `g_source_get_ready_time(...) - g_source_get_time(...)`, and use that to set the timeout in the override.

In general, I'd prefer to use the systemd timeouts over our own for systemd services. We have so, so many timers around all this stuff and offloading it would be easier to understand, I think. Plus, I think using the systemd timers might fix up some of your past comments about eating away at timers with setup/teardown tasks.

* Would be nice to figure out why we add 5000 to the timeout; consider dropping.

Introduced without comment in fccd046. My hunch is that it's just some slop time to guarantee that pacemaker's timeout is long enough for us to tell systemd what to do, systemd getting around to executing it, the actual timeout value to pass, and then systemd getting around to notifying us about what it did.

nrwahl2 · 2025-02-19T21:13:27Z

In general, I'd prefer to use the systemd timeouts over our own for systemd services. We have so, so many timers around all this stuff and offloading it would be easier to understand, I think. Plus, I think using the systemd timers might fix up some of your past comments about eating away at timers with setup/teardown tasks.

The issue is that we still need to track the overall timeout. We do the LoadUnit and StartUnit method calls asynchronously. When they complete, that's when the systemd TimeoutStartSec would begin counting down. What if (for some bizarre reason) LoadUnit or StartUnit take a while or get stuck? That should count toward the resource action timeout. Granted, we likely have MUCH bigger problems if that happens.

So I was thinking we keep the Pacemaker timer until the start job is enqueued. To be most correct, we should enqueue it with the TimeoutStartSec set to whatever is left of the Pacemaker timer.

systemd job timeouts (JobTimeoutSec in the [Unit] section) are disabled by default.

Hopefully none of that will ever matter. I don't like to rely on hope.

Introduced without comment in fccd046. My hunch is that it's just some slop time to guarantee that pacemaker's timeout is long enough for us to tell systemd what to do, systemd getting around to executing it, the actual timeout value to pass, and then systemd getting around to notifying us about what it did.

Yep, sorry for making you track that down. I've got that in a comment in a WIP commit that I haven't pushed yet. That hunch makes sense and is my best guess as well. I don't love having arbitrary slop time.

Ref T25 Signed-off-by: Reid Wahl <[email protected]>

We will use this later for D-Bus signal message filtering. Ref T25 Signed-off-by: Reid Wahl <[email protected]>

We'll add a filter later. This does not meaningfully change behavior. Ref T25 Signed-off-by: Reid Wahl <[email protected]>

When systemd receives a StartUnit() or StopUnit() method call, it returns almost immediately, as soon as a start/stop job is enqueued. A successful return code does NOT indicate that the start/stop has finished. Previously, we worked around this in action_complete() with a hack that scheduled a follow-up monitor after a successful start/stop method call, which polled the service after 2 seconds to see whether it was actually running. However, this was not a robust solution. Timing issues could result in Pacemaker having an incorrect view of the resource's status or prematurely declaring the action as failed. Now, we follow the best practice as documented in the systemd D-Bus API doc (see StartUnit()): https://www.freedesktop.org/software/systemd/man/latest/org.freedesktop.systemd1.html#Methods After kicking off a systemd start/stop action, we make note of the job's D-Bus object path. Then we register a D-Bus message filter that looks for a JobRemoved signal whose bus path matches. This signal indicates that the job has completed and includes its result. When we find the matching signal, we set the action's result. We then remove the filter, which causes the action to be finalized and freed. In the case of the executor daemon, the action has a callback (action_complete()) that runs during finalization and sets the executor's view of the action result. Monitor actions still need much of the existing workaround code in action_complete(), so we keep it for now. We bail out for start/stop actions after setting the result as described above. Ref T25 Co-authored-by: Reid Wahl <[email protected]> Signed-off-by: Reid Wahl <[email protected]>

nrwahl2 · 2025-02-20T04:26:59Z

So I was thinking we keep the Pacemaker timer until the start job is enqueued. To be most correct, we should enqueue it with the TimeoutStartSec set to whatever is left of the Pacemaker timer.

I figured out how we would do this, but I think I've changed my mind (before I got too far along). It's still true that there are several asynchronous steps of variable duration before the start is enqueued. These steps occur after Pacemaker initiates the action. However, when a user configures an action timeout, they're not thinking about overhead like this. They are (or should be) thinking about how long the start/stop itself takes.

It feels nice when I can convince myself that the simpler approach is probably the better one. Let me know if you agree.

nrwahl2 · 2025-02-20T04:56:35Z

I figured out how to we would do this, but I think I've changed my mind (before I got too far along).

Sigh. I just remembered that we don't support JUST systemd services, but also sockets, mounts, timers, and paths. To me it seems silly... I struggle to imagine a use case for managing these directly that couldn't be better achieved in some other way. But whatever. Someone wanted them.

Added via #1508.

These are probably already broken to some extent, because our override files assume we're working with a service.

Other unit types don't support TimeoutStartSec and TimeoutStopSec. They support either TimeoutSec (which applies to both operations) or nothing in their unit type-specific sections.

I think our best bet for a portable override is to set JobRunningTimeoutSec in the [Unit] section. That will require setting the override reloading before both start and stop in case the timeouts differ, and unsetting the JobRunningTimeoutSec after the start/stop completes.

This still leaves the problem of creating an override file in the correct directory, whose syntax is valid for the unit type we're dealing with. The current one has a [Service] section.

nrwahl2 requested a review from clumens February 5, 2025 12:10

nrwahl2 mentioned this pull request Feb 5, 2025

Use dbus to tell when systemd has completed an action for us #3805

Closed

nrwahl2 force-pushed the nrwahl2-dbus-systemd branch 2 times, most recently from 7b4a18d to d3e2f85 Compare February 5, 2025 12:19

clumens added the status: in progress PRs that aren't ready yet label Feb 5, 2025

nrwahl2 force-pushed the nrwahl2-dbus-systemd branch 2 times, most recently from a31d3e5 to 002647f Compare February 6, 2025 01:03

nrwahl2 commented Feb 6, 2025

View reviewed changes

lib/services/systemd.c Show resolved Hide resolved

nrwahl2 marked this pull request as ready for review February 6, 2025 05:32

nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from 002647f to 108f582 Compare February 6, 2025 05:32

nrwahl2 force-pushed the nrwahl2-dbus-systemd branch 2 times, most recently from ef1d140 to 12cebe0 Compare February 6, 2025 06:59

nrwahl2 commented Feb 6, 2025

View reviewed changes

lib/services/systemd.c Outdated Show resolved Hide resolved

nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from 12cebe0 to b39b822 Compare February 6, 2025 19:20

nrwahl2 changed the title ~~WIP: [ALT] Use dbus to tell when systemd has completed an action for us~~ [ALT] Use dbus to tell when systemd has completed an action for us Feb 6, 2025

nrwahl2 added needs attention PRs that someone needs to look at and removed status: in progress PRs that aren't ready yet labels Feb 6, 2025

nrwahl2 changed the title ~~[ALT] Use dbus to tell when systemd has completed an action for us~~ Use dbus to detect completed systemd unit start/stop actions Feb 8, 2025

nrwahl2 changed the title ~~Use dbus to detect completed systemd unit start/stop actions~~ Use dbus to detect completed systemd resource start/stop actions Feb 8, 2025

nrwahl2 changed the title ~~Use dbus to detect completed systemd resource start/stop actions~~ Use dbus to detect completion of systemd resource start/stop actions Feb 8, 2025

clumens reviewed Feb 10, 2025

View reviewed changes

lib/services/systemd.c Show resolved Hide resolved

clumens reviewed Feb 10, 2025

View reviewed changes

nrwahl2 marked this pull request as draft February 19, 2025 04:44

nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from b39b822 to 7ab5c38 Compare February 19, 2025 04:45

nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from 7ab5c38 to 5e4d0b6 Compare February 19, 2025 04:49

nrwahl2 commented Feb 19, 2025

View reviewed changes

lib/services/systemd.c Show resolved Hide resolved

nrwahl2 commented Feb 19, 2025

View reviewed changes

lib/services/systemd.c Outdated Show resolved Hide resolved

nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from 5e4d0b6 to c4e3224 Compare February 19, 2025 11:21

nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from c4e3224 to 77f3bea Compare February 19, 2025 11:25

nrwahl2 and others added 4 commits February 19, 2025 16:21

Refactor: libcrmservice: New services__copy_result() function

b527acc

Ref T25 Signed-off-by: Reid Wahl <[email protected]>

Refactor: libcrmservice: Store systemd job path in svc_action_private_t

877bdd7

We will use this later for D-Bus signal message filtering. Ref T25 Signed-off-by: Reid Wahl <[email protected]>

Refactor: libcrmservice: Subscribe to systemd D-Bus signals

31b2b03

We'll add a filter later. This does not meaningfully change behavior. Ref T25 Signed-off-by: Reid Wahl <[email protected]>

nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from 77f3bea to dc58e11 Compare February 20, 2025 00:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use dbus to detect completion of systemd resource start/stop actions #3818

Use dbus to detect completion of systemd resource start/stop actions #3818

nrwahl2 commented Feb 5, 2025

nrwahl2 commented Feb 5, 2025

nrwahl2 commented Feb 6, 2025

clumens left a comment

clumens commented Feb 10, 2025

clumens Feb 10, 2025

clumens Feb 10, 2025

nrwahl2 Feb 19, 2025

clumens Feb 19, 2025

clumens Feb 19, 2025

nrwahl2 commented Feb 19, 2025

nrwahl2 commented Feb 19, 2025

nrwahl2 commented Feb 19, 2025

nrwahl2 commented Feb 19, 2025

clumens commented Feb 19, 2025

nrwahl2 commented Feb 19, 2025 •

edited

Loading

nrwahl2 commented Feb 20, 2025 •

edited

Loading

nrwahl2 commented Feb 20, 2025

Use dbus to detect completion of systemd resource start/stop actions #3818

Are you sure you want to change the base?

Use dbus to detect completion of systemd resource start/stop actions #3818

Conversation

nrwahl2 commented Feb 5, 2025

nrwahl2 commented Feb 5, 2025

nrwahl2 commented Feb 6, 2025

clumens left a comment

Choose a reason for hiding this comment

clumens commented Feb 10, 2025

clumens Feb 10, 2025

Choose a reason for hiding this comment

clumens Feb 10, 2025

Choose a reason for hiding this comment

nrwahl2 Feb 19, 2025

Choose a reason for hiding this comment

clumens Feb 19, 2025

Choose a reason for hiding this comment

clumens Feb 19, 2025

Choose a reason for hiding this comment

nrwahl2 commented Feb 19, 2025

nrwahl2 commented Feb 19, 2025

nrwahl2 commented Feb 19, 2025

nrwahl2 commented Feb 19, 2025

clumens commented Feb 19, 2025

nrwahl2 commented Feb 19, 2025 • edited Loading

nrwahl2 commented Feb 20, 2025 • edited Loading

nrwahl2 commented Feb 20, 2025

nrwahl2 commented Feb 19, 2025 •

edited

Loading

nrwahl2 commented Feb 20, 2025 •

edited

Loading