Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dbus to detect completion of systemd resource start/stop actions #3818

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

nrwahl2
Copy link
Contributor

@nrwahl2 nrwahl2 commented Feb 5, 2025

No description provided.

@nrwahl2 nrwahl2 requested a review from clumens February 5, 2025 12:10
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Feb 5, 2025

@clumens This is an alternate version of #3805 to show an idea.

I HAVE NOT TESTED THIS AT ALL YET. All I know is that it compiles. I just got it put together and am going to bed. If it works (including with minor tweaks), great.

This PR also implements almost all of my review comments. One note: I dropped the "remove the timer for start/stop" part, but I didn't change the timeout handling at all otherwise. We had some discussion in #3805 about deficiencies in our current timeout handling.

@nrwahl2 nrwahl2 force-pushed the nrwahl2-dbus-systemd branch 2 times, most recently from 7b4a18d to d3e2f85 Compare February 5, 2025 12:19
@clumens clumens added the status: in progress PRs that aren't ready yet label Feb 5, 2025
@nrwahl2 nrwahl2 force-pushed the nrwahl2-dbus-systemd branch 2 times, most recently from a31d3e5 to 002647f Compare February 6, 2025 01:03
@nrwahl2 nrwahl2 marked this pull request as ready for review February 6, 2025 05:32
@nrwahl2 nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from 002647f to 108f582 Compare February 6, 2025 05:32
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Feb 6, 2025

@clumens Testing has been very limited so far. I'm marking this ready for review, but bear that in mind.

@nrwahl2 nrwahl2 force-pushed the nrwahl2-dbus-systemd branch 2 times, most recently from ef1d140 to 12cebe0 Compare February 6, 2025 06:59
@nrwahl2 nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from 12cebe0 to b39b822 Compare February 6, 2025 19:20
@nrwahl2 nrwahl2 changed the title WIP: [ALT] Use dbus to tell when systemd has completed an action for us [ALT] Use dbus to tell when systemd has completed an action for us Feb 6, 2025
@nrwahl2 nrwahl2 added needs attention PRs that someone needs to look at and removed status: in progress PRs that aren't ready yet labels Feb 6, 2025
@nrwahl2 nrwahl2 changed the title [ALT] Use dbus to tell when systemd has completed an action for us Use dbus to detect completed systemd unit start/stop actions Feb 8, 2025
@nrwahl2 nrwahl2 changed the title Use dbus to detect completed systemd unit start/stop actions Use dbus to detect completed systemd resource start/stop actions Feb 8, 2025
@nrwahl2 nrwahl2 changed the title Use dbus to detect completed systemd resource start/stop actions Use dbus to detect completion of systemd resource start/stop actions Feb 8, 2025
Copy link
Contributor

@clumens clumens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me, aside from still not having any confidence in how timeouts interact with waiting for systemd to tell us something has happened.

@clumens
Copy link
Contributor

clumens commented Feb 10, 2025

One paperwork thing... I still come up as the author on the last two commits. That may be true on the subscribe commit, but I think the last commit is now both of us. You can add a Co-authored-by: line to make this a little more obvious (see https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors - I've been doing this on booth with commits that poki started and I modified).

return DBUS_HANDLER_RESULT_NOT_YET_HANDLED;
}

if (!pcmk__str_eq(bus_path, action->opaque->job_path, pcmk__str_none)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how yet, but it's possible to get here with action being invalid - use after free perhaps?

Feb 10 14:08:26.450 rhel9-scratch-1 pacemakerd          [150407] (pcmk_child_exit@pcmkd_subdaemons.c:258)       error: pacemaker-execd[150410] terminated with signal 11 (Segmentation fault)
Feb 10 14:08:26.450 rhel9-scratch-1 pacemakerd          [150407] (pcmk__ipc_is_authentic_process_active@ipc_client.c:1576)      info: Could not connect to lrmd IPC: Connection refused
Feb 10 14:08:26.450 rhel9-scratch-1 pacemakerd          [150407] (pcmk_process_exit@pcmkd_subdaemons.c:339)     notice: Respawning subdaemon pacemaker-execd after unexpected exit
$ coredumpctl debug
...
                Stack trace of thread 150410:
                #0  0x00007f9121e30faf pcmk__str_eq (libcrmservice.so.53 + 0xafaf)
                #1  0x00007f9121bb32f5 dbus_connection_dispatch (libdbus-1.so.3 + 0x1b2f5)
                #2  0x00007f9121e2d8a8 dispatch_messages (libcrmservice.so.53 + 0x78a8)
                #3  0x00007f9121dcc257 mainloop_gio_callback (libcrmcommon.so.68 + 0x38257)
                #4  0x00007f9121c41f4f g_main_context_dispatch (libglib-2.0.so.0 + 0x54f4f)
                #5  0x00007f9121c97268 g_main_context_iterate.constprop.0 (libglib-2.0.so.0 + 0xaa268)
                #6  0x00007f9121c415a3 g_main_loop_run (libglib-2.0.so.0 + 0x545a3)
                #7  0x000056070b722511 main (pacemaker-execd + 0x4511)
                #8  0x00007f9121829590 __libc_start_call_main (libc.so.6 + 0x29590)
                #9  0x00007f9121829640 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29640)
                #10 0x000056070b7226d5 _start (pacemaker-execd + 0x46d5)
                ELF object binary architecture: AMD x86-64
...
(gdb) up
#1  job_removed_filter (connection=<optimized out>, message=0x56070bb659e0, user_data=0x56070bb61d50) at /usr/src/debug/pacemaker-3.0.0-5068.b39b82278d.git.el9.x86_64/lib/services/systemd.c:877
877         if (!pcmk__str_eq(bus_path, action->opaque->job_path, pcmk__str_none)) {
(gdb) p *action
$1 = {id = 0x56070bb669b0 "y>", rsc = 0x7f91219face0 <main_arena+96> "൶\v\aV", action = 0x56070bb35f70 "E\361\303k\002V", interval_ms = 0, standard = 0x56070bb2cc90 "L\344\303k\002V", provider = 0x0, 
  agent = 0x56070bb383d0 "\270w\302k\002V", timeout = 100000, params = 0x0, rc = 0, pid = 0, cancel = 0, status = 0, sequence = 3, expected_rc = 0, synchronous = 0, flags = 0, stderr_data = 0x0, stdout_data = 0x0, 
  cb_data = 0x56070bb50540, opaque = 0x90}
(gdb) 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is likely due to how I created the resource. I explicitly set the start timeout to less than it takes for the resource to be ready, so I think the action is getting finalized (and freed) by pacemaker before systemd gets around to telling us about the resource, so when the callback gets called the action has already been freed but is not NULL so we're still trying to access it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate you tracking this down. One solution for this commit would be to have systemd_timeout_callback() remove the D-Bus filter for this action, if there is one. That would finalize the action. We could check whether op->opaque->job_path != NULL to make that decision.

Another commit could improve the timeout scheme. That could go in this PR, or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to handle timeouts in this PR as well, just so whoever ends up backporting this has fewer things to keep track of.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm no more segfaults with this config. My reproducer was to create the resource with pcs resource create TestSimple systemd:test-simple op start timeout=10.

@nrwahl2 nrwahl2 marked this pull request as draft February 19, 2025 04:44
@nrwahl2 nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from b39b822 to 7ab5c38 Compare February 19, 2025 04:45
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Feb 19, 2025

Rebased on current main; no other changes in this push

@nrwahl2 nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from 7ab5c38 to 5e4d0b6 Compare February 19, 2025 04:49
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Feb 19, 2025

Added Co-authored-by: lines to commit messages; no other changes

@nrwahl2 nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from 5e4d0b6 to c4e3224 Compare February 19, 2025 11:21
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Feb 19, 2025

Hopefully fixed this segfault: #3818 (comment)

@nrwahl2 nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from c4e3224 to 77f3bea Compare February 19, 2025 11:25
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Feb 19, 2025

Only thing left (besides testing, obviously) is timeout logic. Thoughts:

  • Should use the temporary unit override (see systemd_create_override()) to specify TimeoutStartSec, TimeoutStopSec, or JobTimeoutSec.
  • Can remove the timer right before kicking off the start/stop method call.
  • Can get time remaining via g_source_get_ready_time(...) - g_source_get_time(...), and use that to set the timeout in the override.
  • Would be nice to figure out why we add 5000 to the timeout; consider dropping.
  • It doesn't make sense to use op->timeout as the daemon-reload timeout in systemd_create_override() without subtracting anything from the op timeout.

@clumens
Copy link
Contributor

clumens commented Feb 19, 2025

* Should use the temporary unit override (see `systemd_create_override()`) to specify `TimeoutStartSec`, `TimeoutStopSec`, or `JobTimeoutSec`.

* Can remove the timer right before kicking off the start/stop method call.

* Can get time remaining via `g_source_get_ready_time(...) - g_source_get_time(...)`, and use that to set the timeout in the override.

In general, I'd prefer to use the systemd timeouts over our own for systemd services. We have so, so many timers around all this stuff and offloading it would be easier to understand, I think. Plus, I think using the systemd timers might fix up some of your past comments about eating away at timers with setup/teardown tasks.

* Would be nice to figure out why we add 5000 to the timeout; consider dropping.

Introduced without comment in fccd046. My hunch is that it's just some slop time to guarantee that pacemaker's timeout is long enough for us to tell systemd what to do, systemd getting around to executing it, the actual timeout value to pass, and then systemd getting around to notifying us about what it did.

@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Feb 19, 2025

In general, I'd prefer to use the systemd timeouts over our own for systemd services. We have so, so many timers around all this stuff and offloading it would be easier to understand, I think. Plus, I think using the systemd timers might fix up some of your past comments about eating away at timers with setup/teardown tasks.

The issue is that we still need to track the overall timeout. We do the LoadUnit and StartUnit method calls asynchronously. When they complete, that's when the systemd TimeoutStartSec would begin counting down. What if (for some bizarre reason) LoadUnit or StartUnit take a while or get stuck? That should count toward the resource action timeout. Granted, we likely have MUCH bigger problems if that happens.

So I was thinking we keep the Pacemaker timer until the start job is enqueued. To be most correct, we should enqueue it with the TimeoutStartSec set to whatever is left of the Pacemaker timer.

systemd job timeouts (JobTimeoutSec in the [Unit] section) are disabled by default.

Hopefully none of that will ever matter. I don't like to rely on hope.

Introduced without comment in fccd046. My hunch is that it's just some slop time to guarantee that pacemaker's timeout is long enough for us to tell systemd what to do, systemd getting around to executing it, the actual timeout value to pass, and then systemd getting around to notifying us about what it did.

Yep, sorry for making you track that down. I've got that in a comment in a WIP commit that I haven't pushed yet. That hunch makes sense and is my best guess as well. I don't love having arbitrary slop time.

nrwahl2 and others added 4 commits February 19, 2025 16:21
We will use this later for D-Bus signal message filtering.

Ref T25

Signed-off-by: Reid Wahl <[email protected]>
We'll add a filter later. This does not meaningfully change behavior.

Ref T25

Signed-off-by: Reid Wahl <[email protected]>
When systemd receives a StartUnit() or StopUnit() method call, it
returns almost immediately, as soon as a start/stop job is enqueued. A
successful return code does NOT indicate that the start/stop has
finished.

Previously, we worked around this in action_complete() with a hack that
scheduled a follow-up monitor after a successful start/stop method call,
which polled the service after 2 seconds to see whether it was actually
running. However, this was not a robust solution. Timing issues could
result in Pacemaker having an incorrect view of the resource's status or
prematurely declaring the action as failed.

Now, we follow the best practice as documented in the systemd D-Bus API
doc (see StartUnit()):
https://www.freedesktop.org/software/systemd/man/latest/org.freedesktop.systemd1.html#Methods

After kicking off a systemd start/stop action, we make note of the job's
D-Bus object path. Then we register a D-Bus message filter that looks
for a JobRemoved signal whose bus path matches. This signal indicates
that the job has completed and includes its result. When we find the
matching signal, we set the action's result. We then remove the filter,
which causes the action to be finalized and freed. In the case of the
executor daemon, the action has a callback (action_complete()) that runs
during finalization and sets the executor's view of the action result.

Monitor actions still need much of the existing workaround code in
action_complete(), so we keep it for now. We bail out for start/stop
actions after setting the result as described above.

Ref T25

Co-authored-by: Reid Wahl <[email protected]>
Signed-off-by: Reid Wahl <[email protected]>
@nrwahl2 nrwahl2 force-pushed the nrwahl2-dbus-systemd branch from 77f3bea to dc58e11 Compare February 20, 2025 00:30
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Feb 20, 2025

So I was thinking we keep the Pacemaker timer until the start job is enqueued. To be most correct, we should enqueue it with the TimeoutStartSec set to whatever is left of the Pacemaker timer.

I figured out how we would do this, but I think I've changed my mind (before I got too far along). It's still true that there are several asynchronous steps of variable duration before the start is enqueued. These steps occur after Pacemaker initiates the action. However, when a user configures an action timeout, they're not thinking about overhead like this. They are (or should be) thinking about how long the start/stop itself takes.

It feels nice when I can convince myself that the simpler approach is probably the better one. Let me know if you agree.

@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Feb 20, 2025

I figured out how to we would do this, but I think I've changed my mind (before I got too far along).

Sigh. I just remembered that we don't support JUST systemd services, but also sockets, mounts, timers, and paths. To me it seems silly... I struggle to imagine a use case for managing these directly that couldn't be better achieved in some other way. But whatever. Someone wanted them.

Added via #1508.

These are probably already broken to some extent, because our override files assume we're working with a service.

Other unit types don't support TimeoutStartSec and TimeoutStopSec. They support either TimeoutSec (which applies to both operations) or nothing in their unit type-specific sections.

I think our best bet for a portable override is to set JobRunningTimeoutSec in the [Unit] section. That will require setting the override reloading before both start and stop in case the timeouts differ, and unsetting the JobRunningTimeoutSec after the start/stop completes.

This still leaves the problem of creating an override file in the correct directory, whose syntax is valid for the unit type we're dealing with. The current one has a [Service] section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs attention PRs that someone needs to look at
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants