-
Notifications
You must be signed in to change notification settings - Fork 343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use dbus to detect completion of systemd resource start/stop actions #3818
base: main
Are you sure you want to change the base?
Conversation
@clumens This is an alternate version of #3805 to show an idea. I HAVE NOT TESTED THIS AT ALL YET. All I know is that it compiles. I just got it put together and am going to bed. If it works (including with minor tweaks), great. This PR also implements almost all of my review comments. One note: I dropped the "remove the timer for start/stop" part, but I didn't change the timeout handling at all otherwise. We had some discussion in #3805 about deficiencies in our current timeout handling. |
7b4a18d
to
d3e2f85
Compare
a31d3e5
to
002647f
Compare
002647f
to
108f582
Compare
@clumens Testing has been very limited so far. I'm marking this ready for review, but bear that in mind. |
ef1d140
to
12cebe0
Compare
12cebe0
to
b39b822
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine to me, aside from still not having any confidence in how timeouts interact with waiting for systemd to tell us something has happened.
One paperwork thing... I still come up as the author on the last two commits. That may be true on the subscribe commit, but I think the last commit is now both of us. You can add a |
return DBUS_HANDLER_RESULT_NOT_YET_HANDLED; | ||
} | ||
|
||
if (!pcmk__str_eq(bus_path, action->opaque->job_path, pcmk__str_none)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how yet, but it's possible to get here with action being invalid - use after free perhaps?
Feb 10 14:08:26.450 rhel9-scratch-1 pacemakerd [150407] (pcmk_child_exit@pcmkd_subdaemons.c:258) error: pacemaker-execd[150410] terminated with signal 11 (Segmentation fault)
Feb 10 14:08:26.450 rhel9-scratch-1 pacemakerd [150407] (pcmk__ipc_is_authentic_process_active@ipc_client.c:1576) info: Could not connect to lrmd IPC: Connection refused
Feb 10 14:08:26.450 rhel9-scratch-1 pacemakerd [150407] (pcmk_process_exit@pcmkd_subdaemons.c:339) notice: Respawning subdaemon pacemaker-execd after unexpected exit
$ coredumpctl debug
...
Stack trace of thread 150410:
#0 0x00007f9121e30faf pcmk__str_eq (libcrmservice.so.53 + 0xafaf)
#1 0x00007f9121bb32f5 dbus_connection_dispatch (libdbus-1.so.3 + 0x1b2f5)
#2 0x00007f9121e2d8a8 dispatch_messages (libcrmservice.so.53 + 0x78a8)
#3 0x00007f9121dcc257 mainloop_gio_callback (libcrmcommon.so.68 + 0x38257)
#4 0x00007f9121c41f4f g_main_context_dispatch (libglib-2.0.so.0 + 0x54f4f)
#5 0x00007f9121c97268 g_main_context_iterate.constprop.0 (libglib-2.0.so.0 + 0xaa268)
#6 0x00007f9121c415a3 g_main_loop_run (libglib-2.0.so.0 + 0x545a3)
#7 0x000056070b722511 main (pacemaker-execd + 0x4511)
#8 0x00007f9121829590 __libc_start_call_main (libc.so.6 + 0x29590)
#9 0x00007f9121829640 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29640)
#10 0x000056070b7226d5 _start (pacemaker-execd + 0x46d5)
ELF object binary architecture: AMD x86-64
...
(gdb) up
#1 job_removed_filter (connection=<optimized out>, message=0x56070bb659e0, user_data=0x56070bb61d50) at /usr/src/debug/pacemaker-3.0.0-5068.b39b82278d.git.el9.x86_64/lib/services/systemd.c:877
877 if (!pcmk__str_eq(bus_path, action->opaque->job_path, pcmk__str_none)) {
(gdb) p *action
$1 = {id = 0x56070bb669b0 "y>", rsc = 0x7f91219face0 <main_arena+96> "൶\v\aV", action = 0x56070bb35f70 "E\361\303k\002V", interval_ms = 0, standard = 0x56070bb2cc90 "L\344\303k\002V", provider = 0x0,
agent = 0x56070bb383d0 "\270w\302k\002V", timeout = 100000, params = 0x0, rc = 0, pid = 0, cancel = 0, status = 0, sequence = 3, expected_rc = 0, synchronous = 0, flags = 0, stderr_data = 0x0, stdout_data = 0x0,
cb_data = 0x56070bb50540, opaque = 0x90}
(gdb)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is likely due to how I created the resource. I explicitly set the start timeout to less than it takes for the resource to be ready, so I think the action is getting finalized (and freed) by pacemaker before systemd gets around to telling us about the resource, so when the callback gets called the action has already been freed but is not NULL so we're still trying to access it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate you tracking this down. One solution for this commit would be to have systemd_timeout_callback()
remove the D-Bus filter for this action, if there is one. That would finalize the action. We could check whether op->opaque->job_path != NULL
to make that decision.
Another commit could improve the timeout scheme. That could go in this PR, or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to handle timeouts in this PR as well, just so whoever ends up backporting this has fewer things to keep track of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm no more segfaults with this config. My reproducer was to create the resource with pcs resource create TestSimple systemd:test-simple op start timeout=10
.
b39b822
to
7ab5c38
Compare
Rebased on current main; no other changes in this push |
7ab5c38
to
5e4d0b6
Compare
Added |
5e4d0b6
to
c4e3224
Compare
Hopefully fixed this segfault: #3818 (comment) |
c4e3224
to
77f3bea
Compare
Only thing left (besides testing, obviously) is timeout logic. Thoughts:
|
In general, I'd prefer to use the systemd timeouts over our own for systemd services. We have so, so many timers around all this stuff and offloading it would be easier to understand, I think. Plus, I think using the systemd timers might fix up some of your past comments about eating away at timers with setup/teardown tasks.
Introduced without comment in fccd046. My hunch is that it's just some slop time to guarantee that pacemaker's timeout is long enough for us to tell systemd what to do, systemd getting around to executing it, the actual timeout value to pass, and then systemd getting around to notifying us about what it did. |
The issue is that we still need to track the overall timeout. We do the So I was thinking we keep the Pacemaker timer until the start job is enqueued. To be most correct, we should enqueue it with the systemd job timeouts ( Hopefully none of that will ever matter. I don't like to rely on hope.
Yep, sorry for making you track that down. I've got that in a comment in a WIP commit that I haven't pushed yet. That hunch makes sense and is my best guess as well. I don't love having arbitrary slop time. |
Ref T25 Signed-off-by: Reid Wahl <[email protected]>
We will use this later for D-Bus signal message filtering. Ref T25 Signed-off-by: Reid Wahl <[email protected]>
We'll add a filter later. This does not meaningfully change behavior. Ref T25 Signed-off-by: Reid Wahl <[email protected]>
When systemd receives a StartUnit() or StopUnit() method call, it returns almost immediately, as soon as a start/stop job is enqueued. A successful return code does NOT indicate that the start/stop has finished. Previously, we worked around this in action_complete() with a hack that scheduled a follow-up monitor after a successful start/stop method call, which polled the service after 2 seconds to see whether it was actually running. However, this was not a robust solution. Timing issues could result in Pacemaker having an incorrect view of the resource's status or prematurely declaring the action as failed. Now, we follow the best practice as documented in the systemd D-Bus API doc (see StartUnit()): https://www.freedesktop.org/software/systemd/man/latest/org.freedesktop.systemd1.html#Methods After kicking off a systemd start/stop action, we make note of the job's D-Bus object path. Then we register a D-Bus message filter that looks for a JobRemoved signal whose bus path matches. This signal indicates that the job has completed and includes its result. When we find the matching signal, we set the action's result. We then remove the filter, which causes the action to be finalized and freed. In the case of the executor daemon, the action has a callback (action_complete()) that runs during finalization and sets the executor's view of the action result. Monitor actions still need much of the existing workaround code in action_complete(), so we keep it for now. We bail out for start/stop actions after setting the result as described above. Ref T25 Co-authored-by: Reid Wahl <[email protected]> Signed-off-by: Reid Wahl <[email protected]>
77f3bea
to
dc58e11
Compare
I figured out how we would do this, but I think I've changed my mind (before I got too far along). It's still true that there are several asynchronous steps of variable duration before the start is enqueued. These steps occur after Pacemaker initiates the action. However, when a user configures an action timeout, they're not thinking about overhead like this. They are (or should be) thinking about how long the start/stop itself takes. It feels nice when I can convince myself that the simpler approach is probably the better one. Let me know if you agree. |
Sigh. I just remembered that we don't support JUST systemd services, but also sockets, mounts, timers, and paths. To me it seems silly... I struggle to imagine a use case for managing these directly that couldn't be better achieved in some other way. But whatever. Someone wanted them. Added via #1508. These are probably already broken to some extent, because our override files assume we're working with a service. Other unit types don't support I think our best bet for a portable override is to set This still leaves the problem of creating an override file in the correct directory, whose syntax is valid for the unit type we're dealing with. The current one has a |
No description provided.