Skip to content

Commit

Permalink
Fix: libcrmservices: Add another place to finalize Start/Stop ops
Browse files Browse the repository at this point in the history
The comment should explain this pretty well. What was happening was that
an executor regression test was failing because the operation never gets
finalized. Essentially, services__finalize_async_op() never gets called
for a unit that doesn't exist.

We typically want this to happen as part of the systemd callback that's
registered in the executor, but this callback will never get called in
this case because there's no systemd message that will make it through
our filter and trigger the callback. JobRemoved won't get sent because
there's no job to remove.

It seems like we could look for a failure from LoadUnit for units that
don't exist, but pacemaker always writes an override unit file which
means the unit will always exist.
  • Loading branch information
clumens authored and nrwahl2 committed Feb 6, 2025
1 parent 12deab0 commit 002647f
Showing 1 changed file with 15 additions and 6 deletions.
21 changes: 15 additions & 6 deletions lib/services/systemd.c
Original file line number Diff line number Diff line change
Expand Up @@ -918,19 +918,28 @@ unit_method_complete(DBusPendingCall *pending, void *user_data)
CRM_LOG_ASSERT(pending == op->opaque->pending);
services_set_op_pending(op, NULL);

/* Determine result and finalize action. Start actions must be finalized
* by the registered systemd callback (for instance, handle_systemd_job_complete
* in the executor) to allow for some place to react to the finished
* action before finalizing it and cleaning it up.
/* Determine result and finalize action.
*
* Start and stop actions must be finalized by the registered systemd
* callback (for instance, handle_systemd_job_complete() in the executor) to
* allow for some place to react to the finished action before finalizing it
* and cleaning it up.
*
* HOWEVER, if the operation failed with NOT_INSTALLED, we need to finalize
* it right now too. We'll never receive a JobRemoved() signal for that
* case, so the callback will never be called. Additionally since we set up
* an override file, LoadUnit will always succeed so we can't finalize based
* on that failing. We have to do this here.
*/
process_unit_method_reply(reply, op);

if (reply != NULL) {
dbus_message_unref(reply);
}

if (pcmk__strcase_any_of(op->action, PCMK_ACTION_START, PCMK_ACTION_STOP,
NULL)) {
if ((op->status != PCMK_EXEC_NOT_INSTALLED)
&& pcmk__strcase_any_of(op->action, PCMK_ACTION_START, PCMK_ACTION_STOP,
NULL)) {
// The filter takes ownership of op and will finalize it on removal
if (dbus_connection_add_filter(systemd_proxy, filter_job_removed_signal,
op, finalize_async_op_dbus)) {
Expand Down

0 comments on commit 002647f

Please sign in to comment.