Low: libpacemaker: Don't assert on injecting nonexistent nodes. #3808

clumens · 2025-01-23T19:04:40Z

If crm_simulate tries to inject a node up or node fail action for a nonexistent node, it'll just assert. Instead, log an error message and return NULL.

That NULL then needs to be propagated up a couple layers to functions that actually expect and know how to handle errors.

Fixes T945

nrwahl2

My inclination is to fix this in an even simpler way. Currently, for --node-up, if the node doesn't exist then we simply inject it. We could do the same thing for --node-down (which also asserts) and --node-fail.

We haven't done that in the past, and there may be only little or no legitimate use case for it. But it seems at least as reasonable as ignoring the nonexistent node for --node-down and --node-up. And it seems unlikely that anyone is relying on not injecting the nonexistent node, because it's probably been asserting for a long time/forever.

To do the injection, all we have to do is set pcmk__simulate_node_config = true to node_down_cb() and node_fail_cb().

What do you think?

nrwahl2 · 2025-02-03T22:17:25Z

To do the injection, all we have to do is set pcmk__simulate_node_config = true to node_down_cb() and node_fail_cb().

What do you think?

Actually, injecting an action that refers to a nonexistent node would run into a similar issue. There may be other commands that do too. So, two options:

Do it the way you're doing it currently in this PR (returning NULL and letting the caller handle it as appropriate).
Get rid of pcmk__simulate_node_config and treat it as always true. Inject the node if it didn't already exist. Currently we assert in those situations, so no one should be relying on old behavior. I haven't yet looked deeply into whether this would have other unintended consequences. I've only glanced at this PR; still focused on the D-Bus stuff.

If crm_simulate tries to inject a node up or node fail action for a nonexistent node, it'll just assert. Instead, log an error message and return NULL. That NULL then needs to be propagated up a couple layers to functions that actually expect and know how to handle errors. Fixes T945

clumens · 2025-03-05T19:52:28Z

Rebased on main just to take a look at this again - no code changes, and not worth looking at again yet.

clumens · 2025-03-05T20:02:24Z

2. Get rid of `pcmk__simulate_node_config` and treat it as always true. Inject the node if it didn't already exist. Currently we assert in those situations, so no one should be relying on old behavior. I haven't yet looked deeply into whether this would have other unintended consequences. I've only glanced at this PR; still focused on the D-Bus stuff.

The way it feels like this should work to me is:

(1) If pcmk__simulate_node_config is true, inject unknown nodes.
(2) If it's false, error on unknown nodes.

I guess I don't much see the point of a flag that's only useful if it's set and crashes sometimes if it's not set. I am inclined to do what you've outlined here, though I can see the value of erroring out on unknown nodes as a way of catching typos on the command line.

clumens · 2025-03-05T20:25:59Z

Well, doing the obvious thing of getting rid of pcmk__simulate_node_config causes a lot of changes in scheduler regression output, specifically around bundle resources. For example:

--- a/cts/scheduler/summary/bundle-connection-with-container.summary
+++ b/cts/scheduler/summary/bundle-connection-with-container.summary
@@ -46,10 +46,11 @@ Using the original execution date of: 2022-07-13 22:13:26Z
 
 Revised Cluster Status:
   * Node List:
+    * Node httpd-bundle-0: pending
     * Online: [ rhel8-1 rhel8-3 rhel8-4 rhel8-5 ]
     * OFFLINE: [ rhel8-2 ]
     * RemoteOnline: [ remote-rhel8-2 ]
-    * GuestOnline: [ httpd-bundle-0 httpd-bundle-1 httpd-bundle-2 ]
+    * GuestOnline: [ httpd-bundle-1 httpd-bundle-2 ]
 
   * Full List of Resources:
     * Fencing  (stonith:fence_xvm):     Started rhel8-3

nrwahl2 · 2025-03-05T20:44:15Z

Well, doing the obvious thing of getting rid of pcmk__simulate_node_config causes a lot of changes in scheduler regression output, specifically around bundle resources. For example:

At a glance, it seems to be because we're now creating a node object for guest nodes when we shouldn't be. Looks like during unpack_nodes() instead of during unpack_remote_nodes(). Again, only glanced at trace output.

This still might be a decent path forward but we'd have to handle guest nodes differently somehow.

Obviously pcmk__inject_node() is the only function that changes if we set pcmk__simulate_node_config = true unconditionally, so start there and its callers.

nrwahl2 · 2025-06-19T05:29:14Z

@clumens What's the plan? Seemed like potential here

clumens force-pushed the crm_simulate-clean branch from 8fe1757 to 18f3048 Compare January 23, 2025 21:04

nrwahl2 reviewed Feb 3, 2025

View reviewed changes

clumens force-pushed the crm_simulate-clean branch from 18f3048 to 4f727f0 Compare March 5, 2025 19:51

clumens closed this Jun 18, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Low: libpacemaker: Don't assert on injecting nonexistent nodes. #3808

Low: libpacemaker: Don't assert on injecting nonexistent nodes. #3808

Uh oh!

clumens commented Jan 23, 2025

Uh oh!

nrwahl2 left a comment •

edited

Loading

Uh oh!

nrwahl2 commented Feb 3, 2025 •

edited

Loading

Uh oh!

clumens commented Mar 5, 2025

Uh oh!

clumens commented Mar 5, 2025

Uh oh!

clumens commented Mar 5, 2025

Uh oh!

nrwahl2 commented Mar 5, 2025 •

edited

Loading

Uh oh!

nrwahl2 commented Jun 19, 2025

Uh oh!

Uh oh!

Low: libpacemaker: Don't assert on injecting nonexistent nodes. #3808

Low: libpacemaker: Don't assert on injecting nonexistent nodes. #3808

Uh oh!

Conversation

clumens commented Jan 23, 2025

Uh oh!

nrwahl2 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nrwahl2 commented Feb 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clumens commented Mar 5, 2025

Uh oh!

clumens commented Mar 5, 2025

Uh oh!

clumens commented Mar 5, 2025

Uh oh!

nrwahl2 commented Mar 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nrwahl2 commented Jun 19, 2025

Uh oh!

Uh oh!

nrwahl2 left a comment •

edited

Loading

nrwahl2 commented Feb 3, 2025 •

edited

Loading

nrwahl2 commented Mar 5, 2025 •

edited

Loading