Med: daemons: Don't add repeated I_PE_CALC messages to the fsa queue. #3977

clumens · 2025-10-31T19:38:15Z

Let's say you have a two-node cluster, node1 and node2. For testing, it's easiest to use fence_dummy instead of a real fencing agent: it fakes fencing without actually rebooting the node, so you can still see all the log files.

Assume the DC is node1. Now do the following on node2:

pcs node standby node1
pcs resource defaults update resource-stickiness=1
for i in $(seq 1 300); do echo $i; pcs resource create dummy$i ocf:heartbeat:Dummy --group dummy-group; done
pcs node unstandby node1

It will take a long time to create that many resources. After node1 comes out of standby, it'll take a minute or two, but eventually you'll see that node1 was fenced. On node1, you'll see a lot of transition abort messages. Each of these transition aborts causes an I_PE_CALC message to be generated and added to the fsa queue. In my testing, I've seen the queue grow to ~600 messages, all of which are exactly the same.

These messages are fed into controld's glib event loop at G_PRIORITY_HIGH, while regular IPC messages trigger at G_PRIORITY_DEFAULT. Thus, the fsa messages take priority. It takes a while for controld to process all these high-priority messages, during which time it is unable to read anything out of its IPC backlog.

based continues to attempt to send IPC events to controld but is unable to do so, so the backlog continues to grow. Eventually, the backlog reaches the 500-message threshold without anything having been read by controld, which triggers the eviction process.
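To make the starvation argument concrete, here is a rough toy model in C. It is not Pacemaker or glib code: the function name, the one-event-per-tick send rate, and the one-dispatch-per-tick loop are all assumptions chosen only for illustration. It shows that while higher-priority work is pending, nothing is read from the backlog, so the backlog can reach the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not Pacemaker code): each tick the sender queues
 * one IPC event, and the receiver dispatches exactly one item, always
 * preferring pending high-priority messages over reading the IPC backlog.
 * Returns true if the backlog reaches backlog_limit within `ticks` ticks,
 * which stands in for the client being evicted.
 */
bool demo_client_evicted(size_t high_prio_pending, size_t backlog_limit,
                         size_t ticks)
{
    size_t backlog = 0;

    assert(backlog_limit > 0);

    for (size_t t = 0; t < ticks; t++) {
        backlog++;                      /* sender queues one more event */
        if (backlog >= backlog_limit) {
            return true;                /* threshold hit: evict */
        }
        if (high_prio_pending > 0) {
            high_prio_pending--;        /* high-priority message wins */
        } else {
            backlog--;                  /* only now is an IPC event read */
        }
    }
    return false;
}
```

Under these assumptions, `demo_client_evicted(600, 500, 1000)` reports an eviction, while `demo_client_evicted(1, 500, 1000)` does not, because the single high-priority message is handled and the reader then keeps up with the sender.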

There doesn't seem to be any reason for all these I_PE_CALC messages to be generated. They're all exactly the same, they don't appear to be tagged with any unique data tying them to a specific query, and their presence just slows everything down.

Thus, the fix here is very simple: if the latest message in the queue is an I_PE_CALC message, just don't add another one. We could also make sure there's only ever one I_PE_CALC message in the queue, but there could potentially be valid reasons for there to be multiple interleaved with other message types. I am erring on the side of caution with this minimal fix.
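A minimal sketch of the idea, for readers who want to see the check spelled out. This is not the actual patch: the real code uses a GQueue (a GList on RHEL-8), whereas the fixed-size array, the enum, and every `demo_` name below are hypothetical stand-ins so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real input types and queue. */
enum demo_input { DEMO_I_PE_CALC, DEMO_I_OTHER };

#define DEMO_QUEUE_MAX 1024

struct demo_queue {
    enum demo_input items[DEMO_QUEUE_MAX];
    size_t len;
};

/*
 * Append `input` to the tail of the queue, unless it is a PE_CALC request
 * and the most recently queued item is already a PE_CALC request.
 * Returns true if the item was appended.
 */
bool demo_enqueue(struct demo_queue *q, enum demo_input input)
{
    assert(q != NULL);

    if (input == DEMO_I_PE_CALC && q->len > 0
        && q->items[q->len - 1] == DEMO_I_PE_CALC) {
        return false;   /* tail is already PE_CALC: skip the repeat */
    }
    if (q->len == DEMO_QUEUE_MAX) {
        return false;   /* toy queue is full */
    }
    q->items[q->len++] = input;
    return true;
}

/* Queue `aborts` back-to-back PE_CALC requests; return the final length. */
size_t demo_queue_after_aborts(size_t aborts)
{
    struct demo_queue q = { .len = 0 };

    for (size_t i = 0; i < aborts; i++) {
        demo_enqueue(&q, DEMO_I_PE_CALC);
    }
    return q.len;
}
```

Because only the tail is checked, a sequence such as PE_CALC, other, PE_CALC still ends up with both PE_CALC entries queued, which matches the cautious behavior described above; only consecutive repeats are dropped.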

Related: RHEL-76276

clumens · 2025-10-31T19:40:14Z

@nrwahl2 This is a very short patch that we've talked about a bit already but it deserves a lot of thought. I have tested the RHEL-8 version of this patch (we use a GList instead of GQueue there) against RHEL-76276 with success. I've also run the regression tests and am starting ctslab now. After that, I want to run the same tests against the main branch but in the meantime we might as well review this. Please make sure the commit message makes sense so that in a couple years when we're wondering why I did this, we'll know.


clumens requested a review from nrwahl2 October 31, 2025 19:38

clumens force-pushed the fewer-fsa-messages branch from 1834f07 to fc7f182 Compare November 1, 2025 13:37
