Med: cts: Add missing crm_simulate regression tests back to cts-cli. #3823

clumens · 2025-02-07T19:00:31Z

These were dropped during the conversion from shell to python.

clumens · 2025-02-07T19:10:10Z

Marked as a draft because these are going to fail, and there's no point wasting CI resources on that.

They fail like so:

clumens@spire ~/src/pacemaker crm_simulate-tests|⚑1✔❯ cts/cts-cli -r crm_simulate   
Using local binaries from: /home/clumens/src/pacemaker
Using local schemas from: /home/clumens/src/pacemaker/xml
* Running: crm_simulate(crm_simulate)      - Show allocation scores with crm_simulate
* Running: crm_simulate(crm_simulate)      - Show utilization with crm_simulate
* Running: crm_simulate(crm_simulate)      - Simulate injecting a failure
* Running: crm_simulate(crm_simulate)      - Simulate bringing a node down
* Running: crm_simulate(crm_simulate)      - Simulate a node failing
* Running: crm_simulate(crm_simulate)      - Run crm_simulate with invalid CIB (enum violation)
* Running: crm_simulate(crm_simulate)      - Run crm_simulate with invalid CIB (unrecognized validate-with)
* Running: crm_simulate(crm_simulate)      - Run crm_simulate with invalid, but possibly recoverable CIB (valid with X.Y+1)
* Running: crm_simulate(crm_simulate)      - Run crm_simulate with valid CIB, but without validate-with attribute
* Running: crm_simulate(crm_simulate)      - Run crm_simulate with invalid CIB, also without validate-with attribute


Summary
* Passed: crm_simulate          - Show allocation scores with crm_simulate
* Passed: crm_simulate          - Show utilization with crm_simulate
* Passed: crm_simulate          - Simulate injecting a failure
* Passed: crm_simulate          - Simulate bringing a node down
* Passed: crm_simulate          - Simulate a node failing
* Passed: crm_simulate          - Run crm_simulate with invalid CIB (enum violation)
* Failed (rc=001): crm_simulate            - Run crm_simulate with invalid CIB (unrecognized validate-with)
* Passed: crm_simulate          - Run crm_simulate with invalid, but possibly recoverable CIB (valid with X.Y+1)
* Failed (rc=001): crm_simulate            - Run crm_simulate with valid CIB, but without validate-with attribute
* Failed (rc=001): crm_simulate            - Run crm_simulate with invalid CIB, also without validate-with attribute
3 tests failed

And the reason for each of these failures is that in setup_input in crm_simulate.c, one of the first things we do is call pcmk__update_configured_schema which in turn calls get_configured_schema. For the three failing tests, that function fails, which returns pcmk_rc_cib_corrupt but doesn't do any logging.

Back in setup_input we see the return value but don't do any logging nor do anything with the GError variable. So the pcmk_rc_cib_corrupt gets returned to the caller, which is main. main sees the error code and goes to clean up. Because we didn't do anything with the GError, pcmk__output_and_clear_error doesn't have anything to print. And then when we get down to pcmk_rc2exitc, it converts pcmk_rc_cib_corrupt to a generic error because it doesn't have a block for handling that value.

So, there's several things going on here:

pcmk_rc2exitc really should handle pcmk_rc_cib_corrupt - I think it should return CRM_EX_CONFIG, but I'm hesitant to do that because who knows what other programs might be relying on the generic error value in this case.
If we don't do the above, main could at least do the conversion itself so a more appropriate exit code could be returned. This would be enough to make the first failing test pass, because it would get the expected exit code.
The other two failing tests used to pass, but I think they should be expected to fail now - as I recall, it's no longer valid to have a missing validate-with= attribute.
We really should have better error reporting in this case. Getting the right exit value still isn't going to help the user debug the actual problem very well.

nrwahl2 · 2025-02-08T23:01:04Z

pcmk_rc2exitc really should handle pcmk_rc_cib_corrupt - I think it should return CRM_EX_CONFIG, but I'm hesitant to do that because who knows what other programs might be relying on the generic error value in this case.

There is precedent for changing the mappings outside of a minor release:

Probably more -- I stopped there. I feel like if even Ken approved these, as cautious as he was about changing user-facing behavior, then it's probably fine. Just tag the commit with "API:" and ensure it goes in the change log.

I don't think remapping the return code in crm_simulate.c is any better. It limits the scope of the mapping change, but it still changes user-facing behavior (different error code in this situation).

Alternatively, if we wanted to be extra cautious, we could COMPAT notes to results.c and cts-cli.in, noting that we should do the remapping at a compatibility break; and use CRM_EX_ERROR for now. I don't think that's necessary, but it is an option.

The other two failing tests used to pass, but I think they should be expected to fail now - as I recall, it's no longer valid to have a missing validate-with= attribute.

Correct: see 2c4737b. The schema transform to 4.0 adds validate-with="pacemaker-4.0", but we can't start a transformation pipeline if we don't know the original validate-with.

These were dropped during the conversion from shell to python.

If there are differences from expected output when running under CI, it's useful to be able to see them instead of having to track down the files.

This helps to capture the output from the tests if problems are encountered when running under CI.

nrwahl2 · 2025-02-10T19:29:13Z

I feel like if even Ken approved these, as cautious as he was about changing user-facing behavior, then it's probably fine. Just tag the commit with "API:" and ensure it goes in the change log.

I don't think remapping the return code in crm_simulate.c is any better. It limits the scope of the mapping change, but it still changes user-facing behavior (different error code in this situation).

Now that I think about it, that is interesting though. Because on the flip side, we've often been very cautious about changing exit codes of CLI tools. For example, we want crm_diff to exit with CRM_EX_DIGEST on a mismatch, but we kept generic error for backward compatibility.

So another option would be:

change the mapping in pcmk_rc2exitc()
in crm_simulate.c, override the mapping, mapping it to generic error instead; add a COMPAT note to remove the override at a compatibility break
in cts-cli.in, expect generic error; add a COMPAT note that this will change at a compatibility break

clumens · 2025-02-11T15:49:42Z

So another option would be:

* change the mapping in `pcmk_rc2exitc()`

* in `crm_simulate.c`, override the mapping, mapping it to generic error instead; add a COMPAT note to remove the override at a compatibility break

* in `cts-cli.in`, expect generic error; add a COMPAT note that this will change at a compatibility break

Yeah, I could do that if you think it's worth it. I am a little less worried about changing exit codes when it's really a bug fix which I'd consider this to be. Looking through the history of the results code, we've had some sort of cib_corrupt value for a very long time, but our initial crm_errno2exit function didn't map it or most of the pcmk_err_ values. I'd consider that to be the initial bug. When Ken introduced the new results.h stuff, he copied that function over, preserving the fact that most of the values didn't get mapped.

Over time we've introduced additional codes and improved the mapping, but for whatever reason, this value in particular has never been added. There may be others. I haven't looked because they didn't affect what I was trying to do in this PR.

nrwahl2 · 2025-02-13T06:46:48Z

Yeah, I could do that if you think it's worth it. I am a little less worried about changing exit codes when it's really a bug fix which I'd consider this to be. Looking through the history of the results code, we've had some sort of cib_corrupt value for a very long time, but our initial crm_errno2exit function didn't map it or most of the pcmk_err_ values. I'd consider that to be the initial bug. When Ken introduced the new results.h stuff, he copied that function over, preserving the fact that most of the values didn't get mapped.

Over time we've introduced additional codes and improved the mapping, but for whatever reason, this value in particular has never been added. There may be others. I haven't looked because they didn't affect what I was trying to do in this PR.

I'm not terribly concerned about backward compatibility of the particular error codes that a CLI tool returns. It's just something that we've been sticklers about in the past, right or wrong. (And we are often surprised by finding things that depend on particular user-facing behavior where it doesn't even make sense to do so lol.) If we want to continue doing that, it's not too hard to add COMPAT notes, etc. Just mildly annoying.

I'm not bothered either way.

I agree it seems like a bug fix. So would returning CRM_EX_DIGEST on mismatch in crm_diff though. Shrug.

clumens added the status: in progress PRs that aren't ready yet label Feb 7, 2025

API: libcrmcommon: Map pcmk_rc_cib_corrupt to CRM_EX_CONFIG.

6a55131

clumens force-pushed the crm_simulate-tests branch from 6a925de to 2a6b2d1 Compare February 10, 2025 14:41

clumens removed the status: in progress PRs that aren't ready yet label Feb 10, 2025

clumens marked this pull request as ready for review February 10, 2025 14:42

Med: cts: Add missing crm_simulate regression tests back to cts-cli.

6731ec9

These were dropped during the conversion from shell to python.

clumens force-pushed the crm_simulate-tests branch from 2a6b2d1 to 6731ec9 Compare February 10, 2025 17:02

clumens added 2 commits February 10, 2025 13:14

Test: CTS: Pass verbose argument to the diff function.

c01dc90

If there are differences from expected output when running under CI, it's useful to be able to see them instead of having to track down the files.

Build: Run cts-cli verbosely.

33c36ae

This helps to capture the output from the tests if problems are encountered when running under CI.

clumens added the review: in progress PRs that are currently being reviewed label Feb 11, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Med: cts: Add missing crm_simulate regression tests back to cts-cli. #3823

Med: cts: Add missing crm_simulate regression tests back to cts-cli. #3823

clumens commented Feb 7, 2025

clumens commented Feb 7, 2025

nrwahl2 commented Feb 8, 2025 •

edited

Loading

nrwahl2 commented Feb 10, 2025 •

edited

Loading

clumens commented Feb 11, 2025

nrwahl2 commented Feb 13, 2025 •

edited

Loading

Med: cts: Add missing crm_simulate regression tests back to cts-cli. #3823

Are you sure you want to change the base?

Med: cts: Add missing crm_simulate regression tests back to cts-cli. #3823

Conversation

clumens commented Feb 7, 2025

clumens commented Feb 7, 2025

nrwahl2 commented Feb 8, 2025 • edited Loading

nrwahl2 commented Feb 10, 2025 • edited Loading

clumens commented Feb 11, 2025

nrwahl2 commented Feb 13, 2025 • edited Loading

nrwahl2 commented Feb 8, 2025 •

edited

Loading

nrwahl2 commented Feb 10, 2025 •

edited

Loading

nrwahl2 commented Feb 13, 2025 •

edited

Loading