Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Med: cts: Add missing crm_simulate regression tests back to cts-cli. #3823

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

clumens
Copy link
Contributor

@clumens clumens commented Feb 7, 2025

These were dropped during the conversion from shell to python.

@clumens clumens added the status: in progress PRs that aren't ready yet label Feb 7, 2025
@clumens
Copy link
Contributor Author

clumens commented Feb 7, 2025

Marked as a draft because these are going to fail, and there's no point wasting CI resources on that.

They fail like so:

clumens@spire ~/src/pacemaker crm_simulate-tests|⚑1✔❯ cts/cts-cli -r crm_simulate   
Using local binaries from: /home/clumens/src/pacemaker
Using local schemas from: /home/clumens/src/pacemaker/xml
* Running: crm_simulate(crm_simulate)      - Show allocation scores with crm_simulate
* Running: crm_simulate(crm_simulate)      - Show utilization with crm_simulate
* Running: crm_simulate(crm_simulate)      - Simulate injecting a failure
* Running: crm_simulate(crm_simulate)      - Simulate bringing a node down
* Running: crm_simulate(crm_simulate)      - Simulate a node failing
* Running: crm_simulate(crm_simulate)      - Run crm_simulate with invalid CIB (enum violation)
* Running: crm_simulate(crm_simulate)      - Run crm_simulate with invalid CIB (unrecognized validate-with)
* Running: crm_simulate(crm_simulate)      - Run crm_simulate with invalid, but possibly recoverable CIB (valid with X.Y+1)
* Running: crm_simulate(crm_simulate)      - Run crm_simulate with valid CIB, but without validate-with attribute
* Running: crm_simulate(crm_simulate)      - Run crm_simulate with invalid CIB, also without validate-with attribute


Summary
* Passed: crm_simulate          - Show allocation scores with crm_simulate
* Passed: crm_simulate          - Show utilization with crm_simulate
* Passed: crm_simulate          - Simulate injecting a failure
* Passed: crm_simulate          - Simulate bringing a node down
* Passed: crm_simulate          - Simulate a node failing
* Passed: crm_simulate          - Run crm_simulate with invalid CIB (enum violation)
* Failed (rc=001): crm_simulate            - Run crm_simulate with invalid CIB (unrecognized validate-with)
* Passed: crm_simulate          - Run crm_simulate with invalid, but possibly recoverable CIB (valid with X.Y+1)
* Failed (rc=001): crm_simulate            - Run crm_simulate with valid CIB, but without validate-with attribute
* Failed (rc=001): crm_simulate            - Run crm_simulate with invalid CIB, also without validate-with attribute
3 tests failed

And the reason for each of these failures is that in setup_input in crm_simulate.c, one of the first things we do is call pcmk__update_configured_schema which in turn calls get_configured_schema. For the three failing tests, that function fails, which returns pcmk_rc_cib_corrupt but doesn't do any logging.

Back in setup_input we see the return value but don't do any logging nor do anything with the GError variable. So the pcmk_rc_cib_corrupt gets returned to the caller, which is main. main sees the error code and goes to clean up. Because we didn't do anything with the GError, pcmk__output_and_clear_error doesn't have anything to print. And then when we get down to pcmk_rc2exitc, it converts pcmk_rc_cib_corrupt to a generic error because it doesn't have a block for handling that value.

So, there's several things going on here:

  • pcmk_rc2exitc really should handle pcmk_rc_cib_corrupt - I think it should return CRM_EX_CONFIG, but I'm hesitant to do that because who knows what other programs might be relying on the generic error value in this case.
  • If we don't do the above, main could at least do the conversion itself so a more appropriate exit code could be returned. This would be enough to make the first failing test pass, because it would get the expected exit code.
  • The other two failing tests used to pass, but I think they should be expected to fail now - as I recall, it's no longer valid to have a missing validate-with= attribute.
  • We really should have better error reporting in this case. Getting the right exit value still isn't going to help the user debug the actual problem very well.

@nrwahl2
Copy link
Contributor

nrwahl2 commented Feb 8, 2025

pcmk_rc2exitc really should handle pcmk_rc_cib_corrupt - I think it should return CRM_EX_CONFIG, but I'm hesitant to do that because who knows what other programs might be relying on the generic error value in this case.

There is precedent for changing the mappings outside of a minor release:

Probably more -- I stopped there. I feel like if even Ken approved these, as cautious as he was about changing user-facing behavior, then it's probably fine. Just tag the commit with "API:" and ensure it goes in the change log.

I don't think remapping the return code in crm_simulate.c is any better. It limits the scope of the mapping change, but it still changes user-facing behavior (different error code in this situation).

Alternatively, if we wanted to be extra cautious, we could COMPAT notes to results.c and cts-cli.in, noting that we should do the remapping at a compatibility break; and use CRM_EX_ERROR for now. I don't think that's necessary, but it is an option.

The other two failing tests used to pass, but I think they should be expected to fail now - as I recall, it's no longer valid to have a missing validate-with= attribute.

Correct: see 2c4737b. The schema transform to 4.0 adds validate-with="pacemaker-4.0", but we can't start a transformation pipeline if we don't know the original validate-with.

@clumens clumens removed the status: in progress PRs that aren't ready yet label Feb 10, 2025
@clumens clumens marked this pull request as ready for review February 10, 2025 14:42
These were dropped during the conversion from shell to python.
If there are differences from expected output when running under CI,
it's useful to be able to see them instead of having to track down the
files.
This helps to capture the output from the tests if problems are
encountered when running under CI.
@nrwahl2
Copy link
Contributor

nrwahl2 commented Feb 10, 2025

I feel like if even Ken approved these, as cautious as he was about changing user-facing behavior, then it's probably fine. Just tag the commit with "API:" and ensure it goes in the change log.

I don't think remapping the return code in crm_simulate.c is any better. It limits the scope of the mapping change, but it still changes user-facing behavior (different error code in this situation).

Now that I think about it, that is interesting though. Because on the flip side, we've often been very cautious about changing exit codes of CLI tools. For example, we want crm_diff to exit with CRM_EX_DIGEST on a mismatch, but we kept generic error for backward compatibility.

So another option would be:

  • change the mapping in pcmk_rc2exitc()
  • in crm_simulate.c, override the mapping, mapping it to generic error instead; add a COMPAT note to remove the override at a compatibility break
  • in cts-cli.in, expect generic error; add a COMPAT note that this will change at a compatibility break

@clumens
Copy link
Contributor Author

clumens commented Feb 11, 2025

So another option would be:

* change the mapping in `pcmk_rc2exitc()`

* in `crm_simulate.c`, override the mapping, mapping it to generic error instead; add a COMPAT note to remove the override at a compatibility break

* in `cts-cli.in`, expect generic error; add a COMPAT note that this will change at a compatibility break

Yeah, I could do that if you think it's worth it. I am a little less worried about changing exit codes when it's really a bug fix which I'd consider this to be. Looking through the history of the results code, we've had some sort of cib_corrupt value for a very long time, but our initial crm_errno2exit function didn't map it or most of the pcmk_err_ values. I'd consider that to be the initial bug. When Ken introduced the new results.h stuff, he copied that function over, preserving the fact that most of the values didn't get mapped.

Over time we've introduced additional codes and improved the mapping, but for whatever reason, this value in particular has never been added. There may be others. I haven't looked because they didn't affect what I was trying to do in this PR.

@clumens clumens added the review: in progress PRs that are currently being reviewed label Feb 11, 2025
@nrwahl2
Copy link
Contributor

nrwahl2 commented Feb 13, 2025

Yeah, I could do that if you think it's worth it. I am a little less worried about changing exit codes when it's really a bug fix which I'd consider this to be. Looking through the history of the results code, we've had some sort of cib_corrupt value for a very long time, but our initial crm_errno2exit function didn't map it or most of the pcmk_err_ values. I'd consider that to be the initial bug. When Ken introduced the new results.h stuff, he copied that function over, preserving the fact that most of the values didn't get mapped.

Over time we've introduced additional codes and improved the mapping, but for whatever reason, this value in particular has never been added. There may be others. I haven't looked because they didn't affect what I was trying to do in this PR.

I'm not terribly concerned about backward compatibility of the particular error codes that a CLI tool returns. It's just something that we've been sticklers about in the past, right or wrong. (And we are often surprised by finding things that depend on particular user-facing behavior where it doesn't even make sense to do so lol.) If we want to continue doing that, it's not too hard to add COMPAT notes, etc. Just mildly annoying.

I'm not bothered either way.

I agree it seems like a bug fix. So would returning CRM_EX_DIGEST on mismatch in crm_diff though. Shrug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
review: in progress PRs that are currently being reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants