Allow extending baseline phase #4951

Open
samsrabin wants to merge 5 commits into ESMCI:master from samsrabin:allow-extending-baseline-phase

@samsrabin
Contributor

@samsrabin samsrabin commented Mar 20, 2026

Description

Adds a new SystemTestsCommon method, generate_baseline_phase(), which the user can extend to perform extra operations during baseline generation. For example:

  1. The user might have performed postprocessing during RUN that generated files to be included in the baseline.
  2. The user might want to copy all files from a certain history tape, not just the last one.
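
A minimal sketch of what such an extension might look like, covering both examples above. The stand-in base class, the `rundir` and `baseline_dir` attributes, the no-argument signature, and the file patterns are all assumptions made for illustration; the real `SystemTestsCommon` and the actual signature of `generate_baseline_phase()` are defined in this PR's diff, which is not reproduced here.

```python
import shutil
import tempfile
from pathlib import Path


class SystemTestsCommonStub:
    """Hypothetical stand-in for CIME's SystemTestsCommon (not the real class)."""

    def __init__(self, rundir, baseline_dir):
        # Assumed attributes; the real class derives its paths from the case.
        self.rundir = Path(rundir)
        self.baseline_dir = Path(baseline_dir)

    def generate_baseline_phase(self):
        """Extension hook; assumed here to do nothing by default."""


class PostprocessingTest(SystemTestsCommonStub):
    """Hypothetical test type that adds extra files to the baseline."""

    # Illustrative patterns: postprocessing outputs written during RUN, and
    # every file from one history tape rather than only the last one.
    extra_patterns = ("postproc_*.nc", "*.h1.*.nc")

    def generate_baseline_phase(self):
        self.baseline_dir.mkdir(parents=True, exist_ok=True)
        copied = []
        for pattern in self.extra_patterns:
            for src in sorted(self.rundir.glob(pattern)):
                shutil.copy2(src, self.baseline_dir / src.name)
                copied.append(src.name)
        return copied


def demo():
    """Run the hook against throwaway directories and report what was copied."""
    names = (
        "postproc_calendar.nc",
        "case.clm2.h1.2000-01.nc",
        "case.clm2.h1.2000-02.nc",
        "case.clm2.h0.2000-01.nc",  # different tape; should not be copied
    )
    with tempfile.TemporaryDirectory() as tmp:
        rundir = Path(tmp) / "run"
        baseline = Path(tmp) / "baseline"
        rundir.mkdir()
        for name in names:
            (rundir / name).write_text("placeholder")
        copied = PostprocessingTest(rundir, baseline).generate_baseline_phase()
        present = sorted(p.name for p in baseline.iterdir())
        return copied, present
```

Calling `demo()` exercises the hook without a CIME case; in a real test the override would presumably live in the test's own class and rely on whatever paths CIME provides.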

I also changed it so that the GENERATE_BASELINE status isn't marked as succeeding or failing until after logs are copied (and now also after generate_baseline_phase()).
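
The ordering described above can be sketched roughly as follows. This is an illustration only, not CIME's implementation: the class, the helper method names, the relative order of the log copy and the hook, and the assumption that an exception in the hook marks the phase as failed are all hypothetical.

```python
class BaselineGeneratorSketch:
    """Hypothetical driver that shows the ordering only; not CIME code."""

    def __init__(self):
        self.events = []  # records the order in which steps run
        self.status = None

    def _copy_baseline_files(self):
        self.events.append("copy_baseline_files")

    def _copy_logs(self):
        self.events.append("copy_logs")

    def generate_baseline_phase(self):
        # Extension hook; subclasses may add work here.
        self.events.append("generate_baseline_phase")

    def run_generate(self):
        # The status is decided only after every step has run, so a failure
        # in the log copy or in the hook is reflected in GENERATE_BASELINE.
        try:
            self._copy_baseline_files()
            self._copy_logs()
            self.generate_baseline_phase()
        except Exception as err:
            self.status = "FAIL"
            self.events.append(f"GENERATE_BASELINE FAIL ({err})")
        else:
            self.status = "PASS"
            self.events.append("GENERATE_BASELINE PASS")
        return self.status


class FailingHook(BaselineGeneratorSketch):
    """Hypothetical subclass whose extra step fails."""

    def generate_baseline_phase(self):
        super().generate_baseline_phase()
        raise RuntimeError("extra baseline file missing")
```

Under these assumptions, `FailingHook().run_generate()` reports a failure even though the standard baseline files and logs were copied, which is the point of deferring the status update.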

I have used this code in development and it seems to work. Some CIME unit tests have now been added.

Checklist

  • My code follows the style guidelines of this project (black formatting)
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • I have added tests that exercise my feature/fix and existing tests continue to pass
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding additions and changes to the documentation

@samsrabin samsrabin requested a review from jasonb5 March 20, 2026 21:07
@samsrabin samsrabin self-assigned this Mar 20, 2026
@codecov

codecov bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 28.32%. Comparing base (82493b1) to head (3c0d623).
⚠️ Report is 139 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4951      +/-   ##
==========================================
- Coverage   31.44%   28.32%   -3.13%     
==========================================
  Files         264      262       -2     
  Lines       38703    38340     -363     
  Branches     8390     8114     -276     
==========================================
- Hits        12172    10860    -1312     
- Misses      25291    26234     +943     
- Partials     1240     1246       +6     

@samsrabin samsrabin force-pushed the allow-extending-baseline-phase branch from 9b9e3d6 to 3c0d623 Compare March 20, 2026 21:42
@billsacks
Member

This seems like a useful extension – thanks, @samsrabin! I have a couple of questions:

(1) It kind of feels like, if a test extends the generate phase, it would also want to extend the compare phase similarly. Do you agree? If so, should that extension point also be added here?

(2) I'd be curious to see an example of how you imagine using this. No need to spend a lot of time putting something together if you don't have something yet, but if you have something already implemented or a specific use case already in mind, I'd be interested to see it. One reason I ask is because I think it could warrant some discussion as to which use cases are appropriate for this mechanism vs. possible other mechanisms.

Specifically, I'm thinking about:

> The user might want to copy all files from a certain history tape, not just the last one.

Off-hand, this feels like something that might be less tied to a particular test type and more tied to a specific test, given the test length, testmods, etc. And it also feels like implementing it would be tricky enough that its implementation would probably belong in the core CIME (covered by unit tests) rather than in a specific test. So I'm wondering if this kind of behavior would be more appropriate as a standard option to CIME tests, triggered by a test modifier.

I do see how your first example:

> The user might have performed postprocessing during RUN that generated files to be included in the baseline

would be tied to a specific test type, though I would be interested in seeing an example just to better understand where you're going with this.

@samsrabin
Contributor Author

Thanks for having a look, @billsacks!

I developed this as part of testing the scripts for generating CTSM crop calendars. You may remember me introducing the RXCROPMATURITY test a while ago. At the time, we filed an issue (ESCOMP/CTSM#2166) saying that it'd be good to get this testing into aux_clm (and also clm_pymods, for that matter), but it's hard because the required tests are so long—a CTSM run has to go for a few years, then we have to run the scripts on the outputs, and then ideally we run CTSM again for a few years with the outputs from those scripts.

What I've done now is to make a new kind of test that just runs the scripts based on pre-existing CTSM outputs, then runs CTSM for a few days to make sure it doesn't crash when using the script outputs. (As you might imagine, this is a bit annoying to manage, but it massively improves iteration time when working on the crop calendar generation scripts. And the annoyance only comes when working on those scripts, not every time we run aux_clm or even ctsm_sci.)

The "based on pre-existing CTSM outputs" bit is why I only added an extension for the baseline generation phase and not the comparison phase. This probably would be better handled in core CIME, but I don't have the time to implement that anytime soon, and I need to get my RXCROPMATURITY work done before the CTSM6.0 release.

Ideally I would also like to include the script outputs in the comparison, but I consider that a separate, lower-priority effort that I'm not going to tackle before the CTSM6.0 release. These are already included indirectly, as inputs to the second CTSM run in the full RXCROPMATURITY test that happens at least every time we run ctsm_sci.

@billsacks
Member

Thanks for the explanation, @samsrabin. I still don't feel like I have a clear picture, but a discussion will probably be more efficient. Maybe you can explain this at the next CIME call?

@samsrabin
Contributor Author

Sure, we can definitely discuss at the next CIME meeting. In the meantime, here's a diagram showing the workflow for the two versions of the test. RXCROPMATURITY (blue) is the full run that takes a long time; RXCROPMATURITYSKIPFIRSTRUN (red) is the short one solely intended to check the Python scripts. The latter uses pre-existing outputs from the former as inputs to the Python scripts.

[Figure: Diagram of RXCROPMATURITY* runs]

@billsacks
Member

Thanks, @samsrabin. The main piece I'm not understanding (which you can try to explain here if you'd like, or we can wait until we can discuss synchronously) is how you're going to leverage the baselines for this. I'm picturing something like:

  • You'll generate baselines from a run of the test suite where you have run RXCROPMATURITY
  • You'll then somehow point to these baselines in future runs of RXCROPMATURITYSKIPFIRSTRUN

If so, how would you ensure that the necessary baselines continue to exist for future runs of RXCROPMATURITYSKIPFIRSTRUN given that - at least historically - we have treated baseline directories as scrubbable things after some time? (Again, no need to answer here - just sharing the question in my mind.)
