Allow extending baseline phase by samsrabin · Pull Request #4951 · ESMCI/cime

samsrabin · 2026-03-20T21:07:02Z

Description

Adds a new SystemTestsCommon method, generate_baseline_phase(), which the user can extend to perform extra operations during baseline generation. For example:

- The user might have performed postprocessing during RUN that generated files to be included in the baseline.
- The user might want to copy all files from a certain history tape, not just the last one.
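
A minimal sketch of how a test class might extend the hook. Only the names `SystemTestsCommon` and `generate_baseline_phase()` come from this PR; the stand-in base class, the `run_dir` and `baseline_dir` attributes, and the `postprocessed_*.nc` file pattern are assumptions for illustration, and the real CIME signature may differ.

```python
import shutil
import tempfile
from pathlib import Path


class SystemTestsCommon:
    """Stand-in for CIME's base class (hypothetical, heavily simplified)."""

    def __init__(self, run_dir, baseline_dir):
        self.run_dir = Path(run_dir)
        self.baseline_dir = Path(baseline_dir)

    def generate_baseline_phase(self):
        """Default hook: do nothing extra during baseline generation."""


class PostprocessingTest(SystemTestsCommon):
    """Hypothetical test whose RUN phase wrote extra postprocessed files."""

    def generate_baseline_phase(self):
        # Copy every postprocessed file from the run directory into the baseline.
        self.baseline_dir.mkdir(parents=True, exist_ok=True)
        for path in sorted(self.run_dir.glob("postprocessed_*.nc")):
            shutil.copy2(path, self.baseline_dir / path.name)


def demo():
    """Run the hook against temporary directories; return copied file names."""
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp) / "run"
        run_dir.mkdir()
        (run_dir / "postprocessed_a.nc").write_text("a")
        (run_dir / "unrelated.log").write_text("log")
        test = PostprocessingTest(run_dir, Path(tmp) / "baseline")
        test.generate_baseline_phase()
        return sorted(p.name for p in test.baseline_dir.iterdir())
```

In this sketch the base-class hook is a no-op, so tests that do not override it behave as before.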

I also changed it so that the GENERATE_BASELINE status isn't marked as succeeding or failing until after logs are copied (and now also after generate_baseline_phase()).
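
The ordering change can be pictured roughly as follows. This is a simplified sketch, not the actual `_generate_baseline()` code: the three callables are hypothetical stand-ins for the real steps, and treating any raised exception as a failure is an assumption made for illustration.

```python
def generate_baseline(copy_history, copy_logs, extra_phase):
    """Run the baseline steps in order and return (status, steps_completed).

    The status is decided only after every step has had a chance to run,
    mirroring the idea that GENERATE_BASELINE is not marked until the end.
    """
    completed = []
    status = "PEND"  # nothing is recorded as passed or failed yet
    try:
        for name, step in (
            ("history", copy_history),
            ("logs", copy_logs),
            ("generate_baseline_phase", extra_phase),
        ):
            step()
            completed.append(name)
    except Exception:
        # Assumption: a failure in any step fails the whole phase.
        status = "FAIL"
    else:
        status = "PASS"
    return status, completed
```

With this shape, a failure inside the extension hook is reflected in the final status instead of being hidden behind an earlier success.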

I have used this code in development and it seems to work. ~~However, I haven't begun to think about adding CIME tests.~~ Some CIME unit tests have now been added.

Checklist

My code follows the style guidelines of this project (black formatting)
I have performed a self-review of my own code
My changes generate no new warnings
I have added tests that exercise my feature/fix and existing tests continue to pass
I have commented my code, particularly in hard-to-understand areas
I have made corresponding additions and changes to the documentation


codecov · 2026-03-20T21:38:16Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 28.32%. Comparing base (82493b1) to head (3c0d623).
⚠️ Report is 139 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4951      +/-   ##
==========================================
- Coverage   31.44%   28.32%   -3.13%     
==========================================
  Files         264      262       -2     
  Lines       38703    38340     -363     
  Branches     8390     8114     -276     
==========================================
- Hits        12172    10860    -1312     
- Misses      25291    26234     +943     
- Partials     1240     1246       +6


billsacks · 2026-03-22T17:28:25Z

This seems like a useful extension – thanks, @samsrabin! I have a couple of questions:

(1) It kind of feels like, if a test extends the generate phase, it would also want to extend the compare phase similarly. Do you agree? If so, should that extension point also be added here?

(2) I'd be curious to see an example of how you imagine using this. No need to spend a lot of time putting something together if you don't have something yet, but if you have something already implemented or a specific use case already in mind, I'd be interested to see it. One reason I ask is because I think it could warrant some discussion as to which use cases are appropriate for this mechanism vs. possible other mechanisms.

Specifically, I'm thinking about:

> The user might want to copy all files from a certain history tape, not just the last one.

Off-hand, this feels like something that might be less tied to a particular test type and more tied to a specific test, given the test length, testmods, etc. And it also feels like implementing it would be tricky enough that its implementation would probably belong in the core CIME (covered by unit tests) rather than in a specific test. So I'm wondering if this kind of behavior would be more appropriate as a standard option to CIME tests, triggered by a test modifier.
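
To make the "all files versus last file" distinction concrete, here is a rough sketch of the selection logic such an option would need. The function name, its parameters, and the `<case>.<model>.<tape>.<date>.nc` naming pattern are assumptions for illustration; this is not an existing CIME function.

```python
import tempfile
from pathlib import Path


def history_files(run_dir, model, tape, last_only=True):
    """Return history files for one tape, sorted by name.

    Assumes date-stamped names, so lexical order matches time order.
    """
    files = sorted(Path(run_dir).glob(f"*.{model}.{tape}.*.nc"))
    return files[-1:] if last_only else files


def demo():
    """Create a few empty files and select from the h1 tape both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        names = [
            "case.clm2.h0.2000-01-01-00000.nc",
            "case.clm2.h1.2000-01-01-00000.nc",
            "case.clm2.h1.2000-01-02-00000.nc",
        ]
        for name in names:
            (Path(tmp) / name).touch()
        last = [p.name for p in history_files(tmp, "clm2", "h1")]
        every = [p.name for p in history_files(tmp, "clm2", "h1", last_only=False)]
        return last, every
```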

I do see how your first example:

> The user might have performed postprocessing during RUN that generated files to be included in the baseline

would be tied to a specific test type - though would be interested in seeing an example just to better understand where you're going with this.

samsrabin · 2026-03-24T19:18:44Z

Thanks for having a look, @billsacks!

I developed this as part of testing the scripts for generating CTSM crop calendars. You may remember me introducing the RXCROPMATURITY test a while ago. At the time, we filed an issue (ESCOMP/CTSM#2166) saying that it'd be good to get this testing into aux_clm (and also clm_pymods, for that matter), but it's hard because the required tests are so long—a CTSM run has to go for a few years, then we have to run the scripts on the outputs, and then ideally we run CTSM again for a few years with the outputs from those scripts.

What I've done now is to make a new kind of test that just runs the scripts based on pre-existing CTSM outputs, then runs CTSM for a few days to make sure it doesn't crash when using the script outputs. (As you might imagine, this is a bit annoying to manage, but it massively improves iteration time when working on the crop calendar generation scripts. And the annoyance only comes when working on those scripts, not every time we run aux_clm or even ctsm_sci.)

The "based on pre-existing CTSM outputs" bit is why I only added an extension for the baseline generation phase and not the comparison phase. This probably would be better handled in core CIME, but I don't have the time to implement that anytime soon, and I need to get my RXCROPMATURITY work done before the CTSM6.0 release.

Ideally I would also like to include the script outputs in the comparison, but I consider that a separate, lower-priority effort that I'm not going to tackle before the CTSM6.0 release. These are already included indirectly, as inputs to the second CTSM run in the full RXCROPMATURITY test that happens at least every time we run ctsm_sci.

billsacks · 2026-03-24T23:13:30Z

Thanks for the explanation, @samsrabin. I still don't feel like I have a clear picture, but probably a discussion will be more efficient. Maybe you can explain this at the next CIME call?

samsrabin · 2026-03-25T16:14:02Z

Sure, we can definitely discuss at the next CIME meeting. In the meantime, here's a diagram showing the workflow for the two versions of the test. RXCROPMATURITY (blue) is the full run that takes a long time; RXCROPMATURITYSKIPFIRSTRUN (red) is the short one solely intended to check the Python scripts. The latter uses pre-existing outputs from the former as inputs to the Python scripts.

billsacks · 2026-03-25T16:36:53Z

Thanks, @samsrabin. The main piece I'm not understanding (which you can try to explain here if you'd like, or we can wait until we can discuss synchronously) is how you're going to leverage the baselines for this. I'm picturing something like:

- You'll generate baselines from a run of the test suite where you have run RXCROPMATURITY
- You'll then somehow point to these baselines in future runs of RXCROPMATURITYSKIPFIRSTRUN

If so, how would you ensure that the necessary baselines continue to exist for future runs of RXCROPMATURITYSKIPFIRSTRUN given that - at least historically - we have treated baseline directories as scrubbable things after some time? (Again, no need to answer here - just sharing the question in my mind.)

samsrabin added 2 commits February 13, 2026 11:34

SystemTestsCommon _generate_baseline(): Only set status after copying logs.

2df0153

SystemTestsCommon: Optionally extend GENERATE_BASELINE with generate_baseline_phase() method.

8524cc1

samsrabin requested a review from jasonb5 March 20, 2026 21:07

samsrabin self-assigned this Mar 20, 2026

samsrabin added the ty: enhancement label Mar 20, 2026

samsrabin added 2 commits March 20, 2026 15:24

Add unit tests for generate_baseline_phase().

ad58528

Reduce code duplication among test_generate_baseline* tests.

a4fd9df

Simplify testing of generate_baseline_phase().

3c0d623

samsrabin force-pushed the allow-extending-baseline-phase branch from 9b9e3d6 to 3c0d623 Compare March 20, 2026 21:42

samsrabin wants to merge 5 commits into ESMCI:master from samsrabin:allow-extending-baseline-phase
