Add CI tests on NCAR's CIRRUS cloud service #470
Pull request overview
This PR adds continuous integration testing for RRTMGP GPU functionality on NCAR's CIRRUS cloud service. The changes enable automated testing of GPU-accelerated radiation schemes in CAM-SIMA using CIRRUS infrastructure.
Changes:
- Added GitHub Actions workflow to build and run GPU tests on CIRRUS
- Created test configuration for RRTMGP GPU tests with fixed orbital parameters and diagnostic output settings
- Updated ccs_config submodule to version 1.0.76 for CIRRUS GPU_TYPE support
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| .github/workflows/build_and_run_cirrus.yml | Implements CI workflow for running SMS tests with RRTMGP on CIRRUS using NVIDIA GPU containers |
| cime_config/testdefs/testmods_dirs/cam/outfrq_rrtmgp_cirrus_gpu/user_nl_cpl | Configures fixed orbital parameters for reproducible test conditions |
| cime_config/testdefs/testmods_dirs/cam/outfrq_rrtmgp_cirrus_gpu/user_nl_cam | Defines test snapshots, tolerances, radiation settings, and diagnostic output fields |
| cime_config/testdefs/testmods_dirs/cam/outfrq_rrtmgp_cirrus_gpu/shell_commands | Sets GPU-specific test parameters including GPU type, task counts, and RRTMGP physics suite |
| ccs_config | Updates submodule reference to version supporting CIRRUS GPU configurations |
| .gitmodules | Updates ccs_config tag from 1.0.72 to 1.0.76 |
Comments suppressed due to low confidence (4)
.gitmodules:35
- Trailing whitespace detected at the end of lines 34 and 35. Remove the extra spaces after the URLs and tags to maintain consistent formatting.
url = https://github.com/ESMCI/ccs_config_cesm.git
fxtag = ccs_config_cesm1.0.76
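A check like the one described in this comment can be scripted. The following is a minimal sketch, not part of the PR: `find_trailing_ws` is a hypothetical helper name, and it relies only on the POSIX `grep` options `-n` and `-E`.

```shell
# Hypothetical helper: print every line of a file that ends in whitespace,
# prefixed with its line number (grep -n output format "N:content").
find_trailing_ws() {
  grep -nE '[[:space:]]+$' "$1"
}

# Example usage:
#   find_trailing_ws .gitmodules
```

The function prints nothing and returns a non-zero status when the file is clean, so it can also serve as a simple pre-commit test.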
cime_config/testdefs/testmods_dirs/cam/outfrq_rrtmgp_cirrus_gpu/user_nl_cam:16
- The configuration uses a semicolon delimiter which may be non-standard syntax. Verify this is the correct syntax for hist_output_frequency rather than a colon or equals sign.
hist_output_frequency;h1: 1*nsteps
.github/workflows/build_and_run_cirrus.yml:69
- The variable $TMP_OUTPUT is used but refers to a relative path 'case_output.log' defined in env, while $TMP_DIR uses an absolute path '/$TMP_DIR/ci_test'. This inconsistency means the log file will be written to the working directory (cime/scripts) rather than alongside the test output. Consider using an absolute path like '/$TMP_DIR/$TMP_OUTPUT' or clarify the intended location.
./create_test ${{ matrix.test_type }}.${{ matrix.res }}.${{ matrix.compset }}.cirrus_${{ matrix.compiler }}.cam-${{ matrix.test_config }} --output-root /$TMP_DIR/ci_test --no-batch 2>&1 | tee "$TMP_OUTPUT"
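One way to act on this suggestion is to resolve the log file name against the output directory before calling `tee`. The sketch below is illustrative only: `resolve_log_path` is a hypothetical helper, and the directory shown in the usage comment is a placeholder, not the workflow's real `TMP_DIR` value.

```shell
# Hypothetical helper: return an absolute log path by joining a directory
# and a file name. A name that is already absolute is returned unchanged.
resolve_log_path() {
  dir="$1"
  name="$2"
  case "$name" in
    /*) printf '%s\n' "$name" ;;
    *)  printf '%s/%s\n' "${dir%/}" "$name" ;;
  esac
}

# Example usage (placeholder values; the directory must exist before tee runs):
#   LOG_FILE=$(resolve_log_path "/scratch/ci_test" "case_output.log")
#   ./create_test ... --output-root /scratch/ci_test --no-batch 2>&1 | tee "$LOG_FILE"
```

Whether the log should live next to the test output or in the working directory is a choice for the workflow author; the helper only makes the location explicit.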
.github/workflows/build_and_run_cirrus.yml:45
- Using fallback credentials ('dummy-user' and 'dummy-password') when secrets are not available could expose a security risk or cause silent failures. Consider failing explicitly when required secrets are missing rather than using dummy credentials.
username: ${{ secrets.hub_user || 'dummy-user' }}
password: ${{ secrets.hub_password || 'dummy-password' }}
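Failing explicitly, as suggested here, could look like the sketch below. It is an assumption-laden illustration: `require_secret` is a hypothetical helper, and `HUB_USER` and `HUB_PASSWORD` are assumed environment variable names that a workflow would have to export from its secrets. Because container credentials are likely evaluated before any step runs, a check like this would probably need to live in a separate job that the container job depends on.

```shell
# Hypothetical helper: fail with a clear message when a required secret is empty.
# $1 is the secret's name (for the message), $2 is its value.
require_secret() {
  name="$1"
  value="$2"
  if [ -z "$value" ]; then
    echo "error: required secret '$name' is not set" >&2
    return 1
  fi
  return 0
}

# Example usage (HUB_USER and HUB_PASSWORD are assumed variable names):
#   require_secret hub_user "$HUB_USER" || exit 1
#   require_secret hub_password "$HUB_PASSWORD" || exit 1
```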
peverwhee left a comment
Thanks @sjsprecious!
@nusbaume @kuanchihwang do you have any comments or concerns on this PR?
nusbaume left a comment
Sorry for the delay @sjsprecious! Everything looks good to me. Thanks again for your help in setting up this new testing workflow!
Hi @sjsprecious, just an FYI that it looks like the
Hi @nusbaume, thanks for your update. It sounds reasonable to wait for @kuanchihwang's final approval and the new baseline. Feel free to merge it whenever you think is appropriate.
kuanchihwang left a comment
Thanks @sjsprecious for the first containerized CI workflow in CAM-SIMA! I have a few change requests for your consideration.
cime_config/testdefs/testmods_dirs/cam/outfrq_rrtmgp_cirrus_gpu/shell_commands (outdated, resolved)
kuanchihwang left a comment
Thanks again @sjsprecious!
Hi @sjsprecious, just double-checking if there is anything more you would like to add to this PR? If not then I'll go ahead and start the steps to officially get this PR merged in. Thanks!
Hi @nusbaume, I think I have addressed all the reviewers' comments and the PR should be ready for merge. Thanks for your help.
Tag name (required for release branches): sima0_13_000
Originator(s): sjsprecious
Description (include the issue title, and the keyword ['closes', 'fixes', 'resolves'] followed by the issue number):
Add a CI workflow to run the RRTMGP GPU tests on CIRRUS cloud service.
Describe any changes made to build system: N/A
Describe any changes made to the namelist: N/A
List any changes to the defaults for the input datasets (e.g. boundary datasets): N/A
List all files eliminated and why: N/A
List all files added and what they do:
List all existing files that have been modified, and describe the changes:
(Helpful git command: git diff --name-status development...<your_branch_name>)
If there are new failures (compared to the test/existing-test-failures.txt file), have them OK'd by the gatekeeper, note them here, and add them to the file.
If there are baseline differences, include the test and the reason for the diff. What is the nature of the change? Roundoff?
derecho/intel/aux_sima:
SMS_Ln9.mpasa120_mpasa120.QPC4.derecho_intel.cam-outfrq_analy_ic_cam4 (Overall: DIFF)
SMS_Ln9.mpasa480_mpasa480.FKESSLER.derecho_intel.cam-outfrq_kessler_mpas_derecho (Overall: DIFF)
SMS_Ln9.ne3pg3_ne3pg3_mg37.FADIAB.derecho_intel.cam-outfrq_se_cslam (Overall: DIFF)
SMS_Ln9.ne3pg3_ne3pg3_mg37.FADIAB.derecho_intel.cam-outfrq_se_cslam (Overall: DIFF)
SMS_Ln9.ne3pg3_ne3pg3_mg37.FCAM7.derecho_intel.cam-outfrq_se_cslam_analy_ic (Overall: DIFF)
SMS_Ln9.ne3pg3_ne3pg3_mg37.FHS94.derecho_intel.cam-outfrq_se_cslam (Overall: DIFF)
SMS_Ln9.ne3pg3_ne3pg3_mg37.FKESSLER.derecho_intel.cam-outfrq_se_cslam (Overall: DIFF)
SMS_Ln9.ne3pg3_ne3pg3_mg37.FTJ16.derecho_intel.cam-outfrq_se_cslam (Overall: DIFF)
SMS_Ln9.ne3pg3_ne3pg3_mg37.FKESSLER.derecho_intel.cam-outfrq_se_cslam_multitape (Overall: DIFF) details:
derecho/gnu/aux_sima:
SMS_Ln9.ne3pg3_ne3pg3_mg37.FCAM7.derecho_gnu.cam-outfrq_se_cslam_analy_ic (Overall: DIFF)
SMS_Ln9.ne3pg3_ne3pg3_mg37.FHS94.derecho_gnu.cam-outfrq_se_cslam (Overall: DIFF)
SMS_Ln9.ne3pg3_ne3pg3_mg37.FKESSLER.derecho_gnu.cam-outfrq_se_cslam (Overall: DIFF)
SMS_Ln9.ne3pg3_ne3pg3_mg37.FTJ16.derecho_gnu.cam-outfrq_se_cslam (Overall: DIFF)
SMS_Ln9.ne3pg3_ne3pg3_mg37.FADIAB.derecho_gnu.cam-outfrq_se_cslam (Overall: FAIL)
derecho/nvhpc/aux_sima: ALL PASS
If this changes climate describe any run(s) done to evaluate the new
climate in enough detail that it(they) could be reproduced: N/A
CAM-SIMA date used for the baseline comparison tests if different than latest: