Add slurm/pbs job submission capability to lake-landice remapping #179

Merged
sdrabenh merged 7 commits into feature/sdrabenh/gcm_v12 from feature/v12-slurm-lakelandice
Mar 24, 2026

@mathomp4
Member

@mathomp4 mathomp4 commented Feb 23, 2026

This brings lake-landice remapping in line with upper_air and catch remapping, allowing it to run in batch queues on both NAS and NCCS.

The weird thing is, @sshakoor1 did this at some point, but it seems to have gotten lost? Not sure. Anyway, this is Claude trying to bring it back.

  • Add job configuration with NPE=1 (serial execution with MPI for MAPL)
  • Set walltime based on tile resolution: 30min for <=C360, 1hr for higher res
  • Create job script template with SLURM/PBS directives matching remap_upper.py
  • Add interactive vs batch mode detection (PBS_JOBID/SLURM_JOB_ID)
  • Refactor command execution to use esma_mpirun wrapper
  • Submit jobs via qsub (NAS) or sbatch (NCCS) when not in interactive session
  • Remove obsolete run_and_log method
  • Maintain backward compatibility: interactive mode behaves as before
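
A minimal sketch of the control flow the list above describes, for reviewers who want to see it in one place. Everything here is hypothetical: the helper names (walltime_for_resolution, inside_job_allocation, job_script, plan), the job name, the PBS select line, and the `esma_mpirun -np` invocation are assumptions, not the actual code in remap_lake_landice_saltwater.py. The key idea is that PBS_JOBID/SLURM_JOB_ID being set means we already hold an allocation, so the MPI launch runs directly as before; otherwise a job script is submitted with qsub (NAS) or sbatch (NCCS).

```python
import os
from typing import Mapping, Optional

# Hypothetical sketch of the PR #179 behaviour; names and flags are assumptions.
NPE = 1  # serial run, but launched through MPI because MAPL needs it


def walltime_for_resolution(im: int) -> str:
    """Walltime from tile resolution: 30 min for <= C360, 1 hr above."""
    return "00:30:00" if im <= 360 else "01:00:00"


def inside_job_allocation(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if already inside a PBS or SLURM job (batch or interactive)."""
    env = os.environ if env is None else env
    return "PBS_JOBID" in env or "SLURM_JOB_ID" in env


def job_script(site: str, walltime: str, run_cmd: str,
               job_name: str = "remap_lake_landice") -> str:
    """Render a minimal job script with PBS (NAS) or SLURM (NCCS) directives."""
    if site == "NAS":
        directives = [
            f"#PBS -N {job_name}",
            f"#PBS -l walltime={walltime}",
            f"#PBS -l select=1:ncpus={NPE}",  # assumption: node model omitted
        ]
    elif site == "NCCS":
        directives = [
            f"#SBATCH --job-name={job_name}",
            f"#SBATCH --time={walltime}",
            "#SBATCH --nodes=1",
            f"#SBATCH --ntasks={NPE}",
        ]
    else:
        raise ValueError(f"unknown site: {site}")
    return "\n".join(["#!/bin/bash", *directives, "", run_cmd, ""])


def plan(site: str, exe: str, script_path: str,
         env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the command to execute: a direct MPI launch or a job submission."""
    if inside_job_allocation(env):
        # Interactive mode behaves as before: run in the current allocation.
        return ["esma_mpirun", "-np", str(NPE), exe]  # assumed wrapper flags
    submitter = {"NAS": "qsub", "NCCS": "sbatch"}.get(site)
    if submitter is None:
        raise ValueError(f"unknown site: {site}")
    return [submitter, script_path]
```

In this sketch the caller would write `job_script(...)` to `script_path` first and then pass `plan(...)` to something like `subprocess.run`; keeping the decision logic free of side effects makes it easy to check both modes without a scheduler.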

ALSO: I've restricted the remapping jobs at NCCS to Milan nodes only. We found that Cascade Lake gives different answers for the GOCART restart... for some reason.
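
One way to express the Milan-only restriction is a SLURM node constraint added only to NCCS job scripts. This is a hypothetical sketch: the helper name is made up, and `mil` is assumed to be the Discover SLURM feature name for Milan nodes, so check the NCCS documentation before relying on it. NAS (PBS) directives pass through unchanged.

```python
# Hypothetical sketch: pin NCCS remap jobs to Milan nodes.
# Assumption: "mil" is Discover's SLURM feature name for Milan nodes.
MILAN_CONSTRAINT = "#SBATCH --constraint=mil"


def add_node_constraint(site: str, directives: list[str]) -> list[str]:
    """Return a copy of the directives with the Milan constraint added at NCCS.

    NAS (PBS) directives are returned unchanged.
    """
    if site != "NCCS":
        return list(directives)
    # Drop any existing constraint (e.g. one requesting Cascade Lake) first,
    # so the job cannot land on the hardware that gave different answers.
    kept = [d for d in directives if not d.startswith("#SBATCH --constraint=")]
    return kept + [MILAN_CONSTRAINT]
```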

This needs to be fully tested and looked over, probably by me, @weiyuan-jiang, and @biljanaorescanin.

I'm keeping this as a draft until it's tested.


ETA: Both @biljanaorescanin and @weiyuan-jiang have tested it, and we've updated the baselines.

@mathomp4 mathomp4 self-assigned this Feb 23, 2026
@mathomp4 mathomp4 added the 0 diff label (The changes in this pull request have been verified to be zero-diff with the target branch.) Feb 23, 2026
@biljanaorescanin
Contributor

I tried to test with our testing script, and it seems that some merge or this PR broke it: test_remap_restarts.py

@weiyuan-jiang @mathomp4

@mathomp4
Member Author

Well nuts. I'll try and give it a go soon. It seemed simple...

@weiyuan-jiang
Copy link
Contributor

If you can show me the folder with the error message, I may find a way to fix it @biljanaorescanin

@weiyuan-jiang
Copy link
Contributor

I think it is safe to update the baseline for V12. Here are my new results: /discover/nobackup/wjiang/REMAP_TESTS/ @mathomp4

@mathomp4
Copy link
Member Author

Done! Thanks :)

@mathomp4 mathomp4 marked this pull request as ready for review March 12, 2026 11:13
@mathomp4 mathomp4 requested a review from a team as a code owner March 12, 2026 11:13
@mathomp4 mathomp4 linked an issue Mar 12, 2026 that may be closed by this pull request
…ndice

# Conflicts:
#	pre/remap_restart/remap_lake_landice_saltwater.py
@mathomp4 mathomp4 mentioned this pull request Mar 20, 2026
@mathomp4 mathomp4 requested a review from a team as a code owner March 24, 2026 15:01
@biljanaorescanin
Copy link
Contributor

After the last commit @mathomp4 made to limit GCMv12 to running only on Milan nodes, this PR (well, the merge of 181 and 179) is zero-diff for remap package testing.

@sdrabenh sdrabenh merged commit 273422c into feature/sdrabenh/gcm_v12 Mar 24, 2026
19 checks passed
@sdrabenh sdrabenh deleted the feature/v12-slurm-lakelandice branch March 24, 2026 18:28

Labels

0 diff: The changes in this pull request have been verified to be zero-diff with the target branch.

Development

Successfully merging this pull request may close these issues.

Issues with test_remap_restarts.py

4 participants