…_ozone_forcing_data 2. Add NEPTUNE interstitials in physics/Interstitials/UFS_SCM_NEPTUNE/
… of reading with every MPI rank
…th mpiroot; move mpiutil.F90 to subdirectory tools
climbfuji commented on Jan 8, 2026
…el if MPI broadcast errors occur
…*.F90 when writing to errmsg for invalid w3kindreal/w3kindint; additionally: formatting updates
Collaborator, Author
Reviewers, right now this large pull request can still be updated from develop without conflicts, but given how many files it touches, this may not be the case for much longer. Can we pick this PR up, please?
mdtoyNOAA approved these changes on Mar 6, 2026
dustinswales approved these changes on Mar 6, 2026
dustinswales (Member) left a comment:
Thanks @climbfuji. This was a lot of work!
One question. Do we need calls to mpi_barrier() between the read and the broadcast?
```fortran
if (mpi_rank == mpi_root) then
   ! ... read data ...
endif
call mpi_barrier(...)   ! <- is this needed?
call ccpp_bcast(...)
```
Collaborator, Author
No, I don't believe that's needed.
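To illustrate why a barrier should not be needed, here is a minimal sketch of the read-on-root-then-broadcast pattern, assuming an MPI library that provides the `mpi_f08` module. The module `root_read_sketch` and routine `load_and_share` are hypothetical names (not the PR's code), and the file read is faked by filling an array on the root rank.

```fortran
! Hypothetical sketch; names are illustrative, not the PR's mpiutil code.
module root_read_sketch
   use mpi_f08,         only: MPI_Comm, MPI_Comm_rank, MPI_Bcast, &
                              MPI_DOUBLE_PRECISION, MPI_SUCCESS
   use iso_fortran_env, only: real64
   implicit none
   private
   public :: load_and_share

contains

   ! Fill `table` on rank `root` only (stand-in for the file read), then
   ! broadcast it so every rank in `comm` ends up with the same values.
   subroutine load_and_share(comm, root, table, ierr)
      type(MPI_Comm), intent(in)    :: comm
      integer,        intent(in)    :: root
      real(real64),   intent(inout) :: table(:)
      integer,        intent(out)   :: ierr
      integer :: rank, i

      call MPI_Comm_rank(comm, rank, ierr)
      if (ierr /= MPI_SUCCESS) return

      if (rank == root) then
         ! Assumption: real code would open and read a data file here.
         table = [(real(i, real64), i = 1, size(table))]
      end if

      ! No MPI_Barrier before this call: MPI_Bcast is collective, the root
      ! only enters it after finishing the read above, and a non-root rank
      ! only returns from it once its `table` holds the root's data.
      ! Assumes real64 matches MPI_DOUBLE_PRECISION on this platform.
      call MPI_Bcast(table, size(table), MPI_DOUBLE_PRECISION, root, comm, ierr)
   end subroutine load_and_share

end module root_read_sketch
```

A barrier in front of the broadcast would only add an extra synchronization point; it would not change what the broadcast delivers to each rank.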
Description of Changes:
This PR modifies how data is read in the CCPP `init` and `timestep_init` phases. Instead of every single MPI task reading the data serially, the MPI root rank reads the data and then broadcasts it. This is implemented for all code except the GOCART aerosols (NEPTUNE doesn't use these, hence we have no way to test them; also still to check: the new o3 and h2o code).

The implementation follows the path described in #1106: an MPI broadcast wrapper is added in a new module `mpiutils`, which wraps around the (now type-dependent) MPI interfaces in `mpi_f08`.

The CCPP MPI broadcast routines in this PR use a `ccpp_abort` function to stop the model in the event of an MPI error. This does not follow the CCPP requirements; it was done to avoid having to pass `errmsg` and `errflg` all the way down and then back out to the host model to abort. Compliance with the current CCPP rules can be implemented, but it is worth discussing whether alternative methods are preferable and/or simplify the code. Note: in many places, the authoritative code in NCAR ccpp-physics simply calls `stop` to abort the model. That's much worse than using `MPI_ABORT` and, of course, also not CCPP compliant. In NEPTUNE, we've used a function equivalent to `ccpp_abort` in these places.

Sample output from a crash in `ccpp_abort` with GNU:

Tests Conducted:
This code comes from the NRL fork of ccpp-physics. It has been tested extensively and is used in the latest code delivered for operational implementation. Reading with the MPI root rank and broadcasting solved the performance issues we saw on large task counts, which are described briefly in #1106.
We'll need to test these changes in the SCM and the UFS; in particular for the latter, we want to look at b4b reproducibility (results were zero-diff when we introduced this in NEPTUNE) and at performance implications for production-size runs.
Dependencies:
None
Documentation:
I suggest we discuss the nitty-gritty details of the implementation (how to stop the model if an error occurs in the broadcast routines) before we update the documentation in ccpp-doc.
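For the discussion of how to stop the model on a broadcast error, here is a minimal sketch of a type-generic wrapper over `mpi_f08` that supports both options from the description: if the caller passes `errmsg`/`errflg`, the error is returned CCPP-style; otherwise the wrapper calls `MPI_Abort`, in the spirit of `ccpp_abort`. The names `bcast_sketch_mod`, `bcast_any` and `handle_error` are hypothetical, only two type/rank combinations are shown, and this is not the PR's implementation.

```fortran
! Hypothetical sketch: bcast_sketch_mod, bcast_any and handle_error are
! illustrative names, not the PR's mpiutil/ccpp_abort implementation.
module bcast_sketch_mod
   use mpi_f08,         only: MPI_Comm, MPI_Bcast, MPI_Abort, MPI_INTEGER, &
                              MPI_DOUBLE_PRECISION, MPI_SUCCESS
   use iso_fortran_env, only: real64, error_unit
   implicit none
   private
   public :: bcast_any

   ! One generic name; one specific routine per type/rank combination.
   interface bcast_any
      module procedure bcast_int_scalar
      module procedure bcast_r8_1d
   end interface bcast_any

contains

   subroutine bcast_int_scalar(val, root, comm, errmsg, errflg)
      integer,          intent(inout)         :: val
      integer,          intent(in)            :: root
      type(MPI_Comm),   intent(in)            :: comm
      character(len=*), intent(out), optional :: errmsg
      integer,          intent(out), optional :: errflg
      integer :: ierr
      call MPI_Bcast(val, 1, MPI_INTEGER, root, comm, ierr)
      call handle_error(ierr, 'bcast_int_scalar', comm, errmsg, errflg)
   end subroutine bcast_int_scalar

   subroutine bcast_r8_1d(val, root, comm, errmsg, errflg)
      real(real64),     intent(inout)         :: val(:)
      integer,          intent(in)            :: root
      type(MPI_Comm),   intent(in)            :: comm
      character(len=*), intent(out), optional :: errmsg
      integer,          intent(out), optional :: errflg
      integer :: ierr
      ! Assumes real64 matches MPI_DOUBLE_PRECISION on this platform.
      call MPI_Bcast(val, size(val), MPI_DOUBLE_PRECISION, root, comm, ierr)
      call handle_error(ierr, 'bcast_r8_1d', comm, errmsg, errflg)
   end subroutine bcast_r8_1d

   ! If the caller passed both errmsg and errflg (CCPP-style), report the
   ! error and return so the host model decides how to stop. Otherwise
   ! (ccpp_abort-style), print a message and take down all ranks.
   ! Caveat: with the default MPI_ERRORS_ARE_FATAL handler, MPI aborts on
   ! its own, so ierr /= MPI_SUCCESS is normally never seen here.
   subroutine handle_error(ierr, caller, comm, errmsg, errflg)
      integer,          intent(in)            :: ierr
      character(len=*), intent(in)            :: caller
      type(MPI_Comm),   intent(in)            :: comm
      character(len=*), intent(out), optional :: errmsg
      integer,          intent(out), optional :: errflg

      if (present(errmsg)) errmsg = ''
      if (present(errflg)) errflg = 0
      if (ierr == MPI_SUCCESS) return

      if (present(errmsg) .and. present(errflg)) then
         errflg = 1
         write(errmsg, '(2a,i0)') caller, ': MPI error code ', ierr
      else
         write(error_unit, '(2a,i0)') caller, ': MPI error code ', ierr
         call MPI_Abort(comm, ierr)
      end if
   end subroutine handle_error

end module bcast_sketch_mod
```

Making `errmsg`/`errflg` optional lets CCPP-compliant callers opt into error propagation while other callers keep the simpler abort path. Either way, the error branch is only reachable if the communicator's error handler has been switched to `MPI_ERRORS_RETURN` (for example with `MPI_Comm_set_errhandler`).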
Issue (optional):
Closes #1106
Contributors (optional):
@matusmartini (NRL)