[Draft] OpenACC port of halo exchanges #1355
Draft: abishekg7 wants to merge 30 commits into MPAS-Dev:develop from abishekg7:framework/acc_halo_exch
f5a7287 to acdba1c
This PR consolidates much of the OpenACC host and device data transfers during the course of the dynamical execution into two subroutines, mpas_atm_pre_dynamics_h2d and mpas_atm_post_dynamics_d2h, that are called before and after the call to the atm_srk3 subroutine. Because atm_compute_solve_diagnostics is also called once before the start of the model run, we also have a pair of subroutines, mpas_atm_pre_computesolvediag_h2d and mpas_atm_post_computesolvediag_d2h, to handle data movements around the first call to atm_compute_solve_diagnostics. Any fields copied onto the device in these subroutines are removed from explicit data movement statements in the dynamical core.

The mesh/time-invariant fields are still copied onto the device in mpas_atm_dynamics_init and removed from the device in mpas_atm_dynamics_finalize, with the exception of select fields moved in mpas_atm_pre_computesolvediag_h2d and mpas_atm_post_computesolvediag_d2h. This is a special case because atm_compute_solve_diagnostics is called for the first time before the call to mpas_atm_dynamics_init.

This PR also includes explicit host-device data transfers in the mpas_atm_iau, mpas_atmphys_interface and mpas_atmphys_todynamics modules to ensure that the physics and IAU regions, which run on the CPU, use the latest values from the dynamical core running on GPUs, and vice versa. In addition, this PR includes explicit data transfers around halo exchanges in the atm_srk3 subroutine.

These data transfer subroutines and the acc update statements are an interim solution until we have a book-keeping method in place. This PR also introduces a couple of new timers to keep track of the cost of data transfers.
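As a rough illustration of the pattern described in this commit message, the sketch below moves fields onto the device once before a stand-in dynamics step and brings them back afterwards, so the step itself only asserts that the data is present. All names here (dynamics_transfer_sketch, pre_dynamics_h2d, dynamics_step, post_dynamics_d2h, u, tend) are hypothetical stand-ins and not the MPAS routines or fields; the real subroutines handle many more fields and their exact clauses are not shown in this PR text.

```fortran
! Hypothetical sketch: module, routine and field names are illustrative only,
! not the actual MPAS code. Compiles as plain Fortran when OpenACC is disabled.
module dynamics_transfer_sketch
   implicit none
contains

   ! Copy the fields the dynamics needs onto the device once, up front.
   subroutine pre_dynamics_h2d(u, tend)
      real, intent(in) :: u(:), tend(:)
      !$acc enter data copyin(u, tend)
   end subroutine pre_dynamics_h2d

   ! Stand-in for the dynamics: no data movement here, the fields are
   ! assumed to be on the device already (u and tend must have equal size).
   subroutine dynamics_step(u, tend)
      real, intent(inout) :: u(:)
      real, intent(in) :: tend(:)
      integer :: i
      !$acc parallel loop present(u, tend)
      do i = 1, size(u)
         u(i) = u(i) + tend(i)
      end do
   end subroutine dynamics_step

   ! Bring the updated field back to the host and release the device copies.
   subroutine post_dynamics_d2h(u, tend)
      real, intent(inout) :: u(:)
      real, intent(in) :: tend(:)
      !$acc exit data copyout(u) delete(tend)
   end subroutine post_dynamics_d2h

end module dynamics_transfer_sketch
```

A caller would invoke pre_dynamics_h2d, then dynamics_step, then post_dynamics_d2h, in that order. Keeping the transfers in a matched pair of routines makes it easier to see which fields live on the device during the step, which is presumably part of why this is described as an interim solution until a book-keeping method exists.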
…t_2d This commit introduces two OpenACC data transfer routines, mpas_reconstruct_2d_h2d and mpas_reconstruct_2d_d2h, in order to remove the data transfers from the mpas_reconstruct_2d routine itself. This also allows us to remove extraneous data movements within the atm_srk3 routine. mpas_reconstruct_2d_h2d and mpas_reconstruct_2d_d2h are called before and after the call to mpas_reconstruct in atm_mpas_init_block. The reconstructed vector fields are also copied to and from the device before and after every dynamics call in mpas_atm_pre_dynamics_h2d and mpas_atm_post_dynamics_d2h.
This commit does work and matches the previous results!
NOTE: The last commit was successful!
Last commit had differences from the baseline. It's either this, or the change dropping 'update device(group % sendBuf(:))' in the last commit
Last commit still had answer differences
This should make the dependency analysis easier on the compiler. NOTE: The last commit succeeded and had no diffs after 1 timestep compared to a reference run!
…o force GPUDirect MPI NOTE: The last commit ran successfully and matched previous 1 step results
…r variables Last run failed with CUDA_ERROR_ILLEGAL_ADDRESS. I think keeping these on the GPU would help!
Last commit gave me some big differences, let's see if this helps. If this helps, then that means I wasn't using GPU-aware MPI routines like I thought...
…calls instead Last commit still had answer differences. NOTE: This commit does too
Introduces a new namelist option under development, config_gpu_aware_mpi, which controls whether an OpenACC run of MPAS on GPUs uses GPU-aware MPI or performs a device<->host update of variables around the call to a purely CPU-based halo exchange. Note: This feature is not available when config_halo_exch_method is set to 'mpas_dmpar'.
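The two code paths this option selects between might look roughly like the sketch below. It is a single-rank stand-in with no MPI at all: the "exchange" is just a periodic copy into the halo cells, and the names halo_exchange_sketch, exchange_on_host, exchange_on_device and halo_exchange are hypothetical. It also assumes the option is a logical where .true. selects the GPU-aware path; the actual type and the way MPAS passes device buffers to MPI are not stated in this commit message.

```fortran
! Hypothetical single-rank sketch of the two paths; not the MPAS implementation.
! field(0) and field(n+1) are halo cells, field(1:n) are owned cells.
module halo_exchange_sketch
   implicit none
contains

   ! Stand-in for a purely CPU-based halo exchange (periodic copy, no MPI).
   subroutine exchange_on_host(field, n)
      integer, intent(in) :: n
      real, intent(inout) :: field(0:n+1)
      field(0) = field(n)
      field(n+1) = field(1)
   end subroutine exchange_on_host

   ! Stand-in for an exchange that works directly on device-resident data.
   subroutine exchange_on_device(field, n)
      integer, intent(in) :: n
      real, intent(inout) :: field(0:n+1)
      !$acc serial present(field)
      field(0) = field(n)
      field(n+1) = field(1)
      !$acc end serial
   end subroutine exchange_on_device

   ! gpu_aware_mpi plays the role of config_gpu_aware_mpi (assumed logical).
   ! The caller is assumed to have placed field on the device beforehand.
   subroutine halo_exchange(field, n, gpu_aware_mpi)
      integer, intent(in) :: n
      real, intent(inout) :: field(0:n+1)
      logical, intent(in) :: gpu_aware_mpi
      if (gpu_aware_mpi) then
         ! Data never leaves the device.
         call exchange_on_device(field, n)
      else
         ! Refresh the host copy, exchange on the CPU, push halos back.
         !$acc update host(field)
         call exchange_on_host(field, n)
         !$acc update device(field)
      end if
   end subroutine halo_exchange

end module halo_exchange_sketch
```

The host path pays for two transfers per exchange, which is the cost the GPU-aware path is meant to avoid. In both paths the device copy holds the final result, so a caller that needs the values on the host still has to update or copy out afterwards.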
56ed028 to a8fda92
This PR enables execution of halo exchanges on GPUs via OpenACC directives. It uses #1315 as the base branch, so #1315 needs to be merged before the current PR can be merged.
The packing and unpacking code around the halo exchanges uses !$acc parallel regions. The actual MPI_Isend and MPI_Irecv operations use CUDA-aware MPI, by wrapping these calls within