Remaining OpenMP Target Kernels #85
reuterbal
left a comment
Thanks for all your work! With this, the OpenMP offload version should be runnable on device? Which driver should this work for, or does it require the driver from PR #86?
After merging previous PRs, this PR has now incurred a conflict in radiation_ifs_rrtm.F90, which needs to be resolved. The OpenMP offload tests are also still failing; maybe it's worthwhile rebasing onto the latest version of master-omp to see if that suffices to make them pass? Or is it too early to attempt to fix them now?
Just purely FYI, tests right now fail with errors like the following:
1. ecrad_dp_mcica_acc:
*** Error writing matrix flux_up_lw: NetCDF: Numeric conversion not representable
ABOR1 [PROC=1,THRD=1] : Error writing NetCDF file
MPL_ABORT [PROC=1,THRD=1] : Error writing NetCDF file
This typically means there are NaNs somewhere in the field.
2. ecrad_ifs_dp_mcica_acc_net and ecrad_ifs_blocked_dp_mcica_acc_net:
Failing in Thread:1
Accelerator Fatal Error: call to cuMemcpyDtoHAsync returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
File: /dev/shm/_tmpdir_.***.39165529/build/radiation.dp/CMakeFiles/ecrad.dp.dir/radiation_gas.F90-pp.f90
Function: create_device:852
Line: 879
Interestingly, the double-precision ecrad_dp_mcica_acc_net claims to pass, as does the single-precision ecrad_sp_mcica_acc, but ecrad_sp_mcica_acc_net does not: it reports NaN and unphysical values:
*** Warning: sw_dn_surf_band contains NaN
*** Warning: lw_up range 0.000 to 0.1146E-36 is out of physical range 10.00 to 900.0
This could all be due to missing data movements, of course, if you've not yet tested these particular driver implementations.
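For reference, the kind of check behind the NaN and range warnings quoted above can be sketched as follows. This is only an illustration, not the actual ecRad test code: the helper name check_field is hypothetical, the 10.0 to 900.0 bounds for lw_up are taken from the warning itself, and the exact message format is an assumption.

```python
import math


def check_field(name, values, lo, hi):
    """Return warning strings for NaN entries or an out-of-range min/max.

    Hypothetical helper: name, signature and message wording are
    assumptions made for this sketch, not part of ecRad.
    """
    warnings = []
    # Separate NaN entries from the values that can be range-checked.
    finite = [v for v in values if not math.isnan(v)]
    if len(finite) != len(values):
        warnings.append(f"{name} contains NaN")
    # Compare the observed extremes against the physical bounds.
    if finite and (min(finite) < lo or max(finite) > hi):
        warnings.append(
            f"{name} range {min(finite):.4g} to {max(finite):.4g} "
            f"is out of physical range {lo:.4g} to {hi:.4g}"
        )
    return warnings


# Values mimicking the lw_up warning from the single-precision run.
print(check_field("lw_up", [0.0, 0.1146e-36], 10.0, 900.0))
```

A field that is zero or near zero everywhere, as lw_up is here, is consistent with the output array never having been copied back from the device.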
9818b83 to 6382959
Hi @reuterbal I just rebased against master-omp. This should resolve the conflicts.
Up to this PR (i.e. excluding #86), I can make a small, but unacceptable change via:
I then see agreement at double precision:
python3 test/common/nccmp.py --longwave-threshold=0.001 --shortwave-threshold=0.001 build.nv.24.9.Release/run/ecrad_ifs_blocked.nc build.rocm-afar-22.1.0.Release.fast-real-mod/run/ecrad_ifs_blocked.nc
Where:
If I don't make the change to ecrad_ifs_blocked.F90 or I change nblocksize to 8480 (for master-omp) I start to see significant diffs.
In fact, I see good agreement between master-acc and master-omp across ecrad_dp, ecrad_ifs_dp, and ecrad_ifs_blocked_dp (with the changes noted above).
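To make "agreement" at a 0.001 threshold concrete, here is a minimal sketch of a thresholded comparison between two flux fields. The metric that test/common/nccmp.py actually applies is not shown in this thread, so the relative-difference definition below is an assumption, and the function names max_relative_diff and fields_agree are hypothetical.

```python
import math


def max_relative_diff(reference, candidate):
    """Largest |candidate - reference| / |reference| over non-zero reference values.

    Assumed metric for illustration only; not taken from nccmp.py.
    """
    if len(reference) != len(candidate):
        raise ValueError("fields differ in length")
    worst = 0.0
    for ref, val in zip(reference, candidate):
        # Any NaN counts as total disagreement rather than being skipped.
        if math.isnan(ref) or math.isnan(val):
            return math.inf
        if ref != 0.0:
            worst = max(worst, abs(val - ref) / abs(ref))
    return worst


def fields_agree(reference, candidate, threshold=0.001):
    """True when the worst relative difference stays within the threshold."""
    return max_relative_diff(reference, candidate) <= threshold


# Two small made-up flux profiles that differ by 0.05 percent.
print(fields_agree([100.0, 200.0], [100.05, 200.1]))
```

Treating NaN as an automatic failure matters here, since a plain max() over differences would silently ignore it.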
I'm not sure how much we can read into these errors when we have the following issues in the run:
Accelerator Fatal Error: call to cuMemcpyDtoHAsync returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
reuterbal
left a comment
Many thanks! As discussed offline, this should be a fully runnable version now with aomp-afar. We will try to set up a GitHub Actions runner that tests with this compiler and investigate the NVIDIA failures separately.