Remaining OpenMP Target Kernels by PaulMullowney · Pull Request #85 · ecmwf-ifs/ecrad

PaulMullowney · 2025-10-03T02:47:25Z

No description provided.

reuterbal

Thanks for all your work! With this, the OpenMP offload version should be runnable on device? Which driver should this work for, or does it require the driver from PR #86?

After merging previous PRs, this PR has now incurred a conflict in radiation_ifs_rrtm.F90, which needs to be resolved. The OpenMP offload tests are also still failing; maybe it's worthwhile rebasing onto the latest version of master-omp to see if that suffices to make them pass? Or is it too early to attempt to fix them now?

Just purely FYI, tests right now fail with errors like the following:

1. ecrad_dp_mcica_acc:

 *** Error writing matrix flux_up_lw: NetCDF: Numeric conversion not representable
ABOR1     [PROC=1,THRD=1] : Error writing NetCDF file
MPL_ABORT [PROC=1,THRD=1] : Error writing NetCDF file

This typically means there are NaNs somewhere in the field.

2. ecrad_ifs_dp_mcica_acc_net and ecrad_ifs_blocked_dp_mcica_acc_net:

Failing in Thread:1
Accelerator Fatal Error: call to cuMemcpyDtoHAsync returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
 File: /dev/shm/_tmpdir_.***.39165529/build/radiation.dp/CMakeFiles/ecrad.dp.dir/radiation_gas.F90-pp.f90
 Function: create_device:852
 Line: 879

Interestingly, the double-precision ecrad_dp_mcica_acc_net claims to pass, as does the single-precision ecrad_sp_mcica_acc, but ecrad_sp_mcica_acc_net does not; it reports NaN and unphysical values:

*** Warning: sw_dn_surf_band contains NaN
*** Warning: lw_up range   0.000     to  0.1146E-36 is out of physical range   10.00    to   900.0

This could all be due to missing data movements, of course, if you've not yet tested these particular driver implementations.
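As a rough illustration of the kind of check behind the warnings quoted above, here is a minimal Python sketch. It is not the ecrad implementation: the function name `check_range` and the sample values are hypothetical, and only the message wording is modelled on the log.

```python
import math


def check_range(name, values, lo, hi):
    """Return warning strings for NaN entries or values outside [lo, hi]."""
    warnings = []
    # Separate out NaN first: min()/max() are unreliable when NaN is present.
    non_nan = [v for v in values if not math.isnan(v)]
    if len(non_nan) < len(values):
        warnings.append(f"{name} contains NaN")
    if non_nan and (min(non_nan) < lo or max(non_nan) > hi):
        warnings.append(
            f"{name} range {min(non_nan):.4g} to {max(non_nan):.4g} "
            f"is out of physical range {lo:.4g} to {hi:.4g}"
        )
    return warnings


# Hypothetical sample data, chosen only to resemble the log above.
samples = {
    "sw_dn_surf_band": ([340.0, float("nan")], 0.0, 1500.0),
    "lw_up": ([0.0, 0.1146e-36], 10.0, 900.0),
}
for field, (values, lo, hi) in samples.items():
    for message in check_range(field, values, lo, hi):
        print("*** Warning:", message)
```

A field that is all zeros or denormal-sized, as in the lw_up line, is what one would expect if the host copy was never refreshed from the device, which fits the missing-data-movement suspicion.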

PaulMullowney · 2025-10-14T12:06:18Z

Hi @reuterbal, I just rebased against master-omp. This should resolve the conflicts.

PaulMullowney · 2025-10-14T12:31:42Z

Up to this PR (i.e. excluding #86), I can get the results to agree with a small but unacceptable change:

(nccmp_venv) [pmullown@TheraC16 ecrad-upstreaming]$ git diff
diff --git a/driver/ecrad_ifs_driver_blocked.F90 b/driver/ecrad_ifs_driver_blocked.F90
index b547eb7..cbe70f3 100644
--- a/driver/ecrad_ifs_driver_blocked.F90
+++ b/driver/ecrad_ifs_driver_blocked.F90
@@ -480,8 +480,7 @@ program ecrad_ifs_driver
 #ifdef BITIDENTITY_TESTING
         !$OMP TARGET UPDATE TO(iseed(:,ib))
 #endif
-        !$OMP TARGET UPDATE TO(zrgp(1:il,ifs_config%iinbeg:ifs_config%iinend,ib), &
-        !$OMP&                 zrgp(1:il,ifs_config%ioutend+1:ifs_config%ifldstot,ib))
+        !$OMP TARGET UPDATE TO(zrgp(:,:,ib))
 #endif
 #if defined(_OPENACC)
         !$acc data create(zrgp(:,:,ib)) &
@@ -544,7 +543,7 @@ program ecrad_ifs_driver
 #if defined(OMPGPU)
 #ifdef COPY_ASYNC
 #else
-        !$OMP TARGET UPDATE FROM(zrgp(1:il,ifs_config%ioutbeg:ifs_config%ioutend,ib))
+        !$OMP TARGET UPDATE FROM(zrgp(:,:,ib))
         !$OMP TARGET EXIT DATA MAP(DELETE:zrgp(:,:,ib))

I then see agreement at double precision:

python3 test/common/nccmp.py --longwave-threshold=0.001 --shortwave-threshold=0.001 build.nv.24.9.Release/run/ecrad_ifs_blocked.nc build.rocm-afar-22.1.0.Release.fast-real-mod/run/ecrad_ifs_blocked.nc

Where:

- build.nv.24.9.Release/run/ecrad_ifs_blocked.nc is generated from the latest master-acc.
- build.rocm-afar-22.1.0.Release.fast-real-mod/run/ecrad_ifs_blocked.nc is generated from this PR plus the changes above.

I run master-acc with nblocksize=8480 and master-omp with nblocksize=16960.

If I don't make the change to ecrad_ifs_driver_blocked.F90, or if I change nblocksize to 8480 (for master-omp), I start to see significant diffs.
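One hypothesis consistent with this behaviour is that a kernel reads a field lying outside the sub-ranges that the original `TARGET UPDATE TO` transfers. The toy Python model below only illustrates that hypothesis; it is not a diagnosis of ecrad, and the field layout, `update_to`, and `kernel` are all made up for the example.

```python
GARBAGE = float("nan")  # stands in for uninitialised device memory


def update_to(host, device, field_ranges):
    """Copy only the listed field ranges (inclusive, 0-based) host -> device."""
    for first, last in field_ranges:
        device[first:last + 1] = host[first:last + 1]


def kernel(device, read_fields):
    """Toy kernel: sum the fields it reads from the device copy."""
    return sum(device[i] for i in read_fields)


host = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # six hypothetical fields of one block
read_fields = [0, 1, 3]                # assumption: the kernel also reads field 3

# Partial update: two sub-ranges, analogous to the original directive.
partial = [GARBAGE] * len(host)
update_to(host, partial, [(0, 1), (4, 5)])  # fields 2..3 are never transferred

# Full update: analogous to TARGET UPDATE TO(zrgp(:,:,ib)).
full = [GARBAGE] * len(host)
update_to(host, full, [(0, len(host) - 1)])

print(kernel(partial, read_fields))  # nan: field 3 was never copied
print(kernel(full, read_fields))     # 7.0
```

If something like this is happening, the proper fix would be to extend the transferred ranges to cover what the kernels actually touch rather than copying the whole block.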

PaulMullowney · 2025-10-14T14:09:54Z

In fact, I see good agreement between master-acc and master-omp across ecrad_dp, ecrad_ifs_dp, and ecrad_ifs_blocked_dp (with the changes noted above).
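For readers unfamiliar with threshold-based comparison, the sketch below shows one way "agreement" could be defined. It assumes the threshold bounds the absolute difference per element and treats any NaN as a mismatch; test/common/nccmp.py may define its thresholds differently, and `fields_agree` is a hypothetical name.

```python
import math


def fields_agree(ref, new, threshold):
    """True if every element pair differs by at most `threshold` (absolute)."""
    if len(ref) != len(new):
        raise ValueError("fields differ in length")
    for a, b in zip(ref, new):
        # NaN never compares as "close", so reject it explicitly.
        if math.isnan(a) or math.isnan(b):
            return False
        if abs(a - b) > threshold:
            return False
    return True


# Hypothetical flux values from a reference run and a run under test.
reference = [341.20, 98.75, 12.00]
candidate = [341.2004, 98.7497, 12.0001]
print(fields_agree(reference, candidate, threshold=0.001))
```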

PaulMullowney · 2025-10-14T14:12:22Z

I'm not sure how much we can read into errors like this one:
*** Warning: gas%mixing_ratio range -0.3401E+39 to 0.3280E+39 is out of physical range 0.000 to 1.000

when we have the following issues in the run:

Accelerator Fatal Error: call to cuMemcpyDtoHAsync returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
File: /dev/shm/tmpdir.***.39266861/build/radiation.sp/CMakeFiles/ecrad.sp.dir/radiation_gas.F90-pp.f90
Function: create_device:852

reuterbal

Many thanks! As discussed offline, this should be a fully runnable version now with aomp-afar. We will try to set up a GitHub Actions runner that tests with this compiler and investigate the NVIDIA failures separately.

github-actions bot added the contributor label Oct 3, 2025

reuterbal added the approved-for-ci label Oct 14, 2025

reuterbal requested changes Oct 14, 2025

Remaining openmp offload kernels

6382959

PaulMullowney force-pushed the more_openmp branch from 9818b83 to 6382959 Compare October 14, 2025 12:05

github-actions bot removed the approved-for-ci label Oct 14, 2025

reuterbal added the approved-for-ci label Oct 14, 2025

reuterbal approved these changes Oct 15, 2025

reuterbal merged commit 9e0a480 into ecmwf-ifs:master-omp Oct 15, 2025
15 of 16 checks passed

PaulMullowney deleted the more_openmp branch October 15, 2025 15:23
