Skip to content

Remaining OpenMP Target Kernels#85

Merged
reuterbal merged 1 commit intoecmwf-ifs:master-ompfrom
PaulMullowney:more_openmp
Oct 15, 2025
Merged

Remaining OpenMP Target Kernels#85
reuterbal merged 1 commit intoecmwf-ifs:master-ompfrom
PaulMullowney:more_openmp

Conversation

@PaulMullowney
Copy link

No description provided.

Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your work! With this, the OpenMP offload version should be runnable on device? Which driver should this work for, or does it require the driver from PR #86?

After merging previous PRs, this PR has now incurred a conflict in radiation_ifs_rrtm.F90, which needs to be resolved. The OpenMP offload tests are also still failing, maybe it's worthwhile rebasing over the latest version of master-omp to see if that suffices to make them pass? Or is it too early to attempt to fix them now?

Just purely FYI, tests right now fail with errors like the following:

  1. ecrad_dp_mcica_acc:
 *** Error writing matrix flux_up_lw: NetCDF: Numeric conversion not representable
ABOR1     [PROC=1,THRD=1] : Error writing NetCDF file
MPL_ABORT [PROC=1,THRD=1] : Error writing NetCDF file

Which typically means there are NaN somewhere in the field.
2. ecrad_ifs_dp_mcica_acc_net and ecrad_ifs_blocked_dp_mcica_acc_net:

Failing in Thread:1
Accelerator Fatal Error: call to cuMemcpyDtoHAsync returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
 File: /dev/shm/_tmpdir_.***.39165529/build/radiation.dp/CMakeFiles/ecrad.dp.dir/radiation_gas.F90-pp.f90
 Function: create_device:852
 Line: 879

Interestingly, the double-precision ecrad_dp_mcica_acc_net claims to pass, and the single-precision ecrad_sp_mcica_acc, but not the ecrad_sp_mcica_acc_net, which discovers NaN and unphysical values:

*** Warning: sw_dn_surf_band contains NaN
*** Warning: lw_up range   0.000     to  0.1146E-36 is out of physical range   10.00    to   900.0    

This could all be due to missing data movements, of course, if you've not yet tested these particular driver implementations.

@PaulMullowney
Copy link
Author

Hi @reuterbal I just rebased against master-omp. This should resolve the conflicts.

@PaulMullowney
Copy link
Author

PaulMullowney commented Oct 14, 2025

Thanks for all your work! With this, the OpenMP offload version should be runnable on device? Which driver should this work for, or does it require the driver from PR #86?

After merging previous PRs, this PR has now incurred a conflict in radiation_ifs_rrtm.F90, which needs to be resolved. The OpenMP offload tests are also still failing, maybe it's worthwhile rebasing over the latest version of master-omp to see if that suffices to make them pass? Or is it too early to attempt to fix them now?

Just purely FYI, tests right now fail with errors like the following:

  1. ecrad_dp_mcica_acc:
 *** Error writing matrix flux_up_lw: NetCDF: Numeric conversion not representable
ABOR1     [PROC=1,THRD=1] : Error writing NetCDF file
MPL_ABORT [PROC=1,THRD=1] : Error writing NetCDF file

Which typically means there are NaN somewhere in the field. 2. ecrad_ifs_dp_mcica_acc_net and ecrad_ifs_blocked_dp_mcica_acc_net:

Failing in Thread:1
Accelerator Fatal Error: call to cuMemcpyDtoHAsync returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
 File: /dev/shm/_tmpdir_.***.39165529/build/radiation.dp/CMakeFiles/ecrad.dp.dir/radiation_gas.F90-pp.f90
 Function: create_device:852
 Line: 879

Interestingly, the double-precision ecrad_dp_mcica_acc_net claims to pass, and the single-precision ecrad_sp_mcica_acc, but not the ecrad_sp_mcica_acc_net, which discovers NaN and unphysical values:

*** Warning: sw_dn_surf_band contains NaN
*** Warning: lw_up range   0.000     to  0.1146E-36 is out of physical range   10.00    to   900.0    

This could all be due to missing data movements, of course, if you've not yet tested these particular driver implementations.

Up to this PR (i.e. excluding #86), I can make a small, but unacceptable change via:

(nccmp_venv) [pmullown@TheraC16 ecrad-upstreaming]$ git diff
diff --git a/driver/ecrad_ifs_driver_blocked.F90 b/driver/ecrad_ifs_driver_blocked.F90
index b547eb7..cbe70f3 100644
--- a/driver/ecrad_ifs_driver_blocked.F90
+++ b/driver/ecrad_ifs_driver_blocked.F90
@@ -480,8 +480,7 @@ program ecrad_ifs_driver
 #ifdef BITIDENTITY_TESTING
         !$OMP TARGET UPDATE TO(iseed(:,ib))
 #endif
-        !$OMP TARGET UPDATE TO(zrgp(1:il,ifs_config%iinbeg:ifs_config%iinend,ib), &
-        !$OMP&                 zrgp(1:il,ifs_config%ioutend+1:ifs_config%ifldstot,ib))
+        !$OMP TARGET UPDATE TO(zrgp(:,:,ib))
 #endif
 #if defined(_OPENACC)
         !$acc data create(zrgp(:,:,ib)) &
@@ -544,7 +543,7 @@ program ecrad_ifs_driver
 #if defined(OMPGPU)
 #ifdef COPY_ASYNC
 #else
-        !$OMP TARGET UPDATE FROM(zrgp(1:il,ifs_config%ioutbeg:ifs_config%ioutend,ib))
+        !$OMP TARGET UPDATE FROM(zrgp(:,:,ib))
         !$OMP TARGET EXIT DATA MAP(DELETE:zrgp(:,:,ib))

I then see agreement at double precision:

python3 test/common/nccmp.py --longwave-threshold=0.001 --shortwave-threshold=0.001 build.nv.24.9.Release/run/ecrad_ifs_blocked.nc build.rocm-afar-22.1.0.Release.fast-real-mod/run/ecrad_ifs_blocked.nc

Where:

  1. build.nv.24.9.Release/run/ecrad_ifs_blocked.nc is generated from the latest master-acc
  2. build.rocm-afar-22.1.0.Release.fast-real-mod/run/ecrad_ifs_blocked.nc is generated from this PR plus the changes above
  3. I run master-acc with nblocksize=8480 and master-omp with nblocksize=16960

If I don't make the change to ecrad_ifs_blocked.F90 or I change nblocksize to 8480 (for master-omp) I start to see signficant diffs.

@PaulMullowney
Copy link
Author

Thanks for all your work! With this, the OpenMP offload version should be runnable on device? Which driver should this work for, or does it require the driver from PR #86?
After merging previous PRs, this PR has now incurred a conflict in radiation_ifs_rrtm.F90, which needs to be resolved. The OpenMP offload tests are also still failing, maybe it's worthwhile rebasing over the latest version of master-omp to see if that suffices to make them pass? Or is it too early to attempt to fix them now?
Just purely FYI, tests right now fail with errors like the following:

  1. ecrad_dp_mcica_acc:
 *** Error writing matrix flux_up_lw: NetCDF: Numeric conversion not representable
ABOR1     [PROC=1,THRD=1] : Error writing NetCDF file
MPL_ABORT [PROC=1,THRD=1] : Error writing NetCDF file

Which typically means there are NaN somewhere in the field. 2. ecrad_ifs_dp_mcica_acc_net and ecrad_ifs_blocked_dp_mcica_acc_net:

Failing in Thread:1
Accelerator Fatal Error: call to cuMemcpyDtoHAsync returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
 File: /dev/shm/_tmpdir_.***.39165529/build/radiation.dp/CMakeFiles/ecrad.dp.dir/radiation_gas.F90-pp.f90
 Function: create_device:852
 Line: 879

Interestingly, the double-precision ecrad_dp_mcica_acc_net claims to pass, and the single-precision ecrad_sp_mcica_acc, but not the ecrad_sp_mcica_acc_net, which discovers NaN and unphysical values:

*** Warning: sw_dn_surf_band contains NaN
*** Warning: lw_up range   0.000     to  0.1146E-36 is out of physical range   10.00    to   900.0    

This could all be due to missing data movements, of course, if you've not yet tested these particular driver implementations.

Up to this PR (i.e. excluding #86), I can make a small, but unacceptable change via:

(nccmp_venv) [pmullown@TheraC16 ecrad-upstreaming]$ git diff
diff --git a/driver/ecrad_ifs_driver_blocked.F90 b/driver/ecrad_ifs_driver_blocked.F90
index b547eb7..cbe70f3 100644
--- a/driver/ecrad_ifs_driver_blocked.F90
+++ b/driver/ecrad_ifs_driver_blocked.F90
@@ -480,8 +480,7 @@ program ecrad_ifs_driver
 #ifdef BITIDENTITY_TESTING
         !$OMP TARGET UPDATE TO(iseed(:,ib))
 #endif
-        !$OMP TARGET UPDATE TO(zrgp(1:il,ifs_config%iinbeg:ifs_config%iinend,ib), &
-        !$OMP&                 zrgp(1:il,ifs_config%ioutend+1:ifs_config%ifldstot,ib))
+        !$OMP TARGET UPDATE TO(zrgp(:,:,ib))
 #endif
 #if defined(_OPENACC)
         !$acc data create(zrgp(:,:,ib)) &
@@ -544,7 +543,7 @@ program ecrad_ifs_driver
 #if defined(OMPGPU)
 #ifdef COPY_ASYNC
 #else
-        !$OMP TARGET UPDATE FROM(zrgp(1:il,ifs_config%ioutbeg:ifs_config%ioutend,ib))
+        !$OMP TARGET UPDATE FROM(zrgp(:,:,ib))
         !$OMP TARGET EXIT DATA MAP(DELETE:zrgp(:,:,ib))

I then see agreement at double precision:

python3 test/common/nccmp.py --longwave-threshold=0.001 --shortwave-threshold=0.001 build.nv.24.9.Release/run/ecrad_ifs_blocked.nc build.rocm-afar-22.1.0.Release.fast-real-mod/run/ecrad_ifs_blocked.nc

Where:

  1. build.nv.24.9.Release/run/ecrad_ifs_blocked.nc is generated from the latest master-acc
  2. build.rocm-afar-22.1.0.Release.fast-real-mod/run/ecrad_ifs_blocked.nc is generated from this PR plus the changes above
  3. I run master-acc with nblocksize=8480 and master-omp with nblocksize=16960

If I don't make the change to ecrad_ifs_blocked.F90 or I change nblocksize to 8480 (for master-omp) I start to see signficant diffs.

In fact, I see good agreement between master-acc and master-omp across ecrad_dp, ecrad_ifs_dp, and ecrad_ifs_blocked_dp (with the changes noted above).

@PaulMullowney
Copy link
Author

I'm not sure how much we can read into these errors
*** Warning: gas%mixing_ratio range -0.3401E+39 to 0.3280E+39 is out of physical range 0.000 to 1.000

when we have the following issues in the run:

Accelerator Fatal Error: call to cuMemcpyDtoHAsync returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
File: /dev/shm/tmpdir.***.39266861/build/radiation.sp/CMakeFiles/ecrad.sp.dir/radiation_gas.F90-pp.f90
Function: create_device:852

Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks! As discussed offline, this should be a fully runnable version now with aomp-afar. We will try to set-up a Github Actions runner that tests with this compiler and investigate the NVidia-failures separately.

@reuterbal reuterbal merged commit 9e0a480 into ecmwf-ifs:master-omp Oct 15, 2025
15 of 16 checks passed
@PaulMullowney PaulMullowney deleted the more_openmp branch October 15, 2025 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants