Skip to content

Switch allocatable to pointer in ecRAD derived types#81

Open
wertysas wants to merge 8 commits intoecmwf-ifs:masterfrom
wertysas:je-develop
Open

Switch allocatable to pointer in ecRAD derived types#81
wertysas wants to merge 8 commits intoecmwf-ifs:masterfrom
wertysas:je-develop

Conversation

@wertysas
Copy link
Contributor

@wertysas wertysas commented Sep 29, 2025

This PR switches all allocatable members of the ecRAD derived types (single_level_type, thermodynamics_type, gas_type, ylcloud_type, aerosol_type, flux_type) to pointers. The three main changes to the code are:

  1. A switch of the allocatable attribute to pointer of the non-scalar members in the derived types and changing corresponding allocated calls to associated throughout the source code.
  2. New methods in the easy_netcdf utility module that can be used with pointers instead of allocatables
  3. Deallocation of the derived types at the end of the standalone drivers to prevent memory leaks

Furthermore, this PR also contains some minor code clean and bug fixes:

  • Updated change_namelist.sh script that is BSD/GNU agnostic and now works on Mac OS
  • A rewrite of the method set_units_gas that makes it non-recursive.
  • New throughput metric prints (Cols/s) to the ecrad standalone drivers.

In general the performance of the pointer version is better than before, see below for more detailed performance results. Several of the test configurations have also been tested with Valgrind to ensure that the switch from allocatables to pointers does not introduce any memory leaks.

--------------------------------------------

Performance results

The performance of this change has been evaluated on a CPU node on ECMWF's HPC using data from the maelstrom weather data set. The following configurations have been included in the performance tests:

  • input configurations: configCY49R1/configCY49R1_ecckd
  • single/double precision
  • thread count: 1, 2, 4, 8, 16, 32, 64, 128
  • nproma: 1, 2, 4, 8, 16, 32, 64

Generally, the pointer versions have better performance than the allocatable versions as can be seen in the table below that summarise the data of all performance runs (448 configs) with the intel fort compiler version 2021.4.0 20210910

Relative Difference in Runtime After Switching from Allocatables to Pointers (ifort 2021.4.0 20210910)

Statistic Value
Mean -13.82
Standard Deviation 7.14
Minimum -50.94
1st Quartile (25%) -16.93
Median (50%) -14.29
3rd Quartile (75%) -9.82
Maximum 9.37

Note: Negative values indicate performance improvement (faster runtime) when using pointers instead of allocatables.

Figures for selected nproma and thread counts

The figures below show the average runtime of different configurations in single and double precision with nproma = 8,32 threads=16,32 and the INTEL and GNU compiler. Compiler versions:

  • INTEL ifort version 2021.4.0 20210910
  • GNU Fortran (ECMWF) 13.2.0

ecrad_sp_configCY49R1
ecrad_sp_configCY49R1_ecckd
ecrad_ifs_blocked_sp_configCY49R1
ecrad_ifs_blocked_sp_configCY49R1_ecckd
ecrad_ifs_blocked_dp_configCY49R1
ecrad_ifs_blocked_dp_configCY49R1_ecckd
ecrad_dp_configCY49R1
ecrad_dp_configCY49R1_ecckd

@wertysas wertysas removed the request for review from rjhogan September 29, 2025 14:12
@reuterbal
Copy link
Collaborator

@drieg @m-burba @huppd: For some reason I cannot add you as reviewers directly, but this is the pull request discussed at today's meeting

@wertysas wertysas requested a review from rjhogan October 2, 2025 12:51
& cos_sza, & ! (ncol) Cosine of solar zenith angle
& skin_temperature ! (ncol) Skin temperature (K)
real(jprb), pointer, dimension(:) :: &
& cos_sza=>null(), & ! (ncol) Cosine of solar zenith angle
Copy link
Contributor

@drieg drieg Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to have stumbled across an issue with the GCC compiler. When creating an object of type single_level_type in ICON, these pointers are associated which leads to an error at the first call of single_level%allocate below, since there is nothing to deallocate:

    if (associated(this%cos_sza)) then
      deallocate(this%cos_sza)
      this%cos_sza => null()
    end if

As far as I can see, the code is correct and these pointers should not be associated, but I cannot figure out the reason for this strange behavior of the GCC compiler. I tried to create a small reproducer, but the reproducer works as intended. After nullifying the pointers in the object of single_level_type explicitely, I got the same errors in gas_type. With a long list of nullifies for all objects of ecrad input and output types, everything seems to work. At least it is running, I did not check the results in detail.

I am bit puzzled, did anyone experience something similar?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short update: It seems that we found the cause of this issue. The associated pointers show only up inside an OMP parallel region. The objects were private, but that seems not to initialize them correctly. Setting the objects to firstprivate seems to fix it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great, thanks for the update. (First)privatization of derived types can be a problem and not all compilers are always able to do that correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I avoided them in our current implementation. But if we do not want to nullify every variable in the derived types inside the parallel region, this seems to be the best solution. The OpenMP standard actually specifies this (Thanks to my colleague Roland Wirth for pointing this all out):

OpenMP 4.5, 2.15.3.3:
"If any statement of the construct references a list item, a new list
item of the same type and type parameters is allocated. This allocation
occurs once for each task generated by the construct and once for each
SIMD lane used by the construct. The initial value of the new list item
is undefined. The initial status of a private pointer is undefined."

OpenMP Version 5.1 2020 (2.21.3):
"If any statement of the construct references a list item, a new list
item of the same type and type parameters is allocated. This allocation
occurs once for each task generated by the construct and once for each
SIMD lane used by the construct. If the type of the list item has
default initialization, the new list item has default initialization.
Otherwise, the initial value of the new list item is undefined. The
initial status of a private pointer is undefined."

! Mixing ratios of variable gases, dimensioned (ncol, nlev,
! NMaxGases)
real(jprb), allocatable, dimension(:,:,:) :: mixing_ratio
real(jprb), pointer, dimension(:,:,:) :: mixing_ratio=>null()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One big advantage of the pointer implementation is that we could get rid of a lot of array copies. Since we do not have the mixing ratios in a contiguous array, this would still require additional allocations or copies for this example. Same issue for the aerosol type. An array of pointers (or a linked list?) could allow us to use pointers in this example as well. Would this be an option for you?

Maybe there are some more issues for this approach to use pointers instead of allocations. I have no implementation up to now, but this would be really interesting for us.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same is true in IFS - but they are not always in the right units to allow associating them directly. Thus, we have to copy them anyway to store the correctly scaled representation. Just double-checking that this isn't the case in ICON?

Apart from that, a list of pointers may make use on GPU a little more tricky (requires attaching the device pointers in addition to the host pointers) but that should be solvable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same is true in IFS - but they are not always in the right units to allow associating them directly. Thus, we have to copy them anyway to store the correctly scaled representation. Just double-checking that this isn't the case in ICON?

This is completely dependent on namelist choices and differs for the different species. At least for some, e.g. water vapor, we could set pointers right away.

@drieg
Copy link
Contributor

drieg commented Oct 20, 2025

Thanks for this development! With a naive implementation (i.e. still allocating the pointers plus copy instead of pointing to ICON memory) we see a slightly worse performance on NEC. This will surely look better when we get rid of the copies. From our side, please continue, we are looking forward to this change.

@wertysas
Copy link
Contributor Author

Thanks for this development! With a naive implementation (i.e. still allocating the pointers plus copy instead of pointing to ICON memory) we see a slightly worse performance on NEC. This will surely look better when we get rid of the copies. From our side, please continue, we are looking forward to this change.

Thank you for the comments and for testing this out so quickly! We will go ahead and test this with the full IFS.

@wertysas
Copy link
Contributor Author

As I mentioned in the last ecRad technical development meeting, this has been tested with the IFS and the pointer version has the same performance as the old allocatable version for us in the IFS. Thus, I am marking this as ready for review now. Please feel free to review and test these changes with your codes if you would like to and haven't had the chance to do so already!

@wertysas wertysas marked this pull request as ready for review November 11, 2025 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants