Switch allocatable to pointer in ecRAD derived types#81
Switch allocatable to pointer in ecRAD derived types#81wertysas wants to merge 8 commits intoecmwf-ifs:masterfrom
allocatable to pointer in ecRAD derived types#81Conversation
…nter versions of put and copy routines
| & cos_sza, & ! (ncol) Cosine of solar zenith angle | ||
| & skin_temperature ! (ncol) Skin temperature (K) | ||
| real(jprb), pointer, dimension(:) :: & | ||
| & cos_sza=>null(), & ! (ncol) Cosine of solar zenith angle |
There was a problem hiding this comment.
I seem to have stumbled across an issue with the GCC compiler. When creating an object of type single_level_type in ICON, these pointers are associated which leads to an error at the first call of single_level%allocate below, since there is nothing to deallocate:
if (associated(this%cos_sza)) then
deallocate(this%cos_sza)
this%cos_sza => null()
end ifAs far as I can see, the code is correct and these pointers should not be associated, but I cannot figure out the reason for this strange behavior of the GCC compiler. I tried to create a small reproducer, but the reproducer works as intended. After nullifying the pointers in the object of single_level_type explicitely, I got the same errors in gas_type. With a long list of nullifies for all objects of ecrad input and output types, everything seems to work. At least it is running, I did not check the results in detail.
I am bit puzzled, did anyone experience something similar?
There was a problem hiding this comment.
Short update: It seems that we found the cause of this issue. The associated pointers show only up inside an OMP parallel region. The objects were private, but that seems not to initialize them correctly. Setting the objects to firstprivate seems to fix it.
There was a problem hiding this comment.
That's great, thanks for the update. (First)privatization of derived types can be a problem and not all compilers are always able to do that correctly.
There was a problem hiding this comment.
That's why I avoided them in our current implementation. But if we do not want to nullify every variable in the derived types inside the parallel region, this seems to be the best solution. The OpenMP standard actually specifies this (Thanks to my colleague Roland Wirth for pointing this all out):
OpenMP 4.5, 2.15.3.3:
"If any statement of the construct references a list item, a new list
item of the same type and type parameters is allocated. This allocation
occurs once for each task generated by the construct and once for each
SIMD lane used by the construct. The initial value of the new list item
is undefined. The initial status of a private pointer is undefined."
OpenMP Version 5.1 2020 (2.21.3):
"If any statement of the construct references a list item, a new list
item of the same type and type parameters is allocated. This allocation
occurs once for each task generated by the construct and once for each
SIMD lane used by the construct. If the type of the list item has
default initialization, the new list item has default initialization.
Otherwise, the initial value of the new list item is undefined. The
initial status of a private pointer is undefined."
| ! Mixing ratios of variable gases, dimensioned (ncol, nlev, | ||
| ! NMaxGases) | ||
| real(jprb), allocatable, dimension(:,:,:) :: mixing_ratio | ||
| real(jprb), pointer, dimension(:,:,:) :: mixing_ratio=>null() |
There was a problem hiding this comment.
One big advantage of the pointer implementation is that we could get rid of a lot of array copies. Since we do not have the mixing ratios in a contiguous array, this would still require additional allocations or copies for this example. Same issue for the aerosol type. An array of pointers (or a linked list?) could allow us to use pointers in this example as well. Would this be an option for you?
Maybe there are some more issues for this approach to use pointers instead of allocations. I have no implementation up to now, but this would be really interesting for us.
There was a problem hiding this comment.
The same is true in IFS - but they are not always in the right units to allow associating them directly. Thus, we have to copy them anyway to store the correctly scaled representation. Just double-checking that this isn't the case in ICON?
Apart from that, a list of pointers may make use on GPU a little more tricky (requires attaching the device pointers in addition to the host pointers) but that should be solvable.
There was a problem hiding this comment.
The same is true in IFS - but they are not always in the right units to allow associating them directly. Thus, we have to copy them anyway to store the correctly scaled representation. Just double-checking that this isn't the case in ICON?
This is completely dependent on namelist choices and differs for the different species. At least for some, e.g. water vapor, we could set pointers right away.
|
Thanks for this development! With a naive implementation (i.e. still allocating the pointers plus copy instead of pointing to ICON memory) we see a slightly worse performance on NEC. This will surely look better when we get rid of the copies. From our side, please continue, we are looking forward to this change. |
Thank you for the comments and for testing this out so quickly! We will go ahead and test this with the full IFS. |
|
As I mentioned in the last ecRad technical development meeting, this has been tested with the IFS and the pointer version has the same performance as the old allocatable version for us in the IFS. Thus, I am marking this as ready for review now. Please feel free to review and test these changes with your codes if you would like to and haven't had the chance to do so already! |
This PR switches all allocatable members of the ecRAD derived types (
single_level_type,thermodynamics_type,gas_type,ylcloud_type,aerosol_type,flux_type) to pointers. The three main changes to the code are:allocatableattribute topointerof the non-scalar members in the derived types and changing correspondingallocatedcalls toassociatedthroughout the source code.easy_netcdfutility module that can be used with pointers instead of allocatablesFurthermore, this PR also contains some minor code clean and bug fixes:
change_namelist.shscript that is BSD/GNU agnostic and now works on Mac OSset_units_gasthat makes it non-recursive.In general the performance of the pointer version is better than before, see below for more detailed performance results. Several of the test configurations have also been tested with Valgrind to ensure that the switch from allocatables to pointers does not introduce any memory leaks.
--------------------------------------------
Performance results
The performance of this change has been evaluated on a CPU node on ECMWF's HPC using data from the maelstrom weather data set. The following configurations have been included in the performance tests:
Generally, the pointer versions have better performance than the allocatable versions as can be seen in the table below that summarise the data of all performance runs (448 configs) with the intel fort compiler version 2021.4.0 20210910
Relative Difference in Runtime After Switching from Allocatables to Pointers (ifort 2021.4.0 20210910)
Note: Negative values indicate performance improvement (faster runtime) when using pointers instead of allocatables.
Figures for selected nproma and thread counts
The figures below show the average runtime of different configurations in single and double precision with nproma = 8,32 threads=16,32 and the INTEL and GNU compiler. Compiler versions: