Is your feature request related to a problem? Please describe.
Memory used for MPI communications has to be pinned. On CPU-based architectures pinning has a modest cost, but on accelerated architectures vendors (e.g. NVIDIA) insist that we reserve this memory once and reuse it for communications.
This led us to allocate persistent buffers in ectrans and in the semi-Lagrangian code; sooner or later, other parts of the code that require communications (IO, semi-implicit) will be affected as well.
Describe the solution you'd like
I would like a mechanism to manage a memory buffer dedicated to MPI communications. This buffer could be allocated during setup, after querying the different parts of the code that require MPI communications for their needs. We could use a heap-based algorithm to manage this buffer.
This would avoid creating persistent buffers in different parts of the code and reduce the memory footprint of the application, at least on GPU architectures, where memory is expensive.
This, of course, has to be prototyped and benchmarked (at least on CPU architectures).
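To make the proposal more concrete, here is a minimal C sketch of such a mechanism. It is not existing ectrans or IFS code: every name (`CommBuffer`, `comm_buf_register`, `comm_buf_init`, `comm_buf_alloc`, `comm_buf_release`, `comm_buf_finalize`, `COMM_ALIGN`) is hypothetical. Several assumptions apply. "Heap-based" is read here as a malloc-style heap allocator, implemented as a first-fit free list with coalescing. The components (transforms, semi-Lagrangian, IO, ...) are assumed not to communicate at the same time, so the shared buffer only needs to hold the largest registered request. Plain `malloc` stands in for the pinned allocation, which in a real code might come from something like `MPI_Alloc_mem` or `cudaMallocHost`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define COMM_ALIGN 64 /* hypothetical: offsets inside the buffer are multiples of this */

typedef struct Block {
    size_t offset;      /* start of the block, relative to the buffer base */
    size_t size;        /* size in bytes */
    int free;           /* 1 if the block is available */
    struct Block *next; /* blocks are kept in address order and tile the buffer */
} Block;

typedef struct {
    unsigned char *base; /* the single buffer reserved at setup */
    size_t capacity;
    Block *blocks;
    size_t requested;    /* largest size registered during the setup query */
} CommBuffer;

static size_t align_up(size_t n) { return (n + COMM_ALIGN - 1) / COMM_ALIGN * COMM_ALIGN; }

/* Setup query: each component reports its peak communication need.
 * Assumes phases never overlap, so we keep the maximum rather than the sum. */
void comm_buf_register(CommBuffer *cb, size_t bytes) {
    size_t need = align_up(bytes);
    if (need > cb->requested) cb->requested = need;
}

/* Reserve the buffer once; pinned memory would be obtained here in practice. */
int comm_buf_init(CommBuffer *cb) {
    cb->capacity = cb->requested;
    cb->base = malloc(cb->capacity);
    cb->blocks = malloc(sizeof(Block));
    if (!cb->base || !cb->blocks) return -1;
    *cb->blocks = (Block){0, cb->capacity, 1, NULL};
    return 0;
}

/* First fit: take the first free block that is large enough, split off the rest. */
void *comm_buf_alloc(CommBuffer *cb, size_t bytes) {
    size_t need = align_up(bytes ? bytes : 1);
    for (Block *b = cb->blocks; b; b = b->next) {
        if (!b->free || b->size < need) continue;
        if (b->size > need) {
            Block *tail = malloc(sizeof(Block));
            if (!tail) return NULL;
            *tail = (Block){b->offset + need, b->size - need, 1, b->next};
            b->next = tail;
            b->size = need;
        }
        b->free = 0;
        return cb->base + b->offset;
    }
    return NULL; /* exhausted: the setup query underestimated the need */
}

/* Mark a block free again and merge it with free neighbours to limit fragmentation. */
void comm_buf_release(CommBuffer *cb, void *p) {
    size_t off = (size_t)((unsigned char *)p - cb->base);
    Block *prev = NULL;
    for (Block *b = cb->blocks; b; prev = b, b = b->next) {
        if (b->offset != off || b->free) continue;
        b->free = 1;
        if (b->next && b->next->free) { /* merge with the following block */
            Block *n = b->next;
            b->size += n->size;
            b->next = n->next;
            free(n);
        }
        if (prev && prev->free) { /* merge with the preceding block */
            prev->size += b->size;
            prev->next = b->next;
            free(b);
        }
        return;
    }
    assert(0 && "pointer was not allocated from this buffer");
}

void comm_buf_finalize(CommBuffer *cb) {
    for (Block *b = cb->blocks; b;) { Block *n = b->next; free(b); b = n; }
    free(cb->base);
    *cb = (CommBuffer){0};
}
```

In this sketch a component would call `comm_buf_register` during setup, then `comm_buf_alloc`/`comm_buf_release` around each exchange instead of keeping its own persistent buffer. Returning `NULL` when the buffer is exhausted, rather than growing it, makes an underestimated setup query visible immediately. Whether first-fit fragmentation matters for the real communication patterns is exactly what the prototyping and benchmarking step would have to show.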
Describe alternatives you've considered
No response
Additional context
No response
Organisation
Météo-France