Improve the efficiency of combining MPI and acceleration devices #55
Open
Description
Currently, combining MPI with device-accelerated code is too slow because of the copies between device and host performed during the halo exchanges. This is a meta-issue to track the steps that can be taken in dl_esm_inf to improve performance:
- set_data() should update the device copies (see "field%set_data() should update accelerator devices" #29). This will in fact make the copies take twice as long, but it is required for correctness when receiving the halo.
- Reduce the number of copies (that is, the number of halo exchanges actually performed). This can be done by enabling the dirty flag (see "Extend halo-swap API to keep track of clean/dirty status" #31).
- Make the copies faster. The copy mechanism is chosen by the "read_from_device" function, which is external to dl_esm_inf, but the library should expose when the copy covers only a sub-region (possibly non-contiguous) of the field, so that the external function can implement techniques like packing/buffering (see "Provide finer control of device data transfering in the halo_exchanges" #53).
- Hide the cost of the copies. This can be done with asynchronous steps, but note that there are multiple stages, each with its own wait: async_read_from_device -> wait, async_mpi -> wait, async_write_to_device -> wait.
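
To make the dirty-flag and packing ideas above more concrete, here is a minimal sketch in plain Python. It is not dl_esm_inf code: the `Field` class, `pack_column`, `unpack_column`, `halo_exchange` and the `transfers` counter are all hypothetical names invented for illustration, the "device" is just a second Python list, and the MPI exchange is replaced by a value passed in by the caller. It only shows how a clean/dirty flag can skip an exchange entirely, and how a strided column can be gathered into one contiguous buffer so that each direction needs a single transfer.

```python
class Field:
    """Hypothetical stand-in for a field with a device copy (not the dl_esm_inf type)."""

    def __init__(self, nx, ny, values=None):
        self.nx = nx
        self.ny = ny
        # Row-major storage; 'device' simulates the accelerator copy.
        self.device = list(values) if values is not None else [0.0] * (nx * ny)
        self.halo_dirty = True  # True when the halo may hold stale values
        self.transfers = 0      # counts simulated device<->host copies


def pack_column(data, nx, ny, col):
    """Gather a strided (non-contiguous) column into one contiguous buffer."""
    return [data[row * nx + col] for row in range(ny)]


def unpack_column(data, nx, ny, col, buf):
    """Scatter a contiguous buffer back into a strided column."""
    for row in range(ny):
        data[row * nx + col] = buf[row]


def halo_exchange(f, received_column):
    """Simulated east-halo exchange.

    Returns the packed send buffer, or None when the halo is clean and
    the exchange (and both copies) can be skipped.
    """
    if not f.halo_dirty:
        return None
    # Device -> host: one packed transfer of the last interior column.
    send_buf = pack_column(f.device, f.nx, f.ny, f.nx - 2)
    f.transfers += 1
    # A real code would do the MPI send/receive here; 'received_column'
    # stands in for the data received from the neighbour.
    recv_buf = list(received_column)
    # Host -> device: one packed transfer into the halo column.
    unpack_column(f.device, f.nx, f.ny, f.nx - 1, recv_buf)
    f.transfers += 1
    f.halo_dirty = False
    return send_buf


def run_kernel(f):
    """Any kernel that writes the field must mark the halo dirty again."""
    f.device = [v + 1 for v in f.device]
    f.halo_dirty = True
```

In this sketch a second call to `halo_exchange` without an intervening `run_kernel` performs no transfers at all, which is the saving the dirty flag is meant to provide. The asynchronous overlap described in the last bullet is not modelled here.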