
Conversation

@vanman-nguyen

This PR adds support for Eviden's communication library, UBCL (Unified Bxi Communication Layer), by adding a PML/UBCL and an OSC/UBCL component, thus enabling MPI communication over the BXI network.
Both components are currently maintained by us.

We are also working on joining the MTT effort to continuously validate these components on our infrastructure.

Note: The PML does not yet support MPI_Session and the OSC does not yet support accelerator buffers. We plan to implement these features in the future.

Co-authored-by: Florent GERMAIN <[email protected]>
Co-authored-by: Pierre LEMARINIER <[email protected]>
Co-authored-by: Antoine CAPRA <[email protected]>
Co-authored-by: Emmanuel BRELLE <[email protected]>
Co-authored-by: Van Man NGUYEN <[email protected]>
Co-authored-by: Julien DUPRAT <[email protected]>
Co-authored-by: Tristan CALS <[email protected]>
Co-authored-by: Anton DAUMEN <[email protected]>
Co-authored-by: Alice CARIBONI <[email protected]>
Co-authored-by: François WELLENREITER <[email protected]>

Signed-off-by: Van Man NGUYEN <[email protected]>
@hppritcha self-requested a review October 28, 2025 16:05
@edgargabriel self-requested a review November 5, 2025 19:56
@edgargabriel (Member) left a comment

I think the code looks fundamentally fine to me; I was not able to test it, just read through some parts of it.

The one question I have: the pml component only seems to support CUDA buffers at the moment. Is this correct (i.e., not the other accelerator components such as rocm or ze)? Is there something fundamental missing, or is it just a case of not having tested it with other GPUs?

@FlorentGermain-Bull (Contributor)

> I think the code looks fundamentally fine to me; I was not able to test it, just read through some parts of it.
>
> The one question I have: the pml component only seems to support CUDA buffers at the moment. Is this correct (i.e., not the other accelerator components such as rocm or ze)? Is there something fundamental missing, or is it just a case of not having tested it with other GPUs?

To properly handle CUDA buffers in UBCL, we have to build things around gdrcopy to avoid unnecessary copies. This is why we ask ompi to look for CUDA buffers.
With rocm, the buffers are already reachable by the CPU, so it is functional, even though performance suffers a lot from it (we plan to work around that in the future).
We have not tried it with ze yet.
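
For readers unfamiliar with gdrcopy, the flow it enables looks roughly like the sketch below. This is a minimal illustration using gdrcopy's public `gdr_*` API, not UBCL's actual code; `copy_to_gpu`, `d_ptr`, and `size` are hypothetical names. The GPU allocation is pinned and mapped into the CPU's address space, so the host can copy data in without an intermediate bounce buffer:

```c
/* Minimal gdrcopy sketch: CPU-driven copy into GPU memory without a
 * staging copy. Assumes d_ptr is a valid, GPU-page-aligned CUDA device
 * allocation of at least `size` bytes. Not UBCL's actual code. */
#include <stddef.h>
#include <gdrapi.h>

int copy_to_gpu(unsigned long d_ptr, size_t size, const void *src)
{
    gdr_t g = gdr_open();                      /* open the gdrdrv device */
    if (g == NULL) return -1;

    gdr_mh_t mh;
    /* Pin the GPU pages backing the allocation. */
    if (gdr_pin_buffer(g, d_ptr, size, 0, 0, &mh) != 0) {
        gdr_close(g);
        return -1;
    }

    void *bar_ptr = NULL;
    /* Map the pinned range into the CPU address space (BAR1 window). */
    if (gdr_map(g, mh, &bar_ptr, size) != 0) {
        gdr_unpin_buffer(g, mh);
        gdr_close(g);
        return -1;
    }

    /* Direct CPU store into GPU memory; no cudaMemcpy, no bounce buffer. */
    gdr_copy_to_mapping(mh, bar_ptr, src, size);

    gdr_unmap(g, mh, bar_ptr, size);
    gdr_unpin_buffer(g, mh);
    gdr_close(g);
    return 0;
}
```

In a real transport the pin/map would presumably be cached per registration rather than redone per message.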

We have an internal fallback in UBCL which relies on explicit packing handled by the convertors (that is what the custom memory descriptors are for). If a contiguous memory descriptor cannot be built, UBCL falls back to explicit packing.
If convertors for ze buffers are tagged as needing packing (via opal_convertor_need_buffers), UBCL should rely on the convertors to pack them; otherwise we will need to add detection for ze as well.
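
To make that fallback concrete, here is a hedged sketch of the decision (not UBCL's code; `prepare_send_buffer` is a hypothetical helper, while `opal_convertor_need_buffers`, `opal_convertor_get_packed_size`, `opal_convertor_get_current_pointer`, and `opal_convertor_pack` are the existing OPAL convertor API):

```c
/* Hypothetical sketch of the packing fallback described above. If the
 * convertor says packing is needed, pack through it into a bounce buffer;
 * otherwise hand the contiguous user buffer to the network directly. */
#include <stdbool.h>
#include <stdlib.h>
#include <sys/uio.h>
#include "opal/datatype/opal_convertor.h"

static int prepare_send_buffer(opal_convertor_t *convertor,
                               struct iovec *iov, bool *packed)
{
    if (!opal_convertor_need_buffers(convertor)) {
        /* Contiguous, CPU-reachable data: point at it directly; this is
         * where a contiguous memory descriptor would be built. */
        opal_convertor_get_current_pointer(convertor, &iov->iov_base);
        opal_convertor_get_packed_size(convertor, &iov->iov_len);
        *packed = false;
        return 0;
    }

    /* Explicit packing path: the convertor copies the data (including any
     * accelerator-specific handling it was prepared with) into a bounce
     * buffer that the transport can send as-is. */
    size_t packed_size;
    opal_convertor_get_packed_size(convertor, &packed_size);

    iov->iov_base = malloc(packed_size);
    if (NULL == iov->iov_base) return -1;
    iov->iov_len = packed_size;

    uint32_t iov_count = 1;
    size_t max_data = packed_size;
    if (opal_convertor_pack(convertor, iov, &iov_count, &max_data) < 0) {
        free(iov->iov_base);
        return -1;
    }
    iov->iov_len = max_data;
    *packed = true;
    return 0;
}
```

A ze-aware path would then mirror the CUDA detection: as long as ze convertors report that they need buffers, this fallback covers them.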

