
Conversation

@vanman-nguyen

This PR adds support for Eviden's communication library, UBCL (Unified Bxi Communication Layer), by adding a PML/UBCL and an OSC/UBCL component, thus enabling MPI communication over the BXI network.
Both components are currently maintained by us.

We are also working on joining the MTT effort to continuously validate these components on our infrastructure.

Note: The PML does not yet support MPI_Session and the OSC does not yet support accelerator buffers. We plan to implement these features in the future.

Co-authored-by: Florent GERMAIN <[email protected]>
Co-authored-by: Pierre LEMARINIER <[email protected]>
Co-authored-by: Antoine CAPRA <[email protected]>
Co-authored-by: Emmanuel BRELLE <[email protected]>
Co-authored-by: Van Man NGUYEN <[email protected]>
Co-authored-by: Julien DUPRAT <[email protected]>
Co-authored-by: Tristan CALS <[email protected]>
Co-authored-by: Anton DAUMEN <[email protected]>
Co-authored-by: Alice CARIBONI <[email protected]>
Co-authored-by: François WELLENREITER <[email protected]>

Signed-off-by: Van Man NGUYEN <[email protected]>
@hppritcha self-requested a review October 28, 2025 16:05
@edgargabriel self-requested a review November 5, 2025 19:56
@edgargabriel (Member) left a comment

I think the code looks fundamentally fine to me; I was not able to test it, just read through some parts of it.

The one question I have: the pml component only seems to support CUDA buffers at the moment. Is this correct (i.e., not the other accelerator components such as rocm or ze)? Is there something fundamental missing, or is it just a case of not having tested it with other GPUs?

@FlorentGermain-Bull (Contributor)

> I think the code looks fundamentally fine to me; I was not able to test it, just read through some parts of it.
>
> The one question I have: the pml component only seems to support CUDA buffers at the moment. Is this correct (i.e., not the other accelerator components such as rocm or ze)? Is there something fundamental missing, or is it just a case of not having tested it with other GPUs?

To properly handle CUDA buffers in UBCL, we have to build things around gdrcopy to avoid unnecessary copies. This is why we ask ompi to look for CUDA buffers.
With rocm, the buffers are already reachable by the CPU, so it is functional, even though performance suffers a lot from it (we plan to work around that in the future).
We have not tried it with ze yet.
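
For readers unfamiliar with gdrcopy, the flow it enables looks roughly like the sketch below. This is a minimal illustration using gdrcopy's public `gdr_*` API, not UBCL's actual code; `copy_to_gpu`, `d_ptr`, and `size` are hypothetical names. The GPU allocation is pinned and mapped into the CPU's address space, so the host can copy data in without an intermediate bounce buffer:

```c
/* Minimal gdrcopy sketch: CPU-driven copy into GPU memory without a
 * staging copy. Assumes d_ptr is a valid, GPU-page-aligned CUDA device
 * allocation of at least `size` bytes. Not UBCL's actual code. */
#include <stddef.h>
#include <gdrapi.h>

int copy_to_gpu(unsigned long d_ptr, size_t size, const void *src)
{
    gdr_t g = gdr_open();                      /* open the gdrdrv device */
    if (g == NULL) return -1;

    gdr_mh_t mh;
    /* Pin the GPU pages backing the allocation. */
    if (gdr_pin_buffer(g, d_ptr, size, 0, 0, &mh) != 0) {
        gdr_close(g);
        return -1;
    }

    void *bar_ptr = NULL;
    /* Map the pinned range into the CPU address space (BAR1 window). */
    if (gdr_map(g, mh, &bar_ptr, size) != 0) {
        gdr_unpin_buffer(g, mh);
        gdr_close(g);
        return -1;
    }

    /* Direct CPU store into GPU memory; no cudaMemcpy, no bounce buffer. */
    gdr_copy_to_mapping(mh, bar_ptr, src, size);

    gdr_unmap(g, mh, bar_ptr, size);
    gdr_unpin_buffer(g, mh);
    gdr_close(g);
    return 0;
}
```

In a real transport the pin/map would presumably be cached per registration rather than redone per message.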

We have an internal fallback in UBCL which relies on explicit packing handled by the convertors (that is what the custom memory descriptors are for). If a contiguous memory descriptor cannot be built, UBCL falls back to explicit packing.
If convertors for ze buffers are tagged as needing packing (via opal_convertor_need_buffers), UBCL should rely on the convertors to pack them; otherwise we will need to add detection for ze as well.
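
To make that fallback concrete, here is a hedged sketch of the decision (not UBCL's code; `prepare_send_buffer` is a hypothetical helper, while `opal_convertor_need_buffers`, `opal_convertor_get_packed_size`, `opal_convertor_get_current_pointer`, and `opal_convertor_pack` are the existing OPAL convertor API):

```c
/* Hypothetical sketch of the packing fallback described above. If the
 * convertor says packing is needed, pack through it into a bounce buffer;
 * otherwise hand the contiguous user buffer to the network directly. */
#include <stdbool.h>
#include <stdlib.h>
#include <sys/uio.h>
#include "opal/datatype/opal_convertor.h"

static int prepare_send_buffer(opal_convertor_t *convertor,
                               struct iovec *iov, bool *packed)
{
    if (!opal_convertor_need_buffers(convertor)) {
        /* Contiguous, CPU-reachable data: point at it directly; this is
         * where a contiguous memory descriptor would be built. */
        opal_convertor_get_current_pointer(convertor, &iov->iov_base);
        opal_convertor_get_packed_size(convertor, &iov->iov_len);
        *packed = false;
        return 0;
    }

    /* Explicit packing path: the convertor copies the data (including any
     * accelerator-specific handling it was prepared with) into a bounce
     * buffer that the transport can send as-is. */
    size_t packed_size;
    opal_convertor_get_packed_size(convertor, &packed_size);

    iov->iov_base = malloc(packed_size);
    if (NULL == iov->iov_base) return -1;
    iov->iov_len = packed_size;

    uint32_t iov_count = 1;
    size_t max_data = packed_size;
    if (opal_convertor_pack(convertor, iov, &iov_count, &max_data) < 0) {
        free(iov->iov_base);
        return -1;
    }
    iov->iov_len = max_data;
    *packed = true;
    return 0;
}
```

A ze-aware path would then mirror the CUDA detection: as long as ze convertors report that they need buffers, this fallback covers them.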

