[Common] Comm+GEMM overlap API updated to support cuBlasMp backend (incl. framework API) #2443
base: main
…rk extensions Signed-off-by: Alp Dener <[email protected]>
9c8fe95 to 334e8c4
…entirely Signed-off-by: Alp Dener <[email protected]>
908bbc2 to 69cf235
for more information, see https://pre-commit.ci
CommOverlapCore::CommOverlapCore(int64_t nccl_comm_ptr, int tp_rank, int tp_size, int num_comm_sm,
                                 bool is_p2p, bool atomic_gemm) {
  _with_cublasmp = true;
This should probably check if TE was built with cuBLASMp and error out otherwise.
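The suggested guard could look roughly like the sketch below. It assumes a hypothetical compile-time flag `NVTE_WITH_CUBLASMP` that the build would define only when TE is linked against cuBLASMp; the real flag or query function in TE may be named differently, and production code would more likely use `NVTE_CHECK` (as elsewhere in this diff) than a bare `throw`. The class is a stripped-down stand-in that contains only the guard.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical build flag: assumed to be defined to 1 by the build system
// only when Transformer Engine is compiled against cuBLASMp.
#ifndef NVTE_WITH_CUBLASMP
#define NVTE_WITH_CUBLASMP 0
#endif

// True if this binary was compiled with cuBLASMp support.
constexpr bool built_with_cublasmp() { return NVTE_WITH_CUBLASMP != 0; }

// Minimal stand-in for the class under review; only the new guard is shown.
class CommOverlapCore {
 public:
  CommOverlapCore(int64_t nccl_comm_ptr, int tp_rank, int tp_size, int num_comm_sm,
                  bool is_p2p, bool atomic_gemm) {
    // Fail fast instead of selecting a backend that was never compiled in.
    if (!built_with_cublasmp()) {
      throw std::runtime_error(
          "CommOverlapCore: the cuBLASMp backend was requested, but Transformer Engine "
          "was not built with cuBLASMp support.");
    }
    _with_cublasmp = true;
    // ... remaining initialization elided ...
  }

  bool with_cublasmp() const { return _with_cublasmp; }

 private:
  bool _with_cublasmp = false;
};
```

Checking in the constructor means a misconfigured build fails at setup time with a clear message, rather than later inside a GEMM call.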
#define NVTE_COMM_OVERLAP_MAX_STREAMS 3
/* \brief Check if TE is built with cuBlasMp.
nit: cuBLASMp
void ub_barrier(ExtComm comm);
int64_t get_nccl_comm_ptr(std::string comm_name) {
  NVTE_CHECK(backend_is_nccl, "Cannot get ncclComm_t ptr if backend is not NCCL.");
This error message could be more descriptive, e.g. something like "The chosen backend for the communication-computation overlap (cuBLASMp) requires an NCCL communicator, but the passed ProcessGroup uses a different backend."
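A more descriptive check along the lines suggested might be sketched as follows. The helper `require_nccl_backend` and its `backend_name` argument are hypothetical; in the PR the condition is the `backend_is_nccl` flag, and real code would use `NVTE_CHECK` rather than throwing directly. Including the actual backend name in the message tells the user exactly which ProcessGroup setting to change.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: `backend_name` stands in for whatever the framework
// extension reports as the ProcessGroup backend (e.g. "nccl", "gloo").
void require_nccl_backend(const std::string& backend_name) {
  if (backend_name != "nccl") {
    // Name both the requirement and the backend that was actually passed.
    throw std::runtime_error(
        "The chosen backend for the communication-computation overlap (cuBLASMp) "
        "requires an NCCL communicator, but the passed ProcessGroup uses the '" +
        backend_name + "' backend.");
  }
}
```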
Description
This PR adds support for the NVTE cuBLASMp bindings in the Comm+GEMM overlap API.
Type of change
Checklist: