
Documentation Update for NCCL #132

Draft: wants to merge 1 commit into base: main


tharittk
Contributor

@tharittk tharittk commented May 23, 2025

Ongoing update, in parallel with the NCCL implementation PR (#130)

Tasks

  • Update README mentioning the possibility to use NCCL instead of MPI for distributed cupy arrays, updating the install, example and tests sections with NCCL-related commands
  • Update index.rst similar to README to reflect new NCCL engine
  • Update gpu.rst to document the new env variable (NCCL_PYLOPS_MPI) and add NCCL to the example. Perhaps also consider adding a table like the one in https://pylops.readthedocs.io/en/stable/gpu.html to document which features are supported with NCCL and which are not, e.g. the missing support for complex numbers (this can also serve as a live roadmap for your work: as we progress, we should see more and more features supported by both MPI and NCCL)
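
For the gpu.rst item, a minimal sketch of how an environment-variable switch such as NCCL_PYLOPS_MPI could be read. The semantics here are an assumption made purely for illustration (unset or any value other than "0" means NCCL is requested), and `nccl_requested` is a hypothetical helper, not a PyLops-MPI function; the actual behaviour is whatever the implementation PR (#130) defines.

```python
import os


def nccl_requested(environ=None):
    """Return True unless NCCL_PYLOPS_MPI is explicitly set to "0".

    Assumption for illustration only: "0" disables NCCL, while an unset
    variable or any other value leaves it enabled.
    """
    env = os.environ if environ is None else environ
    return env.get("NCCL_PYLOPS_MPI", "1").strip() != "0"


if __name__ == "__main__":
    # Example: run as `NCCL_PYLOPS_MPI=0 python this_script.py`
    print("NCCL requested:", nccl_requested())
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to exercise in tests without touching the real environment.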

Contributor

@mrava87 mrava87 left a comment

Just left a few comments on what you have done so far. This needs much more to be ready, but hopefully the checklist that I put in the description of the PR will make it easier for you to navigate the documentation and add the bits related to NCCL.

=======================================================
To obtain highly-optimized performance on GPU clusters, PyLops-MPI also supports the Nvidia's collective communication calls (NCCL).
`NCCL <https://developer.nvidia.com/nccl>

Contributor

When we worked with MPI we did not assume that users would be forced to use conda, so we also provided some quick instructions on how to set up MPI (see above) in case one would use pip to install mpi4py... I don't know how easy it would be to do the same for NCCL, but I am saying this so you can at least understand why this page was written this way.

I would move this further down, to where we have make install_conda and make dev-install_conda, saying that to use NCCL you would instead use the commands make install_conda_nc (add this to the Makefile) and make dev-install_conda_nc.

Contributor Author

I saw that there are currently NCCL packages on pip, e.g. pip install nvidia-nccl-cu12, with CUDA 11 and 13 listed as separate packages. This could be a bit more complicated than conda, as conda can automatically detect the version based on the driver: $ conda install -c conda-forge cupy nccl

Contributor Author

But I guess the users/developers have to pick their own CUDA version anyway, because CuPy also ships separate packages for CUDA 11 and 12, i.e. pip install cupy-cuda11x

I can try making a venv and testing pip install cupy-cuda12x and pip install nvidia-nccl-cu12, since I have CUDA driver version 12. If it works cleanly, I can add instructions for installing NCCL with pip to the doc.
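
A minimal sketch of a check that could be run inside such a venv to see whether CuPy can actually reach NCCL after the two pip installs. It relies on `cupy.cuda.nccl.get_version()`; the helper name `nccl_version_or_none` is hypothetical, and the `available` lookup is a defensive assumption about how CuPy signals a build without NCCL support.

```python
def nccl_version_or_none():
    """Return the NCCL version code reported through CuPy, or None.

    None means CuPy is not installed, failed to import, or was built
    without NCCL support.
    """
    try:
        from cupy.cuda import nccl
    except ImportError:
        return None
    # Assumption: builds without NCCL expose a placeholder object whose
    # `available` attribute is False; default to True if it is missing.
    if not getattr(nccl, "available", True):
        return None
    return nccl.get_version()


if __name__ == "__main__":
    version = nccl_version_or_none()
    if version is None:
        print("NCCL is not usable from CuPy in this environment")
    else:
        print(f"NCCL version code reported by CuPy: {version}")
```

If this prints a version code in a fresh venv, that would be a first sign that the pip-only route works; it does not replace running the NCCL tests themselves.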

Contributor

Precisely 😄 I also like to use conda, but for users who are not allowed to (as well as for ourselves, since in CI it is better to use bare-bones Python with venv), it would be great if we could find a pip-only solution too.
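
A rough sketch of what a pip-only setup could look like with just the standard-library venv module. The package list is the CUDA 12 pair mentioned earlier in this thread; `create_env`, `pip_install_command` and the `.venv-nccl` directory are hypothetical names, and whether the two wheels install and work together cleanly is exactly what still needs testing.

```python
import subprocess
import venv
from pathlib import Path

# Package names as mentioned in this thread for a CUDA 12 setup.
CUDA12_PACKAGES = ["cupy-cuda12x", "nvidia-nccl-cu12"]


def create_env(env_dir, with_pip=True):
    """Create a plain virtual environment and return the path to its Python."""
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    # Windows places the interpreter in Scripts/, other platforms in bin/.
    windows_python = env_dir / "Scripts" / "python.exe"
    return windows_python if windows_python.exists() else env_dir / "bin" / "python"


def pip_install_command(python, packages=CUDA12_PACKAGES):
    """Build (but do not run) the pip install command for that interpreter."""
    return [str(python), "-m", "pip", "install", *packages]


if __name__ == "__main__":
    python = create_env(".venv-nccl")
    # Downloads large wheels; needs network access.
    subprocess.run(pip_install_command(python), check=True)
```

Keeping the command construction separate from its execution makes it easy to reuse the same list in a CI step or print it for the docs.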
