Latest NCCL 2.24.3 might crash XGBoost. #11154
Comments
This should not affect conda build.
How do things look with NCCL 2.25.1-1?
@jakirkham Just tried 2.25.1-1 (as part of #11202). I get the same error. I had to set the env var
Thanks Hyunsu! 🙏 This is with conda, pip, or both?
@jakirkham The issue only arises if NCCL was installed from pip. The issue does not arise if:
So this issue won't arise for the Conda package of XGBoost.
Need to remove CI workarounds once the new nccl is released.
Workaround:
export NCCL_RAS_ENABLE=0
xgboost/ops/pipeline/test-python-wheel-impl.sh
Line 48 in 3a2a85d
xgboost/ops/docker_run.py
Line 73 in 461d27c
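For Python entry points where exporting the variable in the shell is inconvenient, the same workaround can probably be applied in-process. The sketch below is only an illustration: `disable_nccl_ras` is a hypothetical helper name, not part of XGBoost or NCCL, and it assumes NCCL reads `NCCL_RAS_ENABLE` from the process environment when it initializes, so the variable has to be set before anything loads NCCL.

```python
import os


def disable_nccl_ras(env=None):
    """Set NCCL_RAS_ENABLE=0 in `env` (defaults to os.environ) and return it.

    Hypothetical helper mirroring `export NCCL_RAS_ENABLE=0` from this thread.
    """
    target = os.environ if env is None else env
    target["NCCL_RAS_ENABLE"] = "0"
    return target


# Assumption: this must run before NCCL is initialized, so call it
# before importing xgboost or starting any distributed training.
disable_nccl_ras()
```

Passing a plain dict instead of relying on `os.environ` makes it possible to build the environment for a subprocess (for example, a CI test runner) without changing the current process.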