
[CUDA] fix setting of CUDA architectures and enable support for NVIDIA Blackwell #6812

Merged
merged 18 commits into master on Feb 2, 2025

Conversation

StrikerRUS
Collaborator

@@ -224,19 +224,23 @@ if(USE_CUDA)
# reference for mapping of CUDA toolkit component versions to supported architectures ("compute capabilities"):
# https://en.wikipedia.org/wiki/CUDA#GPUs_supported
set(CUDA_ARCHS "60" "61" "62" "70" "75")
if(CUDA_VERSION VERSION_GREATER_EQUAL "110")
if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL "11.0")
Collaborator Author

As we call FindCUDAToolkit but not FindCUDA, the CUDA_VERSION variable is left undefined, so the old comparison could never succeed.

find_package(CUDAToolkit 11.0 REQUIRED)
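For context, a minimal sketch (a hypothetical standalone CMakeLists.txt fragment, not the project's actual build file) of the variable-naming difference between the two modules:

```cmake
# The FindCUDAToolkit module (CMake >= 3.17) defines CUDAToolkit_VERSION;
# only the legacy FindCUDA module defined CUDA_VERSION. A project that calls
# find_package(CUDAToolkit) therefore never has CUDA_VERSION set.
cmake_minimum_required(VERSION 3.18)
project(demo LANGUAGES CXX)
find_package(CUDAToolkit 11.0 REQUIRED)
message(STATUS "Toolkit version: ${CUDAToolkit_VERSION}")
message(STATUS "Legacy variable: '${CUDA_VERSION}'")  # expands to the empty string
```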

Collaborator

Excellent fix, thank you!

@@ -224,19 +224,23 @@ if(USE_CUDA)
# reference for mapping of CUDA toolkit component versions to supported architectures ("compute capabilities"):
# https://en.wikipedia.org/wiki/CUDA#GPUs_supported
set(CUDA_ARCHS "60" "61" "62" "70" "75")
if(CUDA_VERSION VERSION_GREATER_EQUAL "110")
Collaborator Author

@StrikerRUS StrikerRUS Feb 2, 2025

"110" means exactly 110 version during comparison, VERSION_GREATER_EQUAL doesn't know whether and where we want to put a .: 11.0 or maybe 1.10.

set_target_properties(
lightgbm_objs
PROPERTIES
CUDA_ARCHITECTURES ${CUDA_ARCHS}
CUDA_ARCHITECTURES "${CUDA_ARCHS}"
Collaborator Author

Prevent the following error:

-- Using _mm_malloc
CMake Error at CMakeLists.txt:572 (set_target_properties):
  set_target_properties called with incorrect number of arguments.


CMake Error at CMakeLists.txt:579 (set_target_properties):
  set_target_properties called with incorrect number of arguments.

CUDA_ARCHS was passed in the following form: 60616270758086878990100120+PTX.
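A sketch of the quoting issue, with illustrative architecture values (the real list is longer):

```cmake
set(CUDA_ARCHS "60" "120")

# Unquoted, the list expands into one argument per element, so the call below
# would pass CUDA_ARCHITECTURES, "60", "120": an odd argument count after
# PROPERTIES, which CMake rejects with "incorrect number of arguments".
#   set_target_properties(lightgbm_objs PROPERTIES CUDA_ARCHITECTURES ${CUDA_ARCHS})

# Quoted, the whole list is a single argument, the semicolon-joined "60;120".
#   set_target_properties(lightgbm_objs PROPERTIES CUDA_ARCHITECTURES "${CUDA_ARCHS}")
```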

Comment on lines +246 to +247
list(TRANSFORM CUDA_ARCHS APPEND "-real")
list(APPEND CUDA_ARCHS "${CUDA_LAST_SUPPORTED_ARCH}-real" "${CUDA_LAST_SUPPORTED_ARCH}-virtual")
Collaborator Author

Fix the following error:

[33/70] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/boosting/cuda/cuda_score_updater.cu.o
FAILED: CMakeFiles/lightgbm_objs.dir/src/boosting/cuda/cuda_score_updater.cu.o 
/usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/g++ -DEIGEN_DONT_PARALLELIZE -DEIGEN_MPL2_ONLY -DMM_MALLOC -DMM_PREFETCH -DUSE_CUDA -DUSE_SOCKET -I/__w/LightGBM/LightGBM/lightgbm-python/external_libs/eigen -I/__w/LightGBM/LightGBM/lightgbm-python/external_libs/fast_double_parser/include -I/__w/LightGBM/LightGBM/lightgbm-python/external_libs/fmt/include -I/usr/local/cuda/targets/x86_64-linux/include -I/__w/LightGBM/LightGBM/lightgbm-python/include -Xcompiler=-fopenmp -Xcompiler=-fPIC -Xcompiler=-Wall -O3 -lineinfo -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_60,code=[compute_60,sm_60]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_62,code=[compute_62,sm_62]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_75,code=[compute_75,sm_75]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_86,code=[compute_86,sm_86]" "--generate-code=arch=compute_87,code=[compute_87,sm_87]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" "--generate-code=arch=compute_100,code=[compute_100,sm_100]" "--generate-code=arch=compute_120+PTX,code=[compute_120+PTX,sm_120+PTX]" -MD -MT CMakeFiles/lightgbm_objs.dir/src/boosting/cuda/cuda_score_updater.cu.o -MF CMakeFiles/lightgbm_objs.dir/src/boosting/cuda/cuda_score_updater.cu.o.d -x cu -rdc=true -c /__w/LightGBM/LightGBM/lightgbm-python/src/boosting/cuda/cuda_score_updater.cu -o CMakeFiles/lightgbm_objs.dir/src/boosting/cuda/cuda_score_updater.cu.o
nvcc fatal   : Unsupported gpu architecture 'compute_120+PTX'

Borrowed from XGBoost:
https://github.com/dmlc/xgboost/blob/a46585a36c4bf30bfd58a2653fe8ae40beea25ce/cmake/Utils.cmake#L73-L74

https://github.com/dmlc/xgboost/actions/runs/13056819151/job/36429848864#step:6:162
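A sketch of the scheme borrowed from XGBoost, with illustrative architecture values: real (SASS) code is requested for every supported architecture, plus virtual (PTX) code for the newest one so that future GPUs can JIT-compile. CMake translates the `-real`/`-virtual` suffixes into valid `--generate-code` flags, whereas appending a literal "+PTX" to an architecture number does not survive that translation, which produced the invalid `compute_120+PTX` above.

```cmake
set(CUDA_ARCHS "60" "70" "90")
set(CUDA_LAST_SUPPORTED_ARCH "120")

# Compile SASS for each listed architecture, and both SASS and PTX for the
# last supported one.
list(TRANSFORM CUDA_ARCHS APPEND "-real")
list(APPEND CUDA_ARCHS "${CUDA_LAST_SUPPORTED_ARCH}-real" "${CUDA_LAST_SUPPORTED_ARCH}-virtual")

message(STATUS "${CUDA_ARCHS}")  # 60-real;70-real;90-real;120-real;120-virtual
```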

@@ -32,7 +32,7 @@ if [ "$PY_MINOR_VER" -gt 7 ]; then
--inspect \
--ignore 'compiled-objects-have-debug-symbols'\
--ignore 'distro-too-large-compressed' \
--max-allowed-size-uncompressed '100M' \
--max-allowed-size-uncompressed '120M' \
Collaborator Author

To avoid the following check failure (embedding code for the additional architectures makes the compiled library, and therefore the wheel, larger):

------------ check results -----------
1. [distro-too-large-uncompressed] Uncompressed size 0.1G is larger than the allowed size (100.0M).
errors found while checking: 1

@StrikerRUS StrikerRUS marked this pull request as ready for review February 2, 2025 15:55
Collaborator

@jameslamb jameslamb left a comment

This looks great, thank you so much!!

@jameslamb
Copy link
Collaborator

I'm really happy we were able to get Blackwell support into the next release 😁

@jameslamb jameslamb merged commit c9de57b into master Feb 2, 2025
49 checks passed
@jameslamb jameslamb deleted the ci/cuda branch February 2, 2025 18:56