Conversation

dfalbel added a commit to mlverse/cuda.ml that referenced this pull request on Apr 29, 2026:
* Use runs-on GPU runners for CI
  Replace self-hosted GPU runners with runs-on `g4dn.xlarge` spot instances, matching the approach in mlverse/torch#1439. Also modernizes the workflow:
  - Action versions: `checkout@v4`, `setup-python@v5`, `setup-r@v2`, etc.
  - Fix deprecated `::set-output` -> `$GITHUB_OUTPUT`
  - Container: ubuntu18.04 -> ubuntu20.04 (18.04 is EOL)
  - Add `--runtime=nvidia` to container options
  - Add concurrency groups with cancel-in-progress
  - Simplify matrix to a single config (CUDA 11.2.1, cuML 21.12, R release)
  - Drop the ASAN matrix dimension
* Revert container to ubuntu18.04 for CUDA 11.2 compatibility
* Use CUDA 11.2.2 container (11.2.1 removed from Docker Hub)
* Bump container to ubuntu20.04 (18.04 glibc too old for Node 20 actions)
* Split CI into build-image (free runner) and test-gpu (GPU runner)
  - Build Docker image with cuda.ml pre-installed on ubuntu-latest (free)
  - Run tests on a runs-on `g4dn.xlarge` GPU runner using the pre-built image
  - Add `.github/docker/Dockerfile` following the same pattern as mlverse/torch
  - Make `CMAKE_CUDA_ARCHITECTURES` configurable via env var (defaults to `NATIVE`) so cross-compilation works on runners without a GPU (targets T4 = SM 75)
  - Remove the miniconda install (no longer needed for reticulate tests)
* Fix sklearn install: use the scikit-learn package name and `py_require()`
  The `sklearn` PyPI package is deprecated in favor of `scikit-learn`. Also switch from `py_install()` to `py_require()`, the modern reticulate API for declaring Python dependencies.
* Fix configure warnings: normalizePath ordering and cmake unused variable
  - Move `download_libcuml()` before `normalizePath()` so the directory exists
  - Reference `CUML_STUB_HEADERS_DIR` in both Treelite found/not-found branches so cmake doesn't warn about an unused variable
* Fix TSVD tests for SVD sign ambiguity between cuML and sklearn
  SVD components are only defined up to sign, so different implementations can produce sign-flipped vectors that are mathematically equivalent.
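The sign-alignment idea can be illustrated with a small NumPy sketch. This is not the package's actual test code; `align_signs` is a hypothetical helper showing the normalization step before a numeric comparison:

```python
import numpy as np

def align_signs(components, reference):
    """Flip the sign of each component row so it matches the reference.

    SVD/TSVD components are only defined up to sign: v and -v span the
    same direction. Align by the sign of each row's dot product with the
    corresponding reference row before comparing numerically.
    """
    signs = np.sign(np.sum(components * reference, axis=1))
    signs[signs == 0] = 1.0  # degenerate (orthogonal) rows: leave unchanged
    return components * signs[:, None]

ref = np.array([[1.0, 0.0], [0.0, 1.0]])
flipped = np.array([[-1.0, 0.0], [0.0, 1.0]])  # first component sign-flipped
aligned = align_signs(flipped, ref)
print(np.allclose(aligned, ref))  # True
```

The same alignment must be applied to the transformed data, since flipping a component's sign also flips the sign of the corresponding projected column.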
  Align signs before comparing components and transformed data.
* Fix sklearn max_iter type: use an integer (10000L), not a float (10000.0)
  Modern sklearn strictly validates that `max_iter` is an int. R's default numeric type is double, which reticulate passes as a Python float. Using `10000L` ensures it is passed as a Python int.
* Add a CRAN-like check job (no CUDA, stub headers, ubuntu-latest)
  Runs `R CMD check --as-cran` on ubuntu-latest with R release and devel. No nvcc/CUDA is available, so the package builds with stub headers, matching what CRAN would see.
* Update roxygen
* Export S3 methods
* Roxygen updates
* Fix CRAN check: escape Rd braces, skip tests without cuML
  - Escape literal braces in roxygen comments across R source files and templates (e.g. `{cuda.ml}` -> `\{cuda.ml\}`, `{"opt1",...}` -> `\{"opt1",...\}`)
  - Regenerate all affected Rd files via `devtools::document()`
  - Skip `test_check()` when cuML is not linked (CRAN-like environments)
  - Use `R CMD check` directly in the CRAN job (avoids rcmdcheck setting `NOT_CRAN=true`)
* Fix examples brace escaping and register S3 methods
  - Revert brace escaping inside `@examples` blocks (R code, not Rd markup)
  - Define `cuda_ml_can_predict_class_probabilities` methods as proper functions so roxygen registers them as `S3method()` in NAMESPACE
* Add RAPIDS cuML 26.04 + CUDA 12 support
  Build infrastructure:
  - Dockerfile: CUDA 12.8.1 + Ubuntu 22.04 base image
  - libcuml_versions.R: add a 26.04 entry pointing to the PyPI libcuml-cu12 wheel
  - cuml.R: handle pip wheel extraction (lib64/ layout, .whl extension)
  - configure.R: handle lib64/ vs lib/ for pip wheels
  - CMakeLists.txt.in: C++17, rapids-cmake branch-26.04
  - Workflow: target cuML 26.04
  C++ API changes for cuML 26.04:
  - svm_serde.h: namespace alias `MLCommon::Matrix` -> `ML::matrix` for KernelParams and KernelType (header renamed kernelparams.h -> kernel_params.hpp)
  - fil.cu, fil_utils.h, fil_utils.cu: disable FIL on 26.04 with stubs (fil.h replaced by modular headers; full adaptation TODO)
  - random_projection.cu:
    disable on 26.04 with stubs (C++ API removed)
  - knn.cu: disable on 26.04 with stubs (`raft::spatial::knn` types removed)
  - random_forest_classifier.cu, random_forest_regressor.cu: guard FIL prediction paths for 26.04
  Backward compatible: cuML 21.x with CUDA 11 still works.
* Test both cuML 21.12 and 26.04 in CI
  - Dockerfile: accept `CUDA_IMAGE` as a build arg for different base images
  - Workflow: matrix over cuML 21.12 (CUDA 11.2) and 26.04 (CUDA 12.8)
  - Each version gets its own build-image and test-gpu job
* Fix rapids-cmake version and lib symlink for dual cuML support
  - CMakeLists.txt.in: template `RAPIDS_CMAKE_TAG` and `CMAKE_CXX_STANDARD` so they adapt to the cuML version being built against
  - configure.R: set the rapids-cmake tag (v26.04.00 for 26.x, branch-21.10 for 21.x) and the C++ standard (17 for 26.x, 14 for 21.x)
  - cuml.R: don't create a premature lib symlink in `download_libcuml()`
* Derive the rapids-cmake tag from the cuML version instead of hardcoding it
  Use vYY.MM.00 for cuML >= 23.02 (stable tags) and vYY.MM.00a for older versions (only alpha tags are available).
* Require cmake 3.30.4+ for cuML 26.04 (auto-downloaded if missing)
  rapids-cmake v26.04 needs cmake >= 3.30.4. The existing auto-download logic handles this, but the minimum version threshold was hardcoded to 3.21.1. It is now 3.30.4 for cuML >= 23.02 and 3.21.1 for older versions.
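The version-dependent tag and cmake-threshold rules from the last two commits can be sketched as small pure functions. The real logic lives in configure.R (in R); the Python helpers below are illustrative:

```python
def rapids_cmake_tag(cuml_version: str) -> str:
    """Derive the rapids-cmake tag for a cuML version string like '26.04'.

    Stable vYY.MM.00 tags exist from rapids-cmake 23.02 onward; older
    releases only published alpha tags (vYY.MM.00a).
    """
    major, minor = (int(p) for p in cuml_version.split(".")[:2])
    tag = f"v{major:02d}.{minor:02d}.00"
    return tag if (major, minor) >= (23, 2) else tag + "a"

def min_cmake_version(cuml_version: str) -> str:
    """rapids-cmake v26.04 requires cmake >= 3.30.4; older branches 3.21.1."""
    major, minor = (int(p) for p in cuml_version.split(".")[:2])
    return "3.30.4" if (major, minor) >= (23, 2) else "3.21.1"

print(rapids_cmake_tag("26.04"))   # v26.04.00
print(rapids_cmake_tag("21.12"))   # v21.12.00a
print(min_cmake_version("21.12"))  # 3.21.1
```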
* Fix cuML 26.04 build: raft/rmm deps, static_assert, device_allocator
  - Download the libraft-cu12 and librmm-cu12 wheels alongside libcuml-cu12 (cuML headers include raft/rmm headers, which live in separate packages)
  - Merge raft/rmm headers into libcuml/include/ during download
  - Remove `static_assert(CUML_VERSION_MAJOR == 21)` to allow 26+
  - Guard `raft::mr::device::allocator` (removed in raft 26.x) with version conditionals in device_allocator.cu/.h and stream_allocator.cu
  - Use raft/core/handle.hpp instead of raft/handle.hpp for v26+
* Resolve cuML PyPI deps dynamically instead of hardcoding URLs
  - Add tools/config/utils/pypi.R with `resolve_native_deps()`, which walks the PyPI dependency tree for a package and returns download URLs for all native C++ dependencies (libraft, librmm, rapids-logger, cccl, etc.)
  - libcuml_versions.R: the cuML 26.04 entry is now just "libcuml-cu12" (the PyPI package name), not a hardcoded URL
  - cuml.R: `download_libcuml()` detects PyPI package names vs direct URLs, resolves the full dep tree, downloads all wheels, and merges their include/ directories into libcuml/include/
  - configure.R: load the pypi.R utility
  - Uses jsonlite for PyPI JSON API parsing
* Download CCCL 3.3 headers for cuML 26.04 builds
  RMM 26.04 headers require CCCL >= 3.3 at compile time, but CCCL is not a pip dependency (it is normally bundled with the CUDA toolkit), and CUDA 12.x ships CCCL 2.x, which is too old. Download CCCL v3.3.0 from GitHub releases (header-only, ~2 MB) and merge it into libcuml/include/. Also handle pip wheels that extract to nested dirs like nvidia/<subpackage>/include/.
* Put CUML_INCLUDE_DIR before the CUDA toolkit includes
  CCCL 3.3 headers (bundled in libcuml/include/) must take precedence over the CUDA 12 toolkit's older CCCL 2.x headers. Swap the include order so cuml/raft/rmm/cccl headers are found first.
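The dependency-tree walk that `resolve_native_deps()` performs against the live PyPI JSON API (in R, via jsonlite) can be illustrated with a stubbed metadata table. The package names below mirror those mentioned in the commit, but the graph edges are illustrative, not PyPI's real metadata:

```python
# Stubbed stand-in for PyPI's JSON API (https://pypi.org/pypi/<pkg>/json);
# the real resolver fetches each package's requires_dist over HTTP.
STUB_DEPS = {
    "libcuml-cu12": ["libraft-cu12", "librmm-cu12"],
    "libraft-cu12": ["librmm-cu12", "rapids-logger"],
    "librmm-cu12": ["rapids-logger"],
    "rapids-logger": [],
}

def resolve_native_deps(pkg, deps=STUB_DEPS):
    """Breadth-first walk of the dependency tree, deduplicating packages."""
    seen, queue = [], [pkg]
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # already resolved via another path
        seen.append(current)
        queue.extend(deps.get(current, []))
    return seen

print(resolve_native_deps("libcuml-cu12"))
# ['libcuml-cu12', 'libraft-cu12', 'librmm-cu12', 'rapids-logger']
```

Deduplication matters here because librmm is reachable both directly and through libraft; each wheel should be downloaded and merged exactly once.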
* Fix CCCL compat, pinned_allocator removal, and the raft handle API
  - Use the RAPIDS-pinned CCCL commit (CUDA 12 compatible) instead of the v3.3.0 release tag, which includes CUDA 13-only code
  - pinned_host_vector.h: guard `thrust::cuda::experimental::pinned_allocator` (removed in CCCL 3.x); use a plain host_vector on v26+
  - handle_utils.cu: `raft::handle_t` no longer has `set_stream()`; reconstruct with a stream_view via the constructor on v26+
* Switch to cuML 25.12 (no CCCL 3.x requirement)
  cuML 26.04's rmm headers require CCCL >= 3.3, which conflicts with the CUDA 12.x toolkit's CCCL 2.x. cuML 25.12 vendors its own CCCL in librmm/include/rapids/ and has no CCCL version check, giving clean CUDA 12 compatibility.
  - Target cuML 25.12 instead of 26.04
  - Version guards: >= 26 -> >= 25 (the same API changes apply)
  - Re-enable KNN (knn.hpp exists in 25.12 with the same API)
  - Remove the CCCL GitHub download (not needed)
  - Update the PyPI resolver to handle version pins (==25.12.*)
* Define LIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE for RMM
  RMM headers require this define. It is normally set automatically by RMM's cmake config, but we are using the headers directly from the pip wheel.
* Revert cuML 25.x/26.x support (CCCL 3.x incompatible with CUDA 12)
  All RAPIDS 25.x+ pip wheels require CCCL 3.x headers, which are incompatible with CUDA 12's bundled CCCL 2.x. No version of libcuml-cu12 can be compiled against a stock CUDA 12 toolkit. Revert to cuML 21.12 as the default for now; supporting newer cuML will require either CUDA 13 or a custom build environment.

Co-authored-by: Tomasz Kalinowski <kalinowskit@gmail.com>
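When consuming RMM headers straight from a pip wheel, the define mentioned in the commit log above has to be set by hand, since RMM's own cmake package normally injects it. A minimal CMake sketch, assuming an illustrative target name `cuda_ml`:

```cmake
# RMM's cmake config sets this automatically when you link the rmm target;
# with raw pip-wheel headers we must define it ourselves, or RMM's
# memory-resource headers fail to compile.
target_compile_definitions(cuda_ml
  PRIVATE LIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE)
```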
Summary
Replaces the `[self-hosted, gpu-local]` runners with runs-on `g4dn.xlarge` (T4 GPU) instances for the `test-gpu` and `test-cudatoolkit` jobs. The jobs use the `ubuntu24-gpu-x64` image, which includes pre-installed NVIDIA drivers and the container toolkit. The container GPU configuration (`--gpus all --runtime=nvidia`) is preserved.

Test plan

- `test-gpu` job runs successfully on the runs-on runner
- `test-cudatoolkit` job runs successfully on the runs-on runner
- GPU is visible inside the container (`nvidia-smi`)
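Under these assumptions, the GPU job header might look roughly like the following. The RunsOn label syntax varies between versions and the image name is illustrative, so treat this as a sketch rather than the repo's actual workflow:

```yaml
jobs:
  test-gpu:
    # RunsOn labels: request a g4dn.xlarge (T4) with the GPU-enabled image.
    runs-on: [runs-on, runner=g4dn.xlarge, image=ubuntu24-gpu-x64]
    container:
      image: ghcr.io/OWNER/cuda-ml-ci:latest   # illustrative image name
      options: --gpus all --runtime=nvidia
    steps:
      - run: nvidia-smi   # confirm the GPU is visible inside the container
```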