Commit 34ef825

Merge path_finder_dev into main (#613)
* First version of `cuda.bindings.path_finder` (#447)
* Unmodified copies of:
  * https://github.com/NVIDIA/numba-cuda/blob/bf487d78a40eea87f009d636882a5000a7524c95/numba_cuda/numba/cuda/cuda_paths.py
  * https://github.com/numba/numba/blob/f0d24824fcd6a454827e3c108882395d00befc04/numba/misc/findlib.py
* Add Forked from URLs.
* Strip down cuda_paths.py to the minimum required for `_get_nvvm_path()`. Tested interactively with:

  ```
  import cuda_paths
  nvvm_path = cuda_paths._get_nvvm_path()
  print(f"{nvvm_path=}")
  ```

* ruff auto-fixes (NO manual changes)
* Make `get_nvvm_path()` a public API (i.e. remove leading underscore).
* Fetch numba-cuda/numba_cuda/numba/cuda/cuda_paths.py from NVIDIA/numba-cuda#155 AS-IS
* ruff format (NO MANUAL CHANGES)
* Minimal changes to adapt numba-cuda/numba_cuda/numba/cuda/cuda_paths.py from NVIDIA/numba-cuda#155
* Rename ecosystem/cuda_paths.py -> path_finder.py
* Plug cuda.bindings.path_finder into cuda/bindings/_internal/nvvm_linux.pyx
* Plug cuda.bindings.path_finder into cuda/bindings/_internal/nvjitlink_linux.pyx
* Fix `os.path.exists(None)` issue:

  ```
  ______________________ ERROR collecting test_nvjitlink.py ______________________
  tests/test_nvjitlink.py:62: in <module>
      not check_nvjitlink_usable(), reason="nvJitLink not usable, maybe not installed or too old (<12.3)"
  tests/test_nvjitlink.py:58: in check_nvjitlink_usable
      return inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion") != 0
  cuda/bindings/_internal/nvjitlink.pyx:257: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
      ???
  cuda/bindings/_internal/nvjitlink.pyx:260: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
      ???
  cuda/bindings/_internal/nvjitlink.pyx:208: in cuda.bindings._internal.nvjitlink._inspect_function_pointers
      ???
  cuda/bindings/_internal/nvjitlink.pyx:102: in cuda.bindings._internal.nvjitlink._check_or_init_nvjitlink
      ???
  cuda/bindings/_internal/nvjitlink.pyx:59: in cuda.bindings._internal.nvjitlink.load_library
      ???
  /opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:312: in get_cuda_paths
      "nvvm": _get_nvvm_path(),
  /opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:285: in _get_nvvm_path
      by, path = _get_nvvm_path_decision()
  /opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:96: in _get_nvvm_path_decision
      if os.path.exists(nvvm_ctk_dir):
  <frozen genericpath>:19: in exists
      ???
  E   TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
  ```

* Fix another `os.path.exists(None)` issue:

  ```
  ______________________ ERROR collecting test_nvjitlink.py ______________________
  tests/test_nvjitlink.py:62: in <module>
      not check_nvjitlink_usable(), reason="nvJitLink not usable, maybe not installed or too old (<12.3)"
  tests/test_nvjitlink.py:58: in check_nvjitlink_usable
      return inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion") != 0
  cuda/bindings/_internal/nvjitlink.pyx:257: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
      ???
  cuda/bindings/_internal/nvjitlink.pyx:260: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
      ???
  cuda/bindings/_internal/nvjitlink.pyx:208: in cuda.bindings._internal.nvjitlink._inspect_function_pointers
      ???
  cuda/bindings/_internal/nvjitlink.pyx:102: in cuda.bindings._internal.nvjitlink._check_or_init_nvjitlink
      ???
  cuda/bindings/_internal/nvjitlink.pyx:59: in cuda.bindings._internal.nvjitlink.load_library
      ???
  /opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:313: in get_cuda_paths
      "libdevice": _get_libdevice_paths(),
  /opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:126: in _get_libdevice_paths
      by, libdir = _get_libdevice_path_decision()
  /opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:73: in _get_libdevice_path_decision
      if os.path.exists(libdevice_ctk_dir):
  <frozen genericpath>:19: in exists
      ???
  E   TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
  ```

* Change "/lib64/" → "/lib/" in nvjitlink_linux.pyx
* nvjitlink_linux.pyx load_library() enhancements, mainly to avoid os.path.join(None, "libnvJitLink.so")
* Add missing f-string `f`
* Add back get_nvjitlink_dso_version_suffix() call.
* pytest -ra -s -v
* Rewrite nvjitlink_linux.pyx load_library() to produce detailed error messages.
* Attach listdir output to "Unable to load" exception message.
* Guard os.listdir() call with os.path.isdir()
* Fix logic error in nvjitlink_linux.pyx load_library()
* Move path_finder.py to _path_finder_utils/cuda_paths.py, import only public functions from new path_finder.py
* Add find_nvidia_dynamic_library() and use from nvjitlink_linux.pyx, nvvm_linux.pyx
* Fix oversight in _find_using_lib_dir()
* Also look for versioned library in _find_using_nvidia_lib_dirs()
* glob.glob() Python 3.9 compatibility
* Reduce build-and-test.yml to Windows-only, Python 3.12 only.
* Comment out `if: ${{ github.repository_owner == nvidia }}`
* Revert "Comment out `if: ${{ github.repository_owner == nvidia }}`" (reverts commit b0db24f)
* Add back `linux-64` `host-platform`
* Rewrite load_library() in nvjitlink_windows.pyx to use path_finder.find_nvidia_dynamic_library()
* Revert "Rewrite load_library() in nvjitlink_windows.pyx to use path_finder.find_nvidia_dynamic_library()" (reverts commit 1bb7151)
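The two `os.path.exists(None)` failures above share the same root cause: `os.path.exists()` swallows `OSError`/`ValueError` but lets `TypeError` propagate when handed `None`. A minimal standalone sketch of the guard pattern (the helper name `safe_exists` is hypothetical, for illustration only, not the actual fix):

```python
import os


def safe_exists(path):
    # os.path.exists() raises TypeError for None; screen it out explicitly
    # before calling, so an unset search directory is treated as "not found".
    return path is not None and os.path.exists(path)
```
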
* Add _inspect_environment() in find_nvidia_dynamic_library.py, call from nvjitlink_windows.pyx, nvvm_windows.pyx
* Add & use _find_dll_using_nvidia_bin_dirs(), _find_dll_using_cudalib_dir()
* Fix silly oversight: forgot to undo experimental change.
* Also reduce the test-linux test matrix.
* Reimplement load_library() functions in nvjitlink_windows.pyx, nvvm_windows.pyx to actively use path_finder.find_nvidia_dynamic_library()
* Factor out load_nvidia_dynamic_library() from _internal/nvjitlink_linux.pyx, nvvm_linux.pyx
* Generalize load_nvidia_dynamic_library.py to also work under Windows.
* Add `void*` return type to load_library() implementations in _internal/nvjitlink_windows.pyx, nvvm_windows.pyx
* Resolve Cython error: object handle vs `void*` handle

  ```
  Error compiling Cython file:
  ------------------------------------------------------------
  ...
      err = (<int (*)(int*) nogil>__cuDriverGetVersion)(&driver_ver)
      if err != 0:
          raise RuntimeError('something went wrong')
      # Load library
      handle = load_library(driver_ver)
                           ^
  ------------------------------------------------------------
  cuda\bindings\_internal\nvjitlink.pyx:72:29: Cannot convert 'void *' to Python object
  ```

* Resolve another Cython error: `void*` handle vs `intptr_t` handle

  ```
  Error compiling Cython file:
  ------------------------------------------------------------
  ...
      handle = load_library(driver_ver)

      # Load function
      global __nvJitLinkCreate
      try:
          __nvJitLinkCreate = <void*><intptr_t>win32api.GetProcAddress(handle, 'nvJitLinkCreate')
                                                                   ^
  ------------------------------------------------------------
  cuda\bindings\_internal\nvjitlink.pyx:78:73: Cannot convert 'void *' to Python object
  ```

* Resolve signed/unsigned runtime error. Use uintptr_t consistently.

  https://github.com/NVIDIA/cuda-python/actions/runs/14224673173/job/39861750852?pr=447#logs

  ```
  =================================== ERRORS ====================================
  _____________________ ERROR collecting test_nvjitlink.py ______________________
  tests\test_nvjitlink.py:62: in <module>
      not check_nvjitlink_usable(), reason="nvJitLink not usable, maybe not installed or too old (<12.3)"
  tests\test_nvjitlink.py:58: in check_nvjitlink_usable
      return inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion") != 0
  cuda\bindings\_internal\nvjitlink.pyx:221: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
      ???
  cuda\bindings\_internal\nvjitlink.pyx:224: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
      ???
  cuda\bindings\_internal\nvjitlink.pyx:172: in cuda.bindings._internal.nvjitlink._inspect_function_pointers
      ???
  cuda\bindings\_internal\nvjitlink.pyx:73: in cuda.bindings._internal.nvjitlink._check_or_init_nvjitlink
      ???
  cuda\bindings\_internal\nvjitlink.pyx:46: in cuda.bindings._internal.nvjitlink.load_library
      ???
  E   OverflowError: can't convert negative value to size_t
  ```

* Change `<void*><uintptr_t>win32api.GetProcAddress` back to `intptr_t`. Change load_nvidia_dynamic_library() to also use the to-`intptr_t` conversion, for compatibility with win32api.GetProcAddress. Document that CDLL behaves differently (it uses to-`uintptr_t`).
* Use win32api.LoadLibrary() instead of ctypes.windll.kernel32.LoadLibraryW(), to be more similar to the original (and working) Cython code.
  Hoping to resolve this kind of error:

  ```
  _ ERROR at setup of test_c_or_v_program_fail_bad_option[txt-compile_program] __
  request = <SubRequest 'minimal_nvvmir' for <Function test_c_or_v_program_fail_bad_option[txt-compile_program]>>

      @pytest.fixture(params=MINIMAL_NVVMIR_FIXTURE_PARAMS)
      def minimal_nvvmir(request):
          for pass_counter in range(2):
              nvvmir = MINIMAL_NVVMIR_CACHE.get(request.param, -1)
              if nvvmir != -1:
                  if nvvmir is None:
                      pytest.skip(f"UNAVAILABLE: {request.param}")
                  return nvvmir
              if pass_counter:
                  raise AssertionError("This code path is meant to be unreachable.")
              # Build cache entries, then try again (above).
  >           major, minor, debug_major, debug_minor = nvvm.ir_version()
  tests\test_nvvm.py:148:
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  cuda\bindings\nvvm.pyx:95: in cuda.bindings.nvvm.ir_version
      cpdef tuple ir_version():
  cuda\bindings\nvvm.pyx:113: in cuda.bindings.nvvm.ir_version
      status = nvvmIRVersion(&major_ir, &minor_ir, &major_dbg, &minor_dbg)
  cuda\bindings\cynvvm.pyx:19: in cuda.bindings.cynvvm.nvvmIRVersion
      return _nvvm._nvvmIRVersion(majorIR, minorIR, majorDbg, minorDbg)
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  >   ???
  E   cuda.bindings._internal.utils.FunctionNotFoundError: function nvvmIRVersion is not found
  ```

* Remove debug print statements.
* Remove some cruft.
* Trivial renaming of variables. No functional changes.
* Revert debug changes under .github/workflows
* Rename _path_finder_utils → _path_finder
* Remove LD_LIBRARY_PATH in fetch_ctk/action.yml
* Linux: First try using the platform-specific dynamic loader search mechanisms
* Add _windows_load_with_dll_basename()
* Revert "Revert debug changes under .github/workflows" (reverts commit cc6113c)
* Add debug prints in load_nvidia_dynamic_library()
* Report dlopen error for libnvrtc.so.12
* print("\nLOOOK dlfcn.dlopen('libnvrtc.so.12', dlfcn.RTLD_NOW)", flush=True)
* Revert "Remove LD_LIBRARY_PATH in fetch_ctk/action.yml" (reverts commit 1b1139c)
* Only remove ${CUDA_PATH}/nvvm/lib64 from LD_LIBRARY_PATH
* Use path_finder.load_nvidia_dynamic_library("nvrtc") from cuda/bindings/_bindings/cynvrtc.pyx.in
* Somewhat ad hoc heuristics for nvidia_cuda_nvrtc wheels.
* Remove LD_LIBRARY_PATH entirely from .github/actions/fetch_ctk/action.yml
* Remove CUDA_PATH\nvvm\bin in .github/workflows/test-wheel-windows.yml
* Revert "Remove LD_LIBRARY_PATH entirely from .github/actions/fetch_ctk/action.yml" (reverts commit bff8cf0)
* Revert "Somewhat ad hoc heuristics for nvidia_cuda_nvrtc wheels." (reverts commit 43abec8)
* Restore cuda/bindings/_bindings/cynvrtc.pyx.in as-is on main
* Remove debug print from load_nvidia_dynamic_library.py
* Reapply "Revert debug changes under .github/workflows" (reverts commit aaa6aff)
* Make `path_finder` work for `"nvrtc"` (#553)
* Revert "Restore cuda/bindings/_bindings/cynvrtc.pyx.in as-is on main" (reverts commit ba093f5)
* Revert "Reapply "Revert debug changes under .github/workflows"" (reverts commit 8f69f83)
* Also load nvrtc from cuda_bindings/tests/path_finder.py
* Add heuristics for nvidia_cuda_nvrtc Windows wheels. Also fix a couple of bugs discovered by ChatGPT:
  * `glob.glob()` in this code returns absolute paths.
  * stray `error_messages = []`
* Add debug prints, mostly for `os.add_dll_directory(bin_dir)`
* Fix unfortunate silly oversight (`import os` missing under Windows)
* Use `win32api.LoadLibraryEx()` with suitable `flags`; also update `os.environ["PATH"]`
* Hard-wire WinBase.h constants (they are not exposed by win32con)
* Remove debug prints
* Reapply "Reapply "Revert debug changes under .github/workflows"" (reverts commit b002ff6)
* Add `path_finder.SUPPORTED_LIBNAMES` (#558)
* Revert "Reapply "Revert debug changes under .github/workflows"" (reverts commit 8f69f83)
* Add names of all CTK 12.8.1 x86_64-linux libraries (.so) as `path_finder.SUPPORTED_LIBNAMES`

  https://chatgpt.com/share/67f98d0b-148c-8008-9951-9995cf5d860c

* Add `SUPPORTED_WINDOWS_DLLS`
* Add copyright notice
* Move SUPPORTED_LIBNAMES, SUPPORTED_WINDOWS_DLLS to _path_finder/supported_libs.py
* Use SUPPORTED_WINDOWS_DLLS in _windows_load_with_dll_basename()
* Change "Set up mini CTK" to use `method: local`, remove `sub-packages` line.
* Use Jimver/[email protected] also under Linux, `method: local`, no `sub-packages`.
* Add more `nvidia-*-cu12` wheels to get as many of the supported shared libraries as possible.
* Revert "Use Jimver/[email protected] also under Linux, `method: local`, no `sub-packages`." (reverts commit d499806)

  Problem observed:

  ```
  /usr/bin/docker exec 1b42cd4ea3149ac3f2448eae830190ee62289b7304a73f8001e90cead5005102 sh -c "cat /etc/*release | grep ^ID"
  Warning: Failed to restore: Cache service responded with 422
  /usr/bin/tar --posix -cf cache.tgz --exclude cache.tgz -P -C /__w/cuda-python/cuda-python --files-from manifest.txt -z
  Failed to save: Unable to reserve cache with key cuda_installer-linux-5.15.0-135-generic-x64-12.8.0, another job may be creating this cache. More details: This legacy service is shutting down, effective April 15, 2025. Migrate to the new service ASAP. For more information: https://gh.io/gha-cache-sunset
  Warning: Error during installation: Error: Unable to locate executable file: sudo. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable.
  Error: Error: Unable to locate executable file: sudo. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable.
  ```

* Change test_path_finder::test_find_and_load() to skip cufile on Windows, and report exceptions as failures, except for cudart
* Add nvidia-cuda-runtime-cu12 to pyproject.toml (for libname cudart)
* test_path_finder.py: before loading cusolver, load nvJitLink, cusparse, cublas (experiment to see if that resolves the only Windows failure)

  Test (win-64, Python 3.12, CUDA 12.8.0, Runner default, CTK wheels) / test

  ```
  ================================== FAILURES ===================================
  ________________________ test_find_and_load[cusolver] _________________________
  libname = 'cusolver'

      @pytest.mark.parametrize("libname", path_finder.SUPPORTED_LIBNAMES)
      def test_find_and_load(libname):
          if sys.platform == "win32" and libname == "cufile":
              pytest.skip(f'test_find_and_load("{libname}") not supported on this platform')
          print(f'\ntest_find_and_load("{libname}")')
          failures = []
          for algo, func in (
              ("find", path_finder.find_nvidia_dynamic_library),
              ("load", path_finder.load_nvidia_dynamic_library),
          ):
              try:
                  out = func(libname)
              except Exception as e:
                  out = f"EXCEPTION: {type(e)} {str(e)}"
                  failures.append(algo)
              print(out)
          print()
  >       assert not failures
  E       AssertionError: assert not ['load']
  tests\test_path_finder.py:29: AssertionError
  ```

* test_path_finder.py: load *only* nvJitLink before loading cusolver
* Run each test_find_or_load_nvidia_dynamic_library() subtest in a subprocess
* Add cublasLt to supported_libs.py and load deps for cusolver, cusolverMg, cusparse in test_path_finder.py. Also restrict test_path_finder.py to test load only for now.
* Add supported_libs.DIRECT_DEPENDENCIES
* Remove cufile_rdma from supported libs (comment out).

  https://chatgpt.com/share/68033a33-385c-8008-a293-4c8cc3ea23ae

* Split out `PARTIALLY_SUPPORTED_LIBNAMES`. Fix up test code.
* Reduce public API to only load_nvidia_dynamic_library, SUPPORTED_LIBNAMES
* Set CUDA_BINDINGS_PATH_FINDER_TEST_ALL_LIBNAMES=1 to match expected availability of NVIDIA shared libraries.
* Refactor as `class _find_nvidia_dynamic_library`
* Strict wheel, conda, system rule: try using the platform-specific dynamic loader search mechanisms only last
* Introduce _load_and_report_path_linux(), add supported_libs.EXPECTED_LIB_SYMBOLS
* Plug in ctypes.windll.kernel32.GetModuleFileNameW()
* Keep track of nvrtc-related GitHub comment
* Factor out `_find_dll_under_dir(dirpath, file_wild)` and reuse from `_find_dll_using_nvidia_bin_dirs()`, `_find_dll_using_cudalib_dir()` (to fix loading nvrtc64_120_0.dll from local CTK)
* Minimal "is already loaded" code.
* Add THIS FILE NEEDS TO BE REVIEWED/UPDATED FOR EACH CTK RELEASE comment in _path_finder/supported_libs.py
* Add SUPPORTED_LINUX_SONAMES in _path_finder/supported_libs.py
* Update SUPPORTED_WINDOWS_DLLS in _path_finder/supported_libs.py based on DLLs found in cuda_*win*.exe files.
* Remove `os.add_dll_directory()` and `os.environ["PATH"]` manipulations from find_nvidia_dynamic_library.py. Add `supported_libs.LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY` and use from `load_nvidia_dynamic_library()`.
* Move nvrtc-specific code from find_nvidia_dynamic_library.py to `supported_libs.is_suppressed_dll_file()`
* Introduce dataclass LoadedDL as return type for load_nvidia_dynamic_library()
* Factor out _abs_path_for_dynamic_library_* and use on handle obtained through "is already loaded" checks
* Factor out _load_nvidia_dynamic_library_no_cache() and use for exercising LoadedDL.was_already_loaded_from_elsewhere
* _check_nvjitlink_usable() in test_path_finder.py
* Undo changes in .github/workflows/ and cuda_bindings/pyproject.toml
* Move cuda_bindings/tests/path_finder.py -> toolshed/run_cuda_bindings_path_finder.py
* Add bandit suppressions in test_path_finder.py
* Add pytest info_summary_append fixture and use from test_path_finder.py to report the absolute paths of the loaded libraries.
* Fix tiny accident: a line in pyproject.toml got lost somehow.
* Undo changes under .github (LD_LIBRARY_PATH, PATH manipulations for nvvm).
* 2025-05-01 version of `cuda.bindings.path_finder` (#578)
* Undo changes to the nvJitLink, nvrtc, nvvm bindings
* Undo changes under .github, specific to nvvm, manipulating LD_LIBRARY_PATH or PATH
* PARTIALLY_SUPPORTED_LIBNAMES_LINUX, PARTIALLY_SUPPORTED_LIBNAMES_WINDOWS
* Update EXPECTED_LIB_SYMBOLS for nvJitLink to cleanly support CTK versions 12.0, 12.1, 12.2
* Save result of factoring out load_dl_common.py, load_dl_linux.py, load_dl_windows.py with the help of Cursor.
* Fix an auto-generated docstring
* First round of Cursor refactoring (about 4 iterations until all tests passed), followed by ruff auto-fixes
* Revert "first round of Cursor refactoring (about 4 iterations until all tests passed), followed by ruff auto-fixes" (reverts commit 001a6a2)

  There were many GitHub Actions jobs that failed (all tests with 12.x):
  https://github.com/NVIDIA/cuda-python/actions/runs/14677553387

  This is not worth spending time debugging, especially because:
  * Cursor has been unresponsive for at least half an hour: "We're having trouble connecting to the model provider. This might be temporary - please try again in a moment."
  * The refactored code does not seem easier to read.

* A couple of trivial tweaks
* Prefix the public API (just two items) with underscores for now.
* Add SPDX-License-Identifier to all files under toolshed/ that don't have it already
* Add SPDX-License-Identifier under cuda_bindings/tests/
* Respond to "Do these need to be run as subprocesses?" review question (#578 (comment))
* Respond to "dead code?" review questions (e.g. #578 (comment))
* Respond to "Do we need to implement a cache separately ..." review question (#578 (comment))
* Remove cuDriverGetVersion() function for now.
* Move add_dll_directory() from load_dl_common.py to load_dl_windows.py (response to review question #578 (comment))
* Add SPDX-License-Identifier and "# Forked from:" URL in cuda_paths.py
* Add SPDX-License-Identifier and original LICENSE in findlib.py
* Very first draft of README.md
* Update README.md, mostly as revised by perplexity, with various manual edits.
* Refork cuda_paths.py AS-IS: https://github.com/NVIDIA/numba-cuda/blob/8c9c9d0cb901c06774a9abea6d12b6a4b0287e5e/numba_cuda/numba/cuda/cuda_paths.py
* ruff format cuda_paths.py (NO manual changes)
* Add back _get_numba_CUDA_INCLUDE_PATH from 2279bda (i.e. cuda_paths.py as it was right before re-forking)
* Remove cuda_paths.py dependency on numba.cuda.cudadrv.runtime
* Add Forked from URLs, two SPDX-License-Identifier, original Numba LICENSE
* Temporarily restore debug changes under .github/workflows, for expanded path_finder test coverage
* Restore cuda_path.py AS-IT-WAS at commit 2279bda
* Revert "Restore cuda_path.py AS-IT-WAS at commit 2279bda" (reverts commit 1b88ec2)
* Force compute-sanitizer off unconditionally
* Revert "Force compute-sanitizer off unconditionally" (reverts commit 2bc7ef6)
* Add timeout=10 seconds to test_path_finder.py subprocess.run() invocations.
* Increase test_path_finder.py subprocess.run() timeout to 30 seconds: under Windows, loading cublas or cusolver may exceed the 10-second timeout (#578 (comment))
* Revert "Temporarily restore debug changes under .github/workflows, for expanded path_finder test coverage" (reverts commit 47ad79f)
* Force compute-sanitizer off unconditionally
* Add: Note that the search is done on a per-library basis.
* Add Note for CUDA_HOME / CUDA_PATH
* Add: 0. **Check if a library was loaded into the process already by some other means.**
* _find_dll_using_nvidia_bin_dirs(): reuse lib_searched_for in place of file_wild
* Systematically replace all relative imports with absolute imports.
* handle: int → ctypes.CDLL fix
* Make load_dl_windows.py abs_path_for_dynamic_library() implementation maximally robust.
* Change argument name → libname for self-consistency
* Systematically replace previously overlooked relative imports with absolute imports.
* Simplify code (also for self-consistency)
* Expand the 3. **System Installations** section with information produced by perplexity
* Pull out `**Environment variables**` into an added section, after manual inspection of cuda_paths.py. Minor additional edits.
* Revert "Force compute-sanitizer off unconditionally" (reverts commit aeaf4f0)
* Move _path_finder/sys_path_find_sub_dirs.py → find_sub_dirs.py, use find_sub_dirs_all_sitepackages() from find_nvidia_dynamic_library.py
* WIP (search priority updated in README.md but not in code)
* Revert "WIP (search priority updated in README.md but not in code)" (reverts commit bf9734c)
* WIP (search priority updated in README.md but not in code)
* Completely replace cuda_paths.py to achieve the desired Search Priority (see updated README.md).
* Define `IS_WINDOWS = sys.platform == "win32"` in supported_libs.py
* Use os.path.samefile() to resolve issues with doubled backslashes.
* `load_in_subprocess()`: pass the current environment
* Add run_python_code_safely.py as generated by perplexity, plus ruff format, bandit nosec
* Replace subprocess.run with run_python_code_safely
* Factor out `class Worker` to fix pickle issue.
* ChatGPT revisions based on Deep research: https://chatgpt.com/share/681914ce-f274-8008-9e9f-4538716b4ed7
* Fix race condition in result queue handling by using timeout-based get().

  The previous implementation checked result_queue.empty() before calling get(), which introduces a classic race condition: the queue may become non-empty immediately after the check, resulting in missed results or misleading errors. This patch replaces the empty() check with result_queue.get(timeout=1.0), allowing the parent process to robustly wait for results with a bounded delay.

  Also switches from ctx.SimpleQueue() to ctx.Queue() for compatibility with timeout-based get(), which SimpleQueue does not support on Python ≤3.12.

  Note: The race condition was discovered by Gemini 2.5.

* Resolve SIM108
* Change to "nppc" as ANCHOR_LIBNAME
* Implement CUDA_PYTHON_CUDA_HOME_PRIORITY first, last, with default first
* Remove retry_with_anchor_abs_path() and make retry_with_cuda_home_priority_last() the default.
* Update README.md to reflect the new search priority
* SUPPORTED_LINUX_SONAMES does not need updates for CTK 12.9.0
* The only addition to SUPPORTED_WINDOWS_DLLS for CTK 12.9.0 is nvvm70.dll
* Make OSError in load_dl_windows.py abs_path_for_dynamic_library() more informative.
* run_cuda_bindings_path_finder.py: optionally use args as libnames (to aid debugging)
* Bug fix in load_dl_windows.py: ctypes.windll.kernel32.LoadLibraryW() returns an incompatible `handle`. Use win32api.LoadLibraryEx() instead to ensure self-consistency.
* Remove _find_nvidia_dynamic_library.retry_with_anchor_abs_path() method. Move run_python_code_safely.py to test/ directory.
* Add missing SPDX-License-Identifier
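The timeout-based `get()` fix described above can be sketched in isolation. This is a minimal standalone illustration (the names `_worker` and `run_with_timeout` are hypothetical, not the actual run_python_code_safely.py code):

```python
import multiprocessing


def _worker(result_queue):
    # Child process: publish one result.
    result_queue.put("done")


def run_with_timeout(start_method="spawn", timeout=10.0):
    ctx = multiprocessing.get_context(start_method)
    # ctx.Queue() (rather than ctx.SimpleQueue()) supports get(timeout=...)
    # on Python <= 3.12.
    result_queue = ctx.Queue()
    process = ctx.Process(target=_worker, args=(result_queue,))
    process.start()
    try:
        # A blocking get() with a bounded timeout avoids the classic
        # empty()-then-get() race between parent and child.
        return result_queue.get(timeout=timeout)
    finally:
        process.join()
```

The key point is that the parent never polls `empty()`; it simply blocks with a bound, so a result arriving just after a hypothetical check cannot be missed.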
1 parent c21613b · commit 34ef825

17 files changed: +1544 −0 lines changed
@@ -0,0 +1,55 @@
# `cuda.bindings.path_finder` Module

## Public API (Work in Progress)

Currently exposes two primary interfaces:

```
cuda.bindings.path_finder._SUPPORTED_LIBNAMES  # ('nvJitLink', 'nvrtc', 'nvvm')
cuda.bindings.path_finder._load_nvidia_dynamic_library(libname: str) -> LoadedDL
```

**Note:** These APIs are prefixed with an underscore because they are considered
experimental while undergoing active development, although they are already
reasonably well tested through CI pipelines.

## Library Loading Search Priority

The `load_nvidia_dynamic_library()` function implements a hierarchical search
strategy for locating NVIDIA shared libraries:

0. **Check if a library was already loaded into the process by some other means.**
   - If yes, the rest of the search logic is necessarily skipped. The absolute
     path of the already loaded library is returned, along with the handle to
     the library.

1. **NVIDIA Python wheels**
   - Scans all site-packages to find libraries installed via NVIDIA Python wheels.

2. **OS default mechanisms / Conda environments**
   - Falls back to the native loader:
     - `dlopen()` on Linux
     - `LoadLibraryW()` on Windows
   - CTK installations with system config updates are expected to be discovered:
     - Linux: via `/etc/ld.so.conf.d/*cuda*.conf`
     - Windows: via `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.Y\bin` on the system `PATH`
   - Conda installations are expected to be discovered:
     - Linux: via `$ORIGIN/../lib` on the `RPATH` (of the `python` binary)
     - Windows: via `%CONDA_PREFIX%\Library\bin` on the system `PATH`

3. **Environment variables**
   - Relies on the `CUDA_HOME` or `CUDA_PATH` environment variable if set
     (in that order).

Note that the search is done on a per-library basis. There is no centralized
mechanism that ensures all libraries are found in the same way.

## Maintenance Requirements

These key components must be updated for new CUDA Toolkit releases:

- `supported_libs.SUPPORTED_LIBNAMES`
- `supported_libs.SUPPORTED_WINDOWS_DLLS`
- `supported_libs.SUPPORTED_LINUX_SONAMES`
- `supported_libs.EXPECTED_LIB_SYMBOLS`
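The hierarchical search described above boils down to a first-hit-wins cascade over per-source finder functions. A minimal generic sketch (the `find_library`/`finders` names are hypothetical, for illustration only, not the module's actual API):

```python
def find_library(libname, finders):
    # Try each finder in priority order (already-loaded check, wheels,
    # OS loader / conda, CUDA_HOME / CUDA_PATH); the first one that
    # returns a path wins, and later sources are never consulted.
    for finder in finders:
        path = finder(libname)
        if path is not None:
            return path
    raise RuntimeError(f'Failure finding "{libname}"')
```

Because the cascade runs independently per `libname`, two libraries can legitimately resolve to different sources, which is exactly the per-library caveat noted above.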
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,153 @@
1+
# Copyright 2024-2025 NVIDIA Corporation. All rights reserved.
2+
# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE
3+
4+
import functools
5+
import glob
6+
import os
7+
8+
from cuda.bindings._path_finder.find_sub_dirs import find_sub_dirs_all_sitepackages
9+
from cuda.bindings._path_finder.supported_libs import IS_WINDOWS, is_suppressed_dll_file
10+
11+
12+
def _no_such_file_in_sub_dirs(sub_dirs, file_wild, error_messages, attachments):
13+
error_messages.append(f"No such file: {file_wild}")
14+
for sub_dir in find_sub_dirs_all_sitepackages(sub_dirs):
15+
attachments.append(f' listdir("{sub_dir}"):')
16+
for node in sorted(os.listdir(sub_dir)):
17+
attachments.append(f" {node}")
18+
19+
20+
def _find_so_using_nvidia_lib_dirs(libname, so_basename, error_messages, attachments):
21+
nvidia_sub_dirs = ("nvidia", "*", "nvvm", "lib64") if libname == "nvvm" else ("nvidia", "*", "lib")
22+
file_wild = so_basename + "*"
23+
for lib_dir in find_sub_dirs_all_sitepackages(nvidia_sub_dirs):
24+
# First look for an exact match
25+
so_name = os.path.join(lib_dir, so_basename)
26+
if os.path.isfile(so_name):
27+
return so_name
28+
# Look for a versioned library
29+
# Using sort here mainly to make the result deterministic.
30+
for so_name in sorted(glob.glob(os.path.join(lib_dir, file_wild))):
31+
if os.path.isfile(so_name):
32+
return so_name
33+
_no_such_file_in_sub_dirs(nvidia_sub_dirs, file_wild, error_messages, attachments)
34+
return None
35+
36+
37+
def _find_dll_under_dir(dirpath, file_wild):
38+
for path in sorted(glob.glob(os.path.join(dirpath, file_wild))):
39+
if not os.path.isfile(path):
40+
continue
41+
if not is_suppressed_dll_file(os.path.basename(path)):
42+
return path
43+
return None
44+
45+
46+
def _find_dll_using_nvidia_bin_dirs(libname, lib_searched_for, error_messages, attachments):
47+
nvidia_sub_dirs = ("nvidia", "*", "nvvm", "bin") if libname == "nvvm" else ("nvidia", "*", "bin")
48+
for bin_dir in find_sub_dirs_all_sitepackages(nvidia_sub_dirs):
49+
dll_name = _find_dll_under_dir(bin_dir, lib_searched_for)
50+
if dll_name is not None:
51+
return dll_name
52+
```python
    _no_such_file_in_sub_dirs(nvidia_sub_dirs, lib_searched_for, error_messages, attachments)
    return None


def _get_cuda_home():
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home is None:
        cuda_home = os.environ.get("CUDA_PATH")
    return cuda_home


def _find_lib_dir_using_cuda_home(libname):
    cuda_home = _get_cuda_home()
    if cuda_home is None:
        return None
    if IS_WINDOWS:
        subdirs = (os.path.join("nvvm", "bin"),) if libname == "nvvm" else ("bin",)
    else:
        subdirs = (
            (os.path.join("nvvm", "lib64"),)
            if libname == "nvvm"
            else (
                "lib64",  # CTK
                "lib",  # Conda
            )
        )
    for subdir in subdirs:
        dirname = os.path.join(cuda_home, subdir)
        if os.path.isdir(dirname):
            return dirname
    return None


def _find_so_using_lib_dir(lib_dir, so_basename, error_messages, attachments):
    so_name = os.path.join(lib_dir, so_basename)
    if os.path.isfile(so_name):
        return so_name
    error_messages.append(f"No such file: {so_name}")
    attachments.append(f'  listdir("{lib_dir}"):')
    if not os.path.isdir(lib_dir):
        attachments.append("  DIRECTORY DOES NOT EXIST")
    else:
        for node in sorted(os.listdir(lib_dir)):
            attachments.append(f"  {node}")
    return None


def _find_dll_using_lib_dir(lib_dir, libname, error_messages, attachments):
    file_wild = libname + "*.dll"
    dll_name = _find_dll_under_dir(lib_dir, file_wild)
    if dll_name is not None:
        return dll_name
    error_messages.append(f"No such file: {file_wild}")
    attachments.append(f'  listdir("{lib_dir}"):')
    for node in sorted(os.listdir(lib_dir)):
        attachments.append(f"  {node}")
    return None


class _find_nvidia_dynamic_library:
    def __init__(self, libname: str):
        self.libname = libname
        self.error_messages = []
        self.attachments = []
        self.abs_path = None

        if IS_WINDOWS:
            self.lib_searched_for = f"{libname}*.dll"
            if self.abs_path is None:
                self.abs_path = _find_dll_using_nvidia_bin_dirs(
                    libname, self.lib_searched_for, self.error_messages, self.attachments
                )
        else:
            self.lib_searched_for = f"lib{libname}.so"
            if self.abs_path is None:
                self.abs_path = _find_so_using_nvidia_lib_dirs(
                    libname, self.lib_searched_for, self.error_messages, self.attachments
                )

    def retry_with_cuda_home_priority_last(self):
        cuda_home_lib_dir = _find_lib_dir_using_cuda_home(self.libname)
        if cuda_home_lib_dir is not None:
            if IS_WINDOWS:
                self.abs_path = _find_dll_using_lib_dir(
                    cuda_home_lib_dir, self.libname, self.error_messages, self.attachments
                )
            else:
                self.abs_path = _find_so_using_lib_dir(
                    cuda_home_lib_dir, self.lib_searched_for, self.error_messages, self.attachments
                )

    def raise_if_abs_path_is_None(self):
        if self.abs_path:
            return self.abs_path
        err = ", ".join(self.error_messages)
        att = "\n".join(self.attachments)
        raise RuntimeError(f'Failure finding "{self.lib_searched_for}": {err}\n{att}')


@functools.cache
def find_nvidia_dynamic_library(libname: str) -> str:
    return _find_nvidia_dynamic_library(libname).raise_if_abs_path_is_None()
```
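The `CUDA_HOME`/`CUDA_PATH` precedence used by `_get_cuda_home` above can be illustrated with a standalone sketch. The helper below is hypothetical (it takes an environment mapping as a parameter instead of reading `os.environ`, purely to make the precedence rule easy to exercise), but it mirrors the same lookup order: `CUDA_HOME` wins, `CUDA_PATH` is the fallback.

```python
def get_cuda_home(environ):
    # Mirrors _get_cuda_home: CUDA_HOME takes priority; CUDA_PATH is only
    # consulted when CUDA_HOME is unset.
    cuda_home = environ.get("CUDA_HOME")
    if cuda_home is None:
        cuda_home = environ.get("CUDA_PATH")
    return cuda_home


assert get_cuda_home({}) is None
assert get_cuda_home({"CUDA_PATH": "/opt/cuda"}) == "/opt/cuda"
assert get_cuda_home({"CUDA_HOME": "/usr/local/cuda", "CUDA_PATH": "/opt/cuda"}) == "/usr/local/cuda"
```

Note that `retry_with_cuda_home_priority_last` only consults this lookup after the `nvidia` wheel directories have already been searched, which is why `CUDA_HOME` ends up with the lowest priority.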
@@ -0,0 +1,52 @@
```python
# Copyright 2024-2025 NVIDIA Corporation. All rights reserved.
# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE

import functools
import os
import site
import sys


def find_sub_dirs_no_cache(parent_dirs, sub_dirs):
    results = []
    for base in parent_dirs:
        stack = [(base, 0)]  # (current_path, index into sub_dirs)
        while stack:
            current_path, idx = stack.pop()
            if idx == len(sub_dirs):
                if os.path.isdir(current_path):
                    results.append(current_path)
                continue

            sub = sub_dirs[idx]
            if sub == "*":
                try:
                    entries = sorted(os.listdir(current_path))
                except OSError:
                    continue
                for entry in entries:
                    entry_path = os.path.join(current_path, entry)
                    if os.path.isdir(entry_path):
                        stack.append((entry_path, idx + 1))
            else:
                next_path = os.path.join(current_path, sub)
                if os.path.isdir(next_path):
                    stack.append((next_path, idx + 1))
    return results


@functools.cache
def find_sub_dirs_cached(parent_dirs, sub_dirs):
    return find_sub_dirs_no_cache(parent_dirs, sub_dirs)


def find_sub_dirs(parent_dirs, sub_dirs):
    return find_sub_dirs_cached(tuple(parent_dirs), tuple(sub_dirs))


def find_sub_dirs_sys_path(sub_dirs):
    return find_sub_dirs(sys.path, sub_dirs)


def find_sub_dirs_all_sitepackages(sub_dirs):
    return find_sub_dirs((site.getusersitepackages(),) + tuple(site.getsitepackages()), sub_dirs)
```
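The `"*"` entry in `sub_dirs` expands exactly one directory level, which is how the finder matches versioned layouts such as `nvidia/<anything>/lib`. The sketch below exercises a standalone copy of `find_sub_dirs_no_cache` against a throwaway tree (the `cu12`/`cu13` directory names are made up for the example):

```python
import os
import tempfile


def find_sub_dirs_no_cache(parent_dirs, sub_dirs):
    # Standalone copy of the iterative walk above: descend sub_dirs one
    # component at a time, where "*" fans out across all subdirectories.
    results = []
    for base in parent_dirs:
        stack = [(base, 0)]  # (current_path, index into sub_dirs)
        while stack:
            current_path, idx = stack.pop()
            if idx == len(sub_dirs):
                if os.path.isdir(current_path):
                    results.append(current_path)
                continue
            sub = sub_dirs[idx]
            if sub == "*":
                try:
                    entries = sorted(os.listdir(current_path))
                except OSError:
                    continue
                for entry in entries:
                    entry_path = os.path.join(current_path, entry)
                    if os.path.isdir(entry_path):
                        stack.append((entry_path, idx + 1))
            else:
                next_path = os.path.join(current_path, sub)
                if os.path.isdir(next_path):
                    stack.append((next_path, idx + 1))
    return results


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "nvidia", "cu12", "lib"))
    os.makedirs(os.path.join(root, "nvidia", "cu13", "lib"))
    os.makedirs(os.path.join(root, "nvidia", "cu13", "bin"))  # not matched
    hits = find_sub_dirs_no_cache([root], ["nvidia", "*", "lib"])

assert len(hits) == 2
assert all(hit.endswith("lib") for hit in hits)
```

The cached wrapper converts both arguments to tuples first because `functools.cache` requires hashable arguments.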
@@ -0,0 +1,36 @@
```python
# Copyright 2025 NVIDIA Corporation. All rights reserved.
# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE

from dataclasses import dataclass
from typing import Callable, Optional

from cuda.bindings._path_finder.supported_libs import DIRECT_DEPENDENCIES, IS_WINDOWS

if IS_WINDOWS:
    import pywintypes

    HandleType = pywintypes.HANDLE
else:
    HandleType = int


@dataclass
class LoadedDL:
    handle: HandleType
    abs_path: Optional[str]
    was_already_loaded_from_elsewhere: bool


def load_dependencies(libname: str, load_func: Callable[[str], LoadedDL]) -> None:
    """Load all dependencies for a given library.

    Args:
        libname: The name of the library whose dependencies should be loaded
        load_func: The function to use for loading libraries (e.g. load_nvidia_dynamic_library)

    Example:
        >>> load_dependencies("cudart", load_nvidia_dynamic_library)
        # This will load all dependencies of cudart using the provided loading function
    """
    for dep in DIRECT_DEPENDENCIES.get(libname, ()):
        load_func(dep)
```
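Because `load_dependencies` takes the loader as a parameter, its dispatch logic can be checked without loading any real driver libraries. The sketch below uses a hypothetical dependency table standing in for `supported_libs.DIRECT_DEPENDENCIES` (the entries shown are invented for the example, not the actual table) and a plain list append in place of a real `load_func`:

```python
# Hypothetical stand-in for supported_libs.DIRECT_DEPENDENCIES;
# the real table lives in cuda.bindings._path_finder.supported_libs.
DIRECT_DEPENDENCIES = {"nvrtc": ("nvJitLink",), "nvJitLink": ()}


def load_dependencies(libname, load_func):
    # Same body as above: look up direct dependencies and load each one.
    for dep in DIRECT_DEPENDENCIES.get(libname, ()):
        load_func(dep)


loaded = []
load_dependencies("nvrtc", loaded.append)   # loads the listed dependency
load_dependencies("unknown", loaded.append)  # unknown libname -> no-op

assert loaded == ["nvJitLink"]
```

Note that only direct dependencies are loaded here; transitive loading would rely on `load_func` itself calling back into `load_dependencies`.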
