[wip] Ray 2.56 nightly + Dynamo 1.3.0 + vLLM 0.22 (cu129)#2064
praateekmahajan wants to merge 13 commits into
/ok to test aaa42ad /claude review @greptileai review |
Greptile Summary
This draft PR threads the Ray 3.0 nightly + ai-dynamo nightly + vLLM 0.22.0+cu129 (CUDA 12.9) stack through both
Confidence Score: 3/5
Not safe to merge as-is: the ray nightly wheel entry in the core dependencies is restricted to Python 3.13, silently dropping ray for the Python 3.11 and 3.12 users the project still supports.

The core ray dependency is now a direct-URL nightly wheel valid only for Python 3.13 + x86_64 + Linux. Since the project supports Python >=3.11, any 3.11 or 3.12 install gets no ray from the base dependency list, so `import ray` in core modules would fail at runtime with no warning at install time.

The ray entry in the `dependencies` block of pyproject.toml needs a fallback for Python 3.11/3.12. The `_vllm_cu129_index_url` function in vllm.py has two minor defensive-coding gaps worth addressing before merge.
Reviews (1): Last reviewed commit: "Support ray/dynamo nightly + vLLM 0.22 (..."
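The gap described in the review can be reproduced in miniature. The sketch below is a toy model of marker-gated source selection, not uv's resolver: `select_ray_source`, the tag sets, and the returned labels are hypothetical names chosen for illustration only.

```python
from typing import FrozenSet, Optional


def select_ray_source(
    python_version: str,
    machine: str,
    system: str,
    nightly_tags: FrozenSet[str],
    pypi_fallback: bool,
) -> Optional[str]:
    """Toy model: report which source would supply ray for an environment.

    Returns a label for the nightly wheel when the environment matches one of
    the wheel tags, "pypi" when a fallback exists, and None when nothing
    applies (the silent-drop case flagged in the review).
    """
    major, minor = python_version.split(".")[:2]
    tag = f"cp{major}{minor}"
    # The nightly wheel is assumed to exist only for Linux x86_64.
    if system == "linux" and machine == "x86_64" and tag in nightly_tags:
        return f"nightly:{tag}"
    return "pypi" if pypi_fallback else None
```

With `nightly_tags=frozenset({"cp313"})` and no fallback, a Python 3.11 environment resolves to `None`, which is the failure mode the review flags. Adding more tags or a PyPI fallback closes the gap in this model.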
Enable the Ray 3.0 nightly + ai-dynamo nightly + vLLM 0.22 inference stack on the CUDA-12.9 image while keeping the full Curator dependency set (`uv sync --all-extras --all-groups`) resolvable and buildable.

pyproject.toml:
- ray: 3.0.0.dev0 nightly wheel routed per python tag (x86_64) via [tool.uv.sources] so `dependencies` stays a plain `ray[default,data]`; aarch64/other resolve ray from PyPI via the >= floor (keeps ray[llm]'s default cu130 vllm off aarch64). Rolling /latest/ wheels are re-pinned by re-locking, never frozen.
- ai-dynamo and ai-dynamo-runtime >=1.3.0.dev0, both first-party so prerelease="if-necessary-or-explicit" enables the newest nightly without blanket prereleases (runtime is a transitive with stable releases, so it needs an explicit marker or uv backtracks to an older dynamo dev).
- vLLM 0.22.0+cu129 via a dedicated cu129 wheel index + tool.uv.sources (default vLLM is now cu130; keep torch/vllm on CUDA 12.9).
- drop nixl-cu13: ray[llm]/nixl hard-pin the CUDA-13 NIXL backend, whose eager `import nixl_ep` dlopens the absent libcudart.so.13 on cu12.9; keep the nixl meta + nixl-cu12 backend.
- opencv-python -> opencv-python-headless (no libGL/GPL GUI/FFmpeg bundling; matches vllm/mistral_common/albumentations).
- bump torch/torchvision/torchaudio/torchcodec to the 2.11 cu129 line.

dynamo actor venv runtime_env (vllm.py):
Ray builds it via a bare `uv pip install ai-dynamo[vllm]` that ignores pyproject, so force cu129 the way uv/vLLM document: --torch-backend cu129, unsafe-best-match (needed for nixl's split index resolution), and a per-version cu129 vllm index derived from ai-dynamo's own pin (`_vllm_cu129_index_url`); the --override file pins ray== and drops nixl-cu13.

Signed-off-by: Praateek <praateekm@gmail.com>
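The "per-version cu129 vllm index derived from ai-dynamo's own pin" step can be sketched roughly as follows. This is not the PR's `_vllm_cu129_index_url`: the function name, the regex, and especially the placeholder base URL are assumptions made for illustration, and the real index layout should be taken from the vLLM documentation.

```python
import re
from typing import Iterable, Optional

# Placeholder base URL (assumption): not a verified wheel index location.
_PLACEHOLDER_BASE = "https://wheels.example.invalid"

# Matches an exact pin such as 'vllm==0.22.0' or 'vllm[extra]==0.22.0; ...'.
_VLLM_PIN = re.compile(r"^\s*vllm(?:\[[^\]]*\])?\s*==\s*([0-9][0-9A-Za-z.]*)")


def vllm_cu129_index_url(
    requirements: Iterable[str],
    base_url: str = _PLACEHOLDER_BASE,
) -> Optional[str]:
    """Return a per-version cu129 index URL for the first exact vllm pin.

    Returns None when no exact pin is present, so the caller can decide
    whether to fall back or fail loudly instead of building a bad URL.
    """
    for requirement in requirements:
        match = _VLLM_PIN.match(requirement)
        if match:
            return f"{base_url.rstrip('/')}/{match.group(1)}/cu129"
    return None
```

In practice the requirement strings could come from `importlib.metadata.requires("ai-dynamo")`, which returns `None` when the distribution declares no requirements; handling that case and the no-pin case explicitly is the kind of defensive check the review asks for.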
/ok to test 4f0cdf4 |
…ror)

The prior docker/Dockerfile stub only patched /opt/venv, but CI unit tests run from a fresh .venv (uv sync), so ray's dashboard frontend was still missing there: the dashboard process died with FrontendNotFoundError and every ray.util.state call (Xenna drives pipelines through it) failed with "Could not read 'dashboard' from GCS", erroring all xenna backends/audio/text tests.

Move the stub into nemo_curator/__init__.py so it runs once on import, relative to the installed ray (works in any venv), and gate it to dev/nightly ray so published releases (which ship client/build) are untouched. Drop the redundant Dockerfile stub.

Signed-off-by: Praateek <praateekm@gmail.com>
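A minimal sketch of the import-time stub described in this commit, under stated assumptions: `ensure_dashboard_stub` is a hypothetical helper, the `dashboard/client/build` layout is inferred from the commit text, and whether a lone `index.html` satisfies ray's frontend check is not confirmed here.

```python
import tempfile
from pathlib import Path


def ensure_dashboard_stub(ray_root: Path, ray_version: str) -> bool:
    """Create a placeholder frontend build for dev/nightly ray.

    Returns True only when a stub was written. Published releases (no "dev"
    in the version) and installs that already have a build are left alone.
    """
    if "dev" not in ray_version:
        return False
    build_dir = ray_root / "dashboard" / "client" / "build"
    index = build_dir / "index.html"
    if index.exists():
        return False
    build_dir.mkdir(parents=True, exist_ok=True)
    index.write_text(
        "<!doctype html><title>ray dashboard stub</title>\n", encoding="utf-8"
    )
    return True


if __name__ == "__main__":
    # Demonstrate against a throwaway directory instead of a real ray install.
    with tempfile.TemporaryDirectory() as tmp:
        print(ensure_dashboard_stub(Path(tmp), "3.0.0.dev0"))
```

Resolving the root from the imported package (for example `Path(ray.__file__).parent`) is what makes the stub independent of which venv is active, which was the failure with the Dockerfile-only approach.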
|
/ok to test 214d015 |
What
Enables the Ray 3.0 nightly + ai-dynamo nightly + vLLM 0.22 (CUDA 12.9) inference stack while keeping the full Curator dependency set resolvable and buildable. Validated with `uv sync --all-extras --all-groups`: the full container builds and every extra imports together (torch/vllm/cv2/cudf/cuml/nixl/dynamo/nemo_curator).

Changes
pyproject.toml
- `ray` tracks the `3.0.0.dev0` nightly wheel (rolling `/latest/` URL).
- `ai-dynamo` and `ai-dynamo-runtime` `>=1.3.0.dev0`, both first-party so `prerelease="if-necessary-or-explicit"` enables the newest nightly without blanket prereleases. (`ai-dynamo-runtime` is a transitive dependency with stable releases, so without an explicit marker uv backtracks to an older dynamo dev.)
- `prerelease = "if-necessary-or-explicit"` (was a blanket `allow`).
- vLLM `0.22.0+cu129` via a dedicated cu129 wheel index + `tool.uv.sources`. The default vLLM wheel is now cu130 (`VLLM_MAIN_CUDA_VERSION=13.0`), so torch/vllm are kept on CUDA 12.9.
- Drop `nixl-cu13`: `ray[llm]`/`nixl` hard-pin the CUDA-13 NIXL backend, whose eager `import nixl_ep` dlopens the absent `libcudart.so.13` on a cu12.9 image; the `nixl` meta + `nixl-cu12` backend remain.
- `opencv-python` → `opencv-python-headless` (no `libGL`/GPL GUI/FFmpeg bundling; matches what vllm/mistral_common/albumentations already request).

nemo_curator/core/serve/dynamo/vllm.py
- The Dynamo actor venv is built by Ray's `uv` runtime_env via a bare `uv pip install ai-dynamo[vllm]` that ignores `pyproject`. Force cu129 the way uv/vLLM document: `--torch-backend cu129`, `unsafe-best-match` (required for nixl's split-index resolution), and a per-version cu129 vLLM index derived from ai-dynamo's own pin (so the actor honors dynamo's vLLM version as cu129); the `--override` file pins `ray==<head>` and drops `nixl-cu13`.

Notes
- Run `uv lock --refresh-package … --upgrade-package …` for ray/ai-dynamo/ai-dynamo-runtime before building.
- Build with `docker build --ulimit nofile=1048576:1048576` (rapids file count, unrelated to these deps).

🤖 Generated with Claude Code
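The install options and override file described for the Dynamo actor venv could be assembled along these lines. This is a hedged sketch, not the code in vllm.py: `override_lines` and `uv_install_options` are hypothetical helpers, dropping a package through a never-true marker is one commonly used uv override idiom that this PR may or may not use, and passing the vLLM index via `--extra-index-url` is likewise an assumption.

```python
from typing import List, Optional, Sequence


def override_lines(
    ray_version: str,
    dropped: Sequence[str] = ("nixl-cu13",),
) -> List[str]:
    """Build override-file lines: pin ray exactly, neutralize dropped packages."""
    lines = [f"ray=={ray_version}"]
    # Assumption: a marker that never evaluates true removes the requirement.
    lines += [f"{name} ; sys_platform == 'never'" for name in dropped]
    return lines


def uv_install_options(
    override_path: str,
    vllm_index_url: Optional[str] = None,
) -> List[str]:
    """Build extra `uv pip install` options that keep the venv on cu129."""
    options = ["--torch-backend", "cu129", "--index-strategy", "unsafe-best-match"]
    if vllm_index_url:
        # Assumption: the per-version cu129 index is added as an extra index.
        options += ["--extra-index-url", vllm_index_url]
    options += ["--override", override_path]
    return options
```

Pinning `ray==` to the version running on the head node keeps the actor venv from resolving a different ray than the cluster it joins, which is the reason the description gives for the override.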