@keivenchang keivenchang commented Sep 5, 2025

Overview:

This PR reverts Dockerfile.vllm and the build/run scripts so that building and running behave as they did before the August 28 commit 82bae24, preserving backward compatibility.

Details:

  • Split Dockerfile.vllm dev target into two distinct targets:
    • local-dev: For VS Code/Cursor Dev Container plugin use only
    • dev: For command-line development with run.sh script
  • Add comprehensive feature matrix comparing both development targets
  • Remove --uid/--gid options from build.sh (now handled by local-dev target)
  • Remove DEV_MODE logic from run.sh (simplified workspace mounting)
  • Consolidate ENV variables in both targets for better maintainability
  • Update build.sh to use local-dev target for UID/GID mapping
  • Maintain backward compatibility with existing workflows

Where should the reviewer start?

  • container/Dockerfile.vllm: Review the feature matrix and target separation
  • container/build.sh: Check removal of --uid/--gid options and UID/GID handling
  • container/run.sh: Verify simplified workspace mounting logic

Related Issues:

BUG-5501463

Summary by CodeRabbit

  • New Features
    • Namespace scoping via DYN_NAMESPACE across frontend/backends; per-namespace model discovery.
    • Runtime graceful shutdown with coordinated endpoint draining.
    • container/run.sh supports --entrypoint override.
    • New multimodal Qwen deployment example and multiple perf test configs.
  • Improvements
    • Workers invoke Python modules directly; decode workers use conditional graceful shutdown.
    • Planner input handling (NaN→0, ignore initial idle); better logs.
    • Hello World example streams with cancellation.
  • Bug Fixes
    • Profiler timestamps normalized to integer ms.
  • Documentation
    • Helm-based installation overhaul, updated links, chart/version bumps, etcd image guidance.

@github-actions github-actions bot added the fix label Sep 5, 2025
@keivenchang keivenchang changed the base branch from main to release/0.5.0 September 5, 2025 17:49
@keivenchang keivenchang changed the title from "fix: 0.5.0 cherry pick 82bae247b56258a08e26bb6dd305e69981be98b0 (revert Dockerfile.vllm, build.sh, and run.sh)" to "fix: 0.5.0 by reverting Dockerfile.vllm, build.sh, and run.sh (cherry pick 82bae247b56258a08e26bb6dd305e69981be98b0 from main)" Sep 5, 2025
coderabbitai bot commented Sep 5, 2025

Caution

Review failed

Failed to post review comments.

Walkthrough

This PR introduces namespace scoping across components, restructures graceful shutdown in the runtime, updates vLLM/sglang deployments to direct python invocations, adjusts planner input handling/logging, adds dev tooling changes (new Dockerfile stage, run.sh entrypoint override), overhauls Helm/cloud docs, and adds perf test manifests and example updates.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Devcontainer & Build/Run Tooling: `.devcontainer/*, container/Dockerfile.vllm, container/Dockerfile.trtllm, container/build.sh, container/run.sh` | Devcontainer image tag switched to dynamo:latest-vllm; adds a new dev stage to Dockerfile.vllm; installs NIXL wheels into both venv and system in TRT-LLM Dockerfile; trims blank lines in build.sh; run.sh gains --entrypoint option and enhanced GDS mounts. |
| Backend Deploy Manifests (vLLM/SGLang): `components/backends/vllm/deploy/disagg_planner.yaml, components/backends/sglang/deploy/disagg_planner.yaml` | Replace shell heredoc commands with python3 -m ... plus structured args; YAML numeric args quoted as strings; no flag changes beyond representation. |
| Backend Namespace & Behavior Tweaks: `components/backends/mocker/src/dynamo/mocker/main.py, components/backends/sglang/src/dynamo/sglang/{args.py,main.py,utils/clear_namespace.py}, components/backends/vllm/src/dynamo/vllm/{args.py,handlers.py,main.py}` | Introduce/use DYN_NAMESPACE for endpoint namespace; default endpoints become namespace-aware; sglang enables graceful shutdown; adds assertion for namespace in clear-namespace tool; vLLM args remove --endpoint, derive namespace/component/endpoint directly; handler loop robustness and prefill gating; decode graceful shutdown depends on migration_limit; adds lifecycle logs. |
| Frontend & Python Bindings: `components/frontend/src/dynamo/frontend/main.py, lib/bindings/python/rust/llm/entrypoint.rs` | Frontend adds --namespace (defaults from DYN_NAMESPACE) and passes it via EntrypointArgs; Rust bindings add optional namespace to EntrypointArgs and thread into LocalModelBuilder. |
| LLM Crate: Namespace & Watcher: `lib/llm/src/{namespace.rs,lib.rs,local_model.rs,discovery/watcher.rs}, lib/llm/src/entrypoint/input/{grpc.rs,http.rs,common.rs}, lib/llm/tests/{namespace.rs,http_namespace_integration.rs}` | Add namespace module with GLOBAL_NAMESPACE and is_global_namespace; extend LocalModel{Builder} with optional namespace; ModelWatcher::watch now filters by optional target namespace; gRPC/HTTP inputs pass namespace filter; adjust call sites; add comprehensive tests for namespace behavior. |
| Runtime: Graceful Shutdown & Cancellation: `lib/runtime/src/{lib.rs,runtime.rs,distributed.rs,component/endpoint.rs,utils.rs,utils/graceful_shutdown.rs,pipeline/network/ingress/push_endpoint.rs}` | Introduce GracefulShutdownTracker and endpoint-level shutdown token; stage shutdown into endpoint-stop then system-stop; push endpoints accept a cancellation_token via builder; monitor lease/runtime cancellation; add logs; expose tracker internally. |
| Planner & Benchmarks: `components/planner/src/dynamo/planner/utils/{load_predictor.py,planner_core.py}, benchmarks/sin_load_generator/sin_synth.py, benchmarks/profiler/README.md, tests/planner/README.md` | Normalize NaNs to 0, skip initial idle zeros; Prophet stores timestamped points; rename and log throughput computations; cast timestamps to int ms in synthetic load; update docs and examples (TTFT 0.2s, new dataset). |
| Perf Test Manifests & Examples: `tests/planner/perf_test_configs/*, examples/multimodal/deploy/agg_qwen.yaml, examples/runtime/hello_world/hello_world.py` | Add multiple DynamoGraphDeployment perf configs (agg/disagg/tp2/planner) and image-cache DaemonSet; add multimodal Qwen deployment; hello_world switches to Context and streams word-by-word with cancellation checks. |
| Helm/Cloud Docs & Charts: `deploy/cloud/helm/*, deploy/helm/chart/Chart.yaml, deploy/cloud/operator/*, docs/guides/dynamo_deploy/*, deploy/README.md, docs/index.rst, deploy/inference-gateway/{README.md,helm/dynamo-gaie/templates/dynamo-epp.yaml}` | Helm/docs overhaul: remove deploy.sh and platform values; bump chart versions; add installation guide and API doc generation path; update links/paths; etcd dep/version and image override; gate EPP config flag in template; expand inference-gateway docs. |
| Misc Kubernetes Manifest: `benchmarks/nixl/nixl-benchmark-deployment.yaml` | Rename imagePullSecret from nvcrimagepullsecret to nvcr-imagepullsecret. |
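
The Planner & Benchmarks entry above describes normalizing NaNs to 0 and skipping initial idle zeros. A minimal sketch of what that input handling might look like, assuming a simple in-memory buffer: the class name `LoadHistory` and its `add` method are hypothetical and are not taken from load_predictor.py.

```python
import math
from typing import List


class LoadHistory:
    """Hypothetical buffer of load samples fed to a predictor (illustration only)."""

    def __init__(self) -> None:
        self.points: List[float] = []

    def add(self, value: float) -> None:
        # Treat NaN (for example, a metric with no samples yet) as zero load.
        if math.isnan(value):
            value = 0.0
        # Drop zeros seen before any real traffic so the idle start-up
        # period does not skew the prediction. Later zeros are kept.
        if not self.points and value == 0.0:
            return
        self.points.append(value)


history = LoadHistory()
for sample in [float("nan"), 0.0, 3.0, float("nan"), 0.0, 5.0]:
    history.add(sample)
```

In this sketch the leading NaN and zero are discarded, while a NaN that arrives after traffic has started is recorded as 0.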

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant Frontend
  participant Runtime
  participant Endpoint
  participant Tracker as GracefulShutdownTracker

  User->>Frontend: SIGTERM / shutdown request
  Frontend->>Runtime: shutdown()
  Note over Runtime: Phase 1: Stop accepting new work
  Runtime->>Runtime: cancel endpoint_shutdown_token
  Runtime->>Tracker: wait_for_completion()
  par Active endpoints finishing
    loop each active endpoint
      Runtime->>Endpoint: cancellation_token triggered
      Endpoint-->>Tracker: unregister_endpoint()
    end
  and Wait for completion
    Tracker-->>Runtime: all endpoints complete
  end
  Note over Runtime: Phase 2: System shutdown
  Runtime->>Runtime: cancel main token (NATS/ETCD)
  Runtime-->>Frontend: shutdown complete
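
The two-phase flow in the diagram above can be approximated with asyncio. This is only an illustrative Python analogue, not the Rust runtime code: the tracker's internals, the `register_endpoint` method, and the use of `asyncio.Event` as a stand-in for cancellation tokens are all assumptions.

```python
import asyncio
from typing import List


class GracefulShutdownTracker:
    """Hypothetical Python analogue: counts endpoints that are still active."""

    def __init__(self) -> None:
        self._active = 0
        self._idle = asyncio.Event()
        self._idle.set()

    def register_endpoint(self) -> None:
        self._active += 1
        self._idle.clear()

    def unregister_endpoint(self) -> None:
        self._active -= 1
        if self._active == 0:
            self._idle.set()

    async def wait_for_completion(self) -> None:
        await self._idle.wait()


async def endpoint(name: str, stop: asyncio.Event,
                   tracker: GracefulShutdownTracker, log: List[str]) -> None:
    try:
        await stop.wait()          # serve until the endpoint token fires
        await asyncio.sleep(0.01)  # stand-in for draining in-flight requests
        log.append(f"{name} drained")
    finally:
        tracker.unregister_endpoint()


async def shutdown_demo() -> List[str]:
    log: List[str] = []
    tracker = GracefulShutdownTracker()
    endpoint_stop = asyncio.Event()  # stands in for endpoint_shutdown_token
    system_stop = asyncio.Event()    # stands in for the main token
    tasks = []
    for name in ("a", "b"):
        tracker.register_endpoint()
        tasks.append(asyncio.create_task(endpoint(name, endpoint_stop, tracker, log)))

    endpoint_stop.set()                  # Phase 1: stop accepting new work
    await tracker.wait_for_completion()  # wait until every endpoint has drained
    log.append("system stop")
    system_stop.set()                    # Phase 2: tear down shared connections
    await asyncio.gather(*tasks)
    return log
```

Running `asyncio.run(shutdown_demo())` records both endpoints draining before the system-stop entry, which is the ordering the staged shutdown is meant to guarantee.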
sequenceDiagram
  autonumber
  participant Env as Env (DYN_NAMESPACE)
  participant Frontend
  participant PyBind as EntrypointArgs (Py/Rust)
  participant LLM as LocalModelBuilder
  participant Watcher as ModelWatcher
  participant Store as etcd

  Env-->>Frontend: namespace (optional)
  Frontend->>PyBind: EntrypointArgs(namespace)
  PyBind->>LLM: .namespace(namespace)
  Frontend->>Store: subscribe to model events
  Store-->>Watcher: WatchEvent stream
  Frontend->>Watcher: watch(events, target_namespace)
  alt target_namespace specified
    Watcher->>Watcher: filter events by namespace
  else global/None
    Watcher->>Watcher: accept all namespaces
  end
  Watcher-->>Frontend: model updates (scoped)
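
The filtering branch in the diagram above can be sketched in Python. This is an illustration under assumptions rather than the crate's implementation: the `ModelEvent` type, the placeholder value of `GLOBAL_NAMESPACE`, and treating `None` as global are all hypothetical. Only the DYN_NAMESPACE variable name and the filter-or-accept-all behavior come from the walkthrough.

```python
import os
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Placeholder value; the real constant's value is not shown in this PR.
GLOBAL_NAMESPACE = "global"


@dataclass
class ModelEvent:
    """Hypothetical stand-in for a watch event describing one model."""
    namespace: str
    model: str


def is_global_namespace(namespace: Optional[str]) -> bool:
    # Assumption: an unset namespace behaves like the global one.
    return namespace is None or namespace == GLOBAL_NAMESPACE


def watch(events: Iterable[ModelEvent],
          target_namespace: Optional[str] = None) -> Iterator[ModelEvent]:
    """Yield only the events visible from target_namespace."""
    for event in events:
        if is_global_namespace(target_namespace) or event.namespace == target_namespace:
            yield event


def namespace_from_env() -> Optional[str]:
    # The frontend is described as defaulting --namespace from DYN_NAMESPACE.
    return os.environ.get("DYN_NAMESPACE")


events = [
    ModelEvent("team-a", "m1"),
    ModelEvent("team-b", "m2"),
    ModelEvent("team-b", "m3"),
]
scoped = [e.model for e in watch(events, "team-a")]
everything = [e.model for e in watch(events)]
```

With a target namespace, only that namespace's models pass through; with no target, every event is accepted.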

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit taps the namespace drum,
“dynamo” hums, but custom ones come.
Endpoints bow, then gracefully fade,
Trackers count the last parade.
Docker sings, scripts align—
Benchmarks tick in integer time.
Hop! New charts—deploy divine. 🐇✨



@saturley-hall saturley-hall merged commit c02099b into release/0.5.0 Sep 5, 2025
8 checks passed
@saturley-hall saturley-hall deleted the keivenchang/0.5.0-cherry-pick-82bae247b56258a08e26bb6dd305e69981be98b0 branch September 5, 2025 19:31