Skip to content

Conversation

@saturley-hall
Copy link
Member

@saturley-hall saturley-hall commented Sep 19, 2025

Overview:

The git version in the container as built needs to be updated to address CVE-2025-48384

Summary by CodeRabbit

  • New Features

    • Multi-frontend (NGINX) mode for SGLang jobs with templated config and orchestration.
    • End-to-end benchmarking tooling: CLI/async benchmark runner, VLLM/SGLang bench scripts, health checks, warmup, and profiling hooks.
    • Disaggregated baseline workflows and helpers for TRT-LLM performance sweeps.
  • Improvements

    • Enhanced SLURM submission: retries, extra args, consolidated welcome output.
    • Streamlined readiness checks, overlapped launches, and init-location controls.
  • Documentation

    • Added guidance for switching ISL/OSL defaults (1k/1k example).
  • Chores

    • Version bump to 0.5.1+rc0-pre1.
    • Container updates (new SGLang-wideep images, Git installed).

Signed-off-by: Harrison King Saturley-Hall <[email protected]>
Signed-off-by: Harrison King Saturley-Hall <[email protected]>
@copy-pr-bot
Copy link

copy-pr-bot bot commented Sep 19, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the fix label Sep 19, 2025
@saturley-hall saturley-hall changed the base branch from main to release/0.5.1-rc0.pre1 September 19, 2025 15:00
@saturley-hall saturley-hall merged commit 486628e into release/0.5.1-rc0.pre1 Sep 19, 2025
5 of 6 checks passed
@saturley-hall saturley-hall deleted the harrison/update_git_on_pre1 branch September 19, 2025 15:13
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 19, 2025

Caution

Review failed

Failed to post review comments

Walkthrough

Adds multi-frontend SGLang orchestration (nginx + frontends), profiling hooks, and new benchmarking utilities (vLLM and SGLang). Expands TRT-LLM performance sweep scripts with disaggregated baseline flow and revised bench runner. Introduces new Dockerfiles/build steps, updates a submodule, adjusts a TRT-LLM handler, and bumps project versions.

Changes

Cohort / File(s) Summary
Version bumps
Cargo.toml, pyproject.toml, lib/bindings/python/Cargo.toml, lib/bindings/python/pyproject.toml
Bump project/crate versions to 0.5.1+rc0-pre1 and update dependency; add pytest ignore-glob for sglang slurm jobs.
Git submodule config
.gitmodules, components/backends/trtllm/performance_sweeps/scripts/bench
Add bench_serving submodule and update submodule pointer to a specific commit.
SGLang SLURM orchestration
components/backends/sglang/slurm_jobs/job_script_template.j2, .../scripts/nginx.conf.j2, .../scripts/gb200-fp8.sh, .../scripts/benchmark_utils.sh, .../scripts/sglang/bench.sh, .../scripts/sglang_bench_serving.sh, .../scripts/gap/bench.sh, .../scripts/worker_setup.py, .../submit_job_script.py
Add multi-frontend mode (nginx + multiple frontends), update worker setup with new types/flags, change leader IP resolution, add profiling hooks and warmup/health utilities, introduce bench scripts, and extend job submission CLI/retry logic.
vLLM benchmarking tools
components/backends/sglang/slurm_jobs/scripts/vllm/backend_request_func.py, .../vllm/benchmark_serving.py, .../vllm/benchmark_utils.py, .../vllm/bench.sh
Add async backend request adapters, comprehensive benchmark runner, PyTorch-format converter, and a shell wrapper for running benchmarks.
TRT-LLM performance sweeps (baseline/disagg + bench revamp)
components/backends/trtllm/performance_sweeps/README.md, .../baseline_disagg.slurm, .../benchmark_disagg.slurm, .../scripts/bench.sh, .../scripts/gen_baseline_server_config.py, .../scripts/gen_yaml.py, .../scripts/start_baseline_frontend.sh, .../scripts/start_baseline_worker.sh, .../scripts/start_disagg_worker.sh, .../scripts/start_frontend.sh, .../submit_agg.sh, .../submit_disagg.sh
Add disaggregated baseline flow (workers + frontend + bench), revise bench to Python-based runner and new health checks, tweak YAML generation and networking/env, expand submission matrices, and document ISL/OSL usage.
Container build system
container/Dockerfile.sglang-wideep, container/Dockerfile.sglang-wideep-gb200, container/Dockerfile.trtllm, container/Dockerfile.trtllm_prebuilt, container/build.sh
Replace sglang-wideep with multi-stage from-source build; add new GB200 variant; install git in TRT-LLM images; update default experimental TRT-LLM commit.
TRT-LLM handler tweak
components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py
Remove explicit final-stop chunk; adjust empty-output error condition; add warning when finished without finish_reason.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant Submitter as submit_job_script.py
    participant Slurm as SLURM
    participant NginxNode as nginx worker
    participant Master as frontend (master)
    participant Frontends as additional frontends
    participant Prefill as prefill workers
    participant Decode as decode workers
    participant etcd as etcd/NATS

    User->>Submitter: CLI args (--enable-multiple-frontends, profiler, etc.)
    Submitter->>Slurm: sbatch job (with templated vars)
    Slurm->>NginxNode: srun worker_setup.py --worker_type nginx
    Slurm->>Master: srun worker_setup.py --worker_type frontend
    Slurm->>Frontends: srun worker_setup.py --worker_type frontend (others)
    Slurm->>Prefill: srun worker_setup.py --worker_type prefill
    Slurm->>Decode: srun worker_setup.py --worker_type decode
    Note over etcd,Master: Coordination endpoints (etcd/NATS)
    Prefill-->>etcd: register/ready
    Decode-->>etcd: register/ready
    Frontends-->>Master: join/frontend ready
    NginxNode-->>User: proxy http://<nginx>:8000
Loading
sequenceDiagram
    autonumber
    participant Bench as bench.sh / sglang_bench_serving.sh
    participant Utils as benchmark_utils.sh
    participant Warm as warmup_model()
    participant Wait as wait_for_model()
    participant Py as benchmark_serving.py
    participant Backend as LLM Serving API

    Bench->>Utils: source
    Bench->>Wait: poll /health or /v1/models
    Wait-->>Bench: ready
    Bench->>Warm: warmup via sglang.bench_serving
    Warm-->>Bench: done
    loop for each concurrency
        Bench->>Py: run with args (rate, prompts, max-concurrency)
        Py->>Backend: async streaming requests
        Backend-->>Py: tokens/chunks
        Py-->>Bench: metrics JSON
    end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

Possibly related PRs

Poem

hops and props, I twirl my ears—
new fronts arise, the nginx cheers.
I warm the model, tap-tap-tap,
then race through tokens, lap by lap.
SLURM drums roll, containers hum—
benchmarks bloom—thump-thump—yum! 🥕

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description only contains a one-line overview stating the Git update to address CVE-2025-48384 and does not follow the repository's required template: it is missing the "Details", "Where should the reviewer start?", and "Related Issues" sections and provides no file-level summary, testing steps, or rationale for other large changes present in the diff (e.g., version bumps, new scripts, Dockerfile edits). This lack of detail makes it difficult for reviewers to assess scope, verify the CVE remediation, or know which files to inspect first. Please update the PR body to follow the repository template: add a "Details" section listing the specific files and exact changes (e.g., which Dockerfiles or package versions were updated), a "Where should the reviewer start?" section pointing to high-impact files (container/Dockerfile.trtllm, container/Dockerfile.trtllm_prebuilt, pyproject.toml, relevant Docker build scripts, and any changed security-related lines), and a "Related Issues" line linking the CVE tracking issue or internal ticket; also include verification steps and any runtime or compatibility risks introduced by the version bumps.
Docstring Coverage ⚠️ Warning Docstring coverage is 33.93% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title "fix: update git version on pre1" is concise and clearly references the container Git update intended to address CVE-2025-48384, which is present in the diff (Dockerfile changes installing/updating git). However the changeset also contains broader work (workspace version bumps to 0.5.1+rc0-pre1, many new/modified scripts and templates), so the title only captures part of the PR's scope rather than the full surface of changes. Overall the title is acceptable but could be slightly more descriptive to help reviewers scanning history.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants