fix: update git version on pre1 #3140

saturley-hall · 2025-09-19T15:00:10Z

Overview:

The git version in the container as built needs to be updated to address CVE-2025-48384

Summary by CodeRabbit

New Features
- Multi-frontend (NGINX) mode for SGLang jobs with templated config and orchestration.
- End-to-end benchmarking tooling: CLI/async benchmark runner, VLLM/SGLang bench scripts, health checks, warmup, and profiling hooks.
- Disaggregated baseline workflows and helpers for TRT-LLM performance sweeps.
Improvements
- Enhanced SLURM submission: retries, extra args, consolidated welcome output.
- Streamlined readiness checks, overlapped launches, and init-location controls.
Documentation
- Added guidance for switching ISL/OSL defaults (1k/1k example).
Chores
- Version bump to 0.5.1+rc0-pre1.
- Container updates (new SGLang-wideep images, Git installed).

Signed-off-by: Harrison King Saturley-Hall <[email protected]>

copy-pr-bot · 2025-09-19T15:00:14Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-09-19T15:19:59Z

Caution

Review failed

Failed to post review comments

Walkthrough

Adds multi-frontend SGLang orchestration (nginx + frontends), profiling hooks, and new benchmarking utilities (vLLM and SGLang). Expands TRT-LLM performance sweep scripts with disaggregated baseline flow and revised bench runner. Introduces new Dockerfiles/build steps, updates a submodule, adjusts a TRT-LLM handler, and bumps project versions.

Changes

Cohort / File(s)	Summary
Version bumps `Cargo.toml`, `pyproject.toml`, `lib/bindings/python/Cargo.toml`, `lib/bindings/python/pyproject.toml`	Bump project/crate versions to 0.5.1+rc0-pre1 and update dependency; add pytest ignore-glob for sglang slurm jobs.
Git submodule config `.gitmodules`, `components/backends/trtllm/performance_sweeps/scripts/bench`	Add bench_serving submodule and update submodule pointer to a specific commit.
SGLang SLURM orchestration `components/backends/sglang/slurm_jobs/job_script_template.j2`, `.../scripts/nginx.conf.j2`, `.../scripts/gb200-fp8.sh`, `.../scripts/benchmark_utils.sh`, `.../scripts/sglang/bench.sh`, `.../scripts/sglang_bench_serving.sh`, `.../scripts/gap/bench.sh`, `.../scripts/worker_setup.py`, `.../submit_job_script.py`	Add multi-frontend mode (nginx + multiple frontends), update worker setup with new types/flags, change leader IP resolution, add profiling hooks and warmup/health utilities, introduce bench scripts, and extend job submission CLI/retry logic.
vLLM benchmarking tools `components/backends/sglang/slurm_jobs/scripts/vllm/backend_request_func.py`, `.../vllm/benchmark_serving.py`, `.../vllm/benchmark_utils.py`, `.../vllm/bench.sh`	Add async backend request adapters, comprehensive benchmark runner, PyTorch-format converter, and a shell wrapper for running benchmarks.
TRT-LLM performance sweeps (baseline/disagg + bench revamp) `components/backends/trtllm/performance_sweeps/README.md`, `.../baseline_disagg.slurm`, `.../benchmark_disagg.slurm`, `.../scripts/bench.sh`, `.../scripts/gen_baseline_server_config.py`, `.../scripts/gen_yaml.py`, `.../scripts/start_baseline_frontend.sh`, `.../scripts/start_baseline_worker.sh`, `.../scripts/start_disagg_worker.sh`, `.../scripts/start_frontend.sh`, `.../submit_agg.sh`, `.../submit_disagg.sh`	Add disaggregated baseline flow (workers + frontend + bench), revise bench to Python-based runner and new health checks, tweak YAML generation and networking/env, expand submission matrices, and document ISL/OSL usage.
Container build system `container/Dockerfile.sglang-wideep`, `container/Dockerfile.sglang-wideep-gb200`, `container/Dockerfile.trtllm`, `container/Dockerfile.trtllm_prebuilt`, `container/build.sh`	Replace sglang-wideep with multi-stage from-source build; add new GB200 variant; install git in TRT-LLM images; update default experimental TRT-LLM commit.
TRT-LLM handler tweak `components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py`	Remove explicit final-stop chunk; adjust empty-output error condition; add warning when finished without finish_reason.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant Submitter as submit_job_script.py
    participant Slurm as SLURM
    participant NginxNode as nginx worker
    participant Master as frontend (master)
    participant Frontends as additional frontends
    participant Prefill as prefill workers
    participant Decode as decode workers
    participant etcd as etcd/NATS

    User->>Submitter: CLI args (--enable-multiple-frontends, profiler, etc.)
    Submitter->>Slurm: sbatch job (with templated vars)
    Slurm->>NginxNode: srun worker_setup.py --worker_type nginx
    Slurm->>Master: srun worker_setup.py --worker_type frontend
    Slurm->>Frontends: srun worker_setup.py --worker_type frontend (others)
    Slurm->>Prefill: srun worker_setup.py --worker_type prefill
    Slurm->>Decode: srun worker_setup.py --worker_type decode
    Note over etcd,Master: Coordination endpoints (etcd/NATS)
    Prefill-->>etcd: register/ready
    Decode-->>etcd: register/ready
    Frontends-->>Master: join/frontend ready
    NginxNode-->>User: proxy http://<nginx>:8000

sequenceDiagram
    autonumber
    participant Bench as bench.sh / sglang_bench_serving.sh
    participant Utils as benchmark_utils.sh
    participant Warm as warmup_model()
    participant Wait as wait_for_model()
    participant Py as benchmark_serving.py
    participant Backend as LLM Serving API

    Bench->>Utils: source
    Bench->>Wait: poll /health or /v1/models
    Wait-->>Bench: ready
    Bench->>Warm: warmup via sglang.bench_serving
    Warm-->>Bench: done
    loop for each concurrency
        Bench->>Py: run with args (rate, prompts, max-concurrency)
        Py->>Backend: async streaming requests
        Backend-->>Py: tokens/chunks
        Py-->>Bench: metrics JSON
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

Possibly related PRs

feat: Add performance sweeps for DeepSeek R1 on GB200 #2387 — Overlaps TRT-LLM performance_sweeps scripts and bench flows under the same directories.
feat: sglang multiple frontend slurm deployment scripts #2623 — Similar SGLang SLURM job updates including leader IP resolution via network interface.
ci: add support for vllm sanity testing on Github #2526 — Related container build infrastructure changes across Dockerfiles and build scripts.

Poem

hops and props, I twirl my ears—
new fronts arise, the nginx cheers.
I warm the model, tap-tap-tap,
then race through tokens, lap by lap.
SLURM drums roll, containers hum—
benchmarks bloom—thump-thump—yum! 🥕

Pre-merge checks

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The PR description only contains a one-line overview stating the Git update to address CVE-2025-48384 and does not follow the repository's required template: it is missing the "Details", "Where should the reviewer start?", and "Related Issues" sections and provides no file-level summary, testing steps, or rationale for other large changes present in the diff (e.g., version bumps, new scripts, Dockerfile edits). This lack of detail makes it difficult for reviewers to assess scope, verify the CVE remediation, or know which files to inspect first.	Please update the PR body to follow the repository template: add a "Details" section listing the specific files and exact changes (e.g., which Dockerfiles or package versions were updated), a "Where should the reviewer start?" section pointing to high-impact files (container/Dockerfile.trtllm, container/Dockerfile.trtllm_prebuilt, pyproject.toml, relevant Docker build scripts, and any changed security-related lines), and a "Related Issues" line linking the CVE tracking issue or internal ticket; also include verification steps and any runtime or compatibility risks introduced by the version bumps.
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.93% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title "fix: update git version on pre1" is concise and clearly references the container Git update intended to address CVE-2025-48384, which is present in the diff (Dockerfile changes installing/updating git). However the changeset also contains broader work (workspace version bumps to 0.5.1+rc0-pre1, many new/modified scripts and templates), so the title only captures part of the PR's scope rather than the full surface of changes. Overall the title is acceptable but could be slightly more descriptive to help reviewers scanning history.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

saturley-hall added 2 commits September 19, 2025 10:53

fix: update git in trtllm container

da55ac6

Signed-off-by: Harrison King Saturley-Hall <[email protected]>

fix: update git in prebuilt

4289eb3

Signed-off-by: Harrison King Saturley-Hall <[email protected]>

saturley-hall requested review from a team, alec-flowers, ishandhanani, nnshah1, ptarasiewiczNV, richardhuo-nv, rmccorm4 and tanmayv25 as code owners September 19, 2025 15:00

pull-request-size bot added the size/XXL label Sep 19, 2025

github-actions bot added the fix label Sep 19, 2025

saturley-hall changed the base branch from main to release/0.5.1-rc0.pre1 September 19, 2025 15:00

pull-request-size bot added size/S and removed size/XXL labels Sep 19, 2025

rmccorm4 approved these changes Sep 19, 2025

View reviewed changes

saturley-hall merged commit 486628e into release/0.5.1-rc0.pre1 Sep 19, 2025
5 of 6 checks passed

saturley-hall deleted the harrison/update_git_on_pre1 branch September 19, 2025 15:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: update git version on pre1 #3140

fix: update git version on pre1 #3140

Uh oh!

saturley-hall commented Sep 19, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

copy-pr-bot bot commented Sep 19, 2025

Uh oh!

Uh oh!

coderabbitai bot commented Sep 19, 2025 •

edited

Loading

Review failed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fix: update git version on pre1 #3140

fix: update git version on pre1 #3140

Uh oh!

Conversation

saturley-hall commented Sep 19, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Sep 19, 2025

Uh oh!

Uh oh!

coderabbitai bot commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Pre-merge checks

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

saturley-hall commented Sep 19, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 19, 2025 •

edited

Loading