[experimental chore/agentx-v0.2-aiperf-testing branch] agentic launcher: TOTAL_CPU_DRAM_GB=2000 hardcoded, OOMs on 1.5 TB MI355X nodes #1358

## Summary

`benchmarks/single_node/agentic/minimaxm2.5_fp8_mi355x.sh` hardcodes `TOTAL_CPU_DRAM_GB=2000` inside the `cpu` offload branch, overriding any caller-provided value. On MI355X nodes with less than ~2 TB of host RAM (e.g. AAC1 cluster nodes have **1.5 TB**), this triggers an OOM-kill of one or more vLLM TP workers during `SimpleCPUOffloadConnector` initialization.

## Repro

On a 1.5 TB MI355X node (e.g. AAC1 `smci355-ccs-aus-g12-*`):

```bash
podman run ... \
  -e MODEL=MiniMaxAI/MiniMax-M2.5 -e TP=4 -e CONC=16 \
  -e OFFLOADING=cpu -e TOTAL_CPU_DRAM_GB=1200 \
  ... vllm/vllm-openai-rocm:nightly-51f22dcfd0... \
  /workspace/benchmarks/single_node/agentic/minimaxm2.5_fp8_mi355x.sh
```

Even though the container env sets `TOTAL_CPU_DRAM_GB=1200`, the launcher overwrites it with 2000. Each of the 4 TP workers then tries to allocate `2000 / 4 = 500 GB` of pinned host memory, 2000 GB in total, which exceeds the 1500 GB available. Worker_TP2 dies during init and EngineCore reports `Worker proc VllmWorker-2 died unexpectedly`.
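
The arithmetic above could be turned into a fail-fast guard instead of an OOM-kill. This is a hypothetical sketch, not part of the launcher: the helper names `per_worker_gb`, `host_ram_gb`, `offload_fits` and the 80% headroom default are assumptions, and `host_ram_gb` assumes a Linux host where `/proc/meminfo` reports `MemTotal` in kB.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (assumption, not in the launcher today):
# refuse to start vLLM when the requested CPU offload pool cannot fit in host RAM.

# Per-worker share of the offload pool in GB (integer division), e.g. 2000 / 4.
per_worker_gb() {
  local total_gb=$1 tp=$2
  echo $(( total_gb / tp ))
}

# Host RAM in GiB, truncated; MemTotal in /proc/meminfo is reported in kB.
host_ram_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Exit status 0 if the whole pool fits under headroom_pct percent of host RAM,
# 1 otherwise. The 80% default is an assumed safety margin, not a vLLM value.
offload_fits() {
  local total_gb=$1 host_gb=$2 headroom_pct=${3:-80}
  (( total_gb * 100 <= host_gb * headroom_pct ))
}
```

Inside the `cpu` branch, something like `offload_fits "$TOTAL_CPU_DRAM_GB" "$(host_ram_gb)" || { echo "TOTAL_CPU_DRAM_GB too large for this host" >&2; exit 1; }` would report the misconfiguration up front; with the hardcoded 2000 on a 1.5 TB node it would refuse to start.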

Server log:
```
(Worker_TP3) INFO ... [worker.py:144] SimpleCPUOffloadWorker: 124 unique GPU KV tensors, allocating 528516 CPU blocks (500.00 GB)
(Worker_TP0) ...
(Worker_TP1) ...
(Worker_TP2) ...
(EngineCore) INFO ... [shm_broadcast.py:681] No available shared memory broadcast block found in 60 seconds.
(EngineCore) INFO ... [shm_broadcast.py:681] No available shared memory broadcast block found in 60 seconds.
(EngineCore) ERROR ... Worker proc VllmWorker-2 died unexpectedly, shutting down executor.
```
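
To confirm from a saved server log how much pinned host memory the workers requested in total, the per-worker `SimpleCPUOffloadWorker` lines can be summed. A minimal sketch; `sum_offload_gb` is a hypothetical helper, and it assumes every TP worker logs the same `allocating N CPU blocks (X GB)` format shown for Worker_TP3 above.

```shell
#!/usr/bin/env bash
# Sum the "(X GB)" figures from SimpleCPUOffloadWorker allocation lines in a
# vLLM server log and print the total with two decimals (0.00 if none match).
sum_offload_gb() {
  local log=$1
  grep -oE 'allocating [0-9]+ CPU blocks \([0-9.]+ GB\)' "$log" \
    | sed -E 's/.*\(([0-9.]+) GB\)/\1/' \
    | awk '{ s += $1 } END { printf "%.2f\n", s }'
}
```

For the failing run above, four workers at 500.00 GB each would sum to 2000.00, which can then be compared against the node's RAM.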

## Suggested fix

Make the value respect a caller-provided env var, falling back to 2000 only when unset:

```diff
-        TOTAL_CPU_DRAM_GB=2000
+        # Respect env override; AAC1 MI355X nodes have only 1.5 TB.
+        TOTAL_CPU_DRAM_GB=${TOTAL_CPU_DRAM_GB:-2000}
```
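
The fix relies on bash's `${VAR:-default}` expansion. The sketch below isolates that one line in a hypothetical function (`resolved_dram_gb` is illustrative only) to show which value the launcher would end up with for an unset, set, or empty `TOTAL_CPU_DRAM_GB`.

```shell
#!/usr/bin/env bash
# Apply the patched assignment to whatever TOTAL_CPU_DRAM_GB is in the
# environment and print the result. The body runs in a subshell "( ... )",
# so the caller's variable is never modified.
resolved_dram_gb() (
  TOTAL_CPU_DRAM_GB=${TOTAL_CPU_DRAM_GB:-2000}
  echo "$TOTAL_CPU_DRAM_GB"
)
```

With `-e TOTAL_CPU_DRAM_GB=1200` passed to `podman run`, the variable is in the container environment and this yields 1200; unset yields 2000. Note that `:-` also falls back to 2000 when the variable is set but empty, whereas `${TOTAL_CPU_DRAM_GB-2000}` would keep the empty string.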

After this patch, `TOTAL_CPU_DRAM_GB=1200` runs cleanly to completion (verified with full 30-min CONC=16 sweep, 1116 reqs / 1.88% err).

The same hardcode pattern likely exists in `minimaxm2.5_fp8_mi300x.sh` and `minimaxm2.5_fp8_mi325x.sh` and should get the same treatment.
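
One way to check the sibling launchers is to search for lines that assign a bare number to `TOTAL_CPU_DRAM_GB` (optionally after `export`), which skips lines already using the `${...:-...}` form. A sketch with a hypothetical helper name, `find_hardcoded_dram`:

```shell
#!/usr/bin/env bash
# Print file:line:text for every line that assigns a literal integer to
# TOTAL_CPU_DRAM_GB, i.e. not via a ${TOTAL_CPU_DRAM_GB:-...} expansion.
find_hardcoded_dram() {
  grep -nE '^[[:space:]]*(export[[:space:]]+)?TOTAL_CPU_DRAM_GB=[0-9]+([[:space:]#]|$)' "$@"
}
```

For example, `find_hardcoded_dram benchmarks/single_node/agentic/*.sh` from the repo root would list any remaining hardcoded values across the agentic launchers.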

## Environment
- Branch: `chore/agentx-v0.2-aiperf-testing` (tip `c8dfb585`)
- Image: `vllm/vllm-openai-rocm:nightly-51f22dcfd068fe8f1e3192da2a1e825b930223cf`
- Hardware: AAC1 MI355X partition `256C8G1H_MI355X_Ubuntu22`, 1.5 TB RAM/node