Inference engine startup included in GPU metric capture

**Describe the bug**
Current scripts in the benchmarks/single_node directory (e.g., https://github.com/SemiAnalysisAI/InferenceX/blob/adbaae52ddf2569ddf2e793f5ad0a56f2f2a4d13/benchmarks/single_node/qwen3.5_fp8_mi325x.sh) follow this pattern to establish GPU monitoring for every InferenceX benchmark run:

1) start_gpu_monitor
2) Start LLM inference engine as a background process (e.g. SGLang, vLLM)
3) wait_for_server_ready
4) Run benchmark for the model (key result to report for InferenceX benchmark)
5) stop_gpu_monitor

As-is, GPU energy consumption and other metrics can be skewed by irregularities in LLM inference engine startup (compilation, kernel autotuning, etc.) and by the 5-second sleeps used while waiting for the server to start (waiting on the /health route). Cross-referencing other logs can identify some timestamps that fall outside the actual benchmark window, but nothing in the CSV itself indicates where the benchmark begins or ends.
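
To illustrate the size of the effect, here is a minimal sketch that integrates power over time for a full capture versus the benchmark window alone. The trace is synthetic and the numbers are made up for illustration; the sample layout (seconds, watts) is an assumption and not the actual `gpu_metrics.csv` schema.

```python
# Illustrative only: synthetic (seconds, watts) samples, not measured data.

def energy_joules(samples):
    """Integrate power over time using the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2.0 * (t1 - t0)
    return total

# Hypothetical trace sampled every 5 s:
# 60 s of engine startup at 300 W, then 40 s of benchmark at 700 W.
startup = [(t, 300.0) for t in range(0, 60, 5)]
benchmark = [(t, 700.0) for t in range(60, 101, 5)]

full_capture = energy_joules(startup + benchmark)  # what the CSV covers today
benchmark_only = energy_joules(benchmark)          # what the result should reflect

print(f"full capture:   {full_capture:.0f} J")
print(f"benchmark only: {benchmark_only:.0f} J")
```

With these made-up numbers the full capture integrates to 47,000 J against 28,000 J for the benchmark window, so any per-run energy figure derived from the whole file would be dominated by how long startup happened to take.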

**Expected behavior**
Monitoring data should only correspond to the benchmark execution window. Alternatively, providing a way to identify which portion of the data corresponds to benchmark execution would also be acceptable.

**Potential solutions**
Adding a marker in `gpu_metrics.csv`, or delaying the `start_gpu_monitor` call until the benchmark is ready to run (moving step 1 above to just before step 4), would help disambiguate the results.
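
As a rough sketch of the marker idea, the snippet below tags each row with a phase based on benchmark start and end timestamps recorded elsewhere (for example, just before and after step 4). The column names `timestamp` and `power_w`, the ISO 8601 timestamp format, the `phase` labels, and the `tag_phases` helper are all assumptions for illustration; the real `gpu_metrics.csv` layout may differ.

```python
import csv
import io
from datetime import datetime

def tag_phases(rows, bench_start, bench_end, ts_field="timestamp"):
    """Return copies of rows with an added 'phase' field (hypothetical labels)."""
    tagged = []
    for row in rows:
        ts = datetime.fromisoformat(row[ts_field])
        if ts < bench_start:
            phase = "startup"
        elif ts <= bench_end:
            phase = "benchmark"
        else:
            phase = "teardown"
        tagged.append({**row, "phase": phase})
    return tagged

# Hypothetical CSV content; the column names are assumed, not taken from the repo.
sample_csv = """timestamp,power_w
2025-01-01T00:00:00,310.0
2025-01-01T00:00:05,305.0
2025-01-01T00:01:00,690.0
2025-01-01T00:01:05,702.0
2025-01-01T00:02:00,120.0
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
tagged = tag_phases(
    rows,
    bench_start=datetime(2025, 1, 1, 0, 1, 0),
    bench_end=datetime(2025, 1, 1, 0, 1, 30),
)

# Keep only the rows that fall inside the benchmark window.
benchmark_rows = [r for r in tagged if r["phase"] == "benchmark"]
```

Tagging rather than dropping rows keeps the startup data available for anyone who wants to track it separately, while still letting the benchmark window be selected directly from the CSV.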

**Additional context**
The startup costs of pinned versions of vLLM (likely similar for other engines) should be relatively consistent on a given GitHub Actions runner, but an inference engine's startup performance can change across successive versions. Being able to separate startup costs from the benchmark data is therefore important.

Inference engine startup included in GPU metric capture #1507
