Inference engine startup included in GPU metric capture #1507

@thomas-primalabs

Describe the bug
Current scripts in the benchmarks/single_node directory (e.g., https://github.com/SemiAnalysisAI/InferenceX/blob/adbaae52ddf2569ddf2e793f5ad0a56f2f2a4d13/benchmarks/single_node/qwen3.5_fp8_mi325x.sh) follow this pattern to establish GPU monitoring for every InferenceX benchmark run:

  1. start_gpu_monitor
  2. Start LLM inference engine as a background process (e.g. SGLang, vLLM)
  3. wait_for_server_ready
  4. Run benchmark for the model (key result to report for InferenceX benchmark)
  5. stop_gpu_monitor

As written, the captured GPU energy consumption and other metrics include the inference engine's startup phase. They can therefore be skewed by startup irregularities (compilation, kernel autotuning, etc.) and by the 5-second sleeps used while polling the /health route for server readiness. Cross-referencing other logs can identify some timestamps that fall outside the actual benchmark window, but nothing in the CSV itself marks where the benchmark begins or ends.

Expected behavior
Monitoring data should only correspond to the benchmark execution window. Alternatively, providing a way to identify which portion of the data corresponds to benchmark execution would also be acceptable.
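As a rough illustration of the second option, the benchmark window could be recovered after the fact if the script recorded start and end timestamps around the benchmark step (for example with `date +%s`). The sketch below assumes, without having checked the real file, that the first column of gpu_metrics.csv is an epoch-seconds timestamp. The `filter_window` helper, the `power_w` column, and the sample rows are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# filter_window CSV START END
# Prints the header row plus every row whose first column lies in [START, END].
# ASSUMPTION: the first CSV column is an epoch-seconds timestamp.
filter_window() {
  awk -F, -v s="$2" -v e="$3" \
    'NR == 1 || ($1 + 0 >= s + 0 && $1 + 0 <= e + 0)' "$1"
}

# Synthetic sample standing in for gpu_metrics.csv (values are made up).
sample=$(mktemp)
cat > "$sample" <<'EOF'
timestamp,power_w
100,310
105,295
110,640
115,655
120,300
EOF

# In a real run these would be captured just before and after the benchmark.
BENCH_START=110
BENCH_END=115

filter_window "$sample" "$BENCH_START" "$BENCH_END"
rm -f "$sample"
```

With the sample data, only the header and the rows stamped 110 and 115 are kept; the rows that stand in for startup and teardown are dropped.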

Potential solutions
Either of the following would help disambiguate the results: adding a marker to gpu_metrics.csv, or delaying the start_gpu_monitor call until the benchmark is ready to run (moving step 1 above to just before step 4).
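The reordering option might look roughly like the sketch below. The function bodies are stand-in stubs that only record call order so the snippet runs on its own; the real `start_gpu_monitor`, `wait_for_server_ready`, and `stop_gpu_monitor` helpers are defined in the InferenceX scripts and are not reproduced here. `launch_engine` and `run_benchmark` are hypothetical names for steps 2 and 4.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stubs that only log the order of calls (not the real implementations).
EVENTS=()
launch_engine()         { EVENTS+=("launch_engine"); }         # hypothetical name for step 2
wait_for_server_ready() { EVENTS+=("wait_for_server_ready"); }
start_gpu_monitor()     { EVENTS+=("start_gpu_monitor"); }
run_benchmark()         { EVENTS+=("run_benchmark"); }         # hypothetical name for step 4
stop_gpu_monitor()      { EVENTS+=("stop_gpu_monitor"); }

# Proposed order: monitoring starts only after the server reports ready,
# so engine startup and the readiness polling fall outside the capture.
launch_engine
wait_for_server_ready
start_gpu_monitor
run_benchmark
stop_gpu_monitor

printf '%s\n' "${EVENTS[@]}"
```

One trade-off of this ordering is that startup metrics are no longer captured at all. If they are still of interest, the marker approach keeps them in the CSV while still making the benchmark window identifiable.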

Additional context
Startup costs for a pinned version of vLLM (and likely for other engines) should be relatively consistent on a given GitHub Actions runner, but an engine's startup performance can change across successive versions. Being able to separate startup costs from the benchmark data is therefore important.
