Open
Labels: enhancement, help wanted
Description
The initial implementation of Prometheus metrics in PR #52 was explicitly scoped to single-client mode (n=1). Now that PR #53 (Add multi-client support) is merged, we need to extend this functionality.
Currently, the prom_metrics field is null in multi-client output. The fundamental difference for implementation is that a multi-client run involves multiple, non-concurrent client processes, each with its own specific start and end time.
Expected Behavior
When running the benchmark in multi-client mode with --enable_prom_metrics, the system must be able to:
- Utilize Individual Timestamps: Use the specific client_time_start and client_time_end logs captured for each client process to define the time window for metric collection.
- Individually Isolate Metrics: Query Prometheus/Thanos to collect resource usage (CPU, memory, etc.) for the Runner Pod. The key is to filter these metrics based on the unique start and end timestamps of each client.
  ⚠️ OPEN TO DISCUSSION ⚠️
  Should the metrics represent only the client's individual process usage, or the Runner Pod's total usage during the client's execution window?
- Complete Output: Associate the isolated metrics with the correct client's JSON output payload.
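To make the expected behavior concrete, here is a minimal Python sketch, not the project's actual code. It assumes the per-client windows are already available as epoch seconds, that the pod's total usage is the metric source (Option A below), and that each client's output is a plain dict. The function names, the `windows` and `payloads` shapes, and the choice of `container_memory_working_set_bytes` as the example metric are all hypothetical. Only the `/api/v1/query_range` endpoint and its `query`, `start`, `end`, and `step` parameters come from the Prometheus HTTP API.

```python
from urllib.parse import urlencode


def build_range_query_url(base_url, promql, start_ts, end_ts, step_s=15):
    """Build a Prometheus query_range URL bounded by one client's time window."""
    if end_ts <= start_ts:
        raise ValueError("client window must end after it starts")
    params = {
        "query": promql,
        "start": f"{start_ts:.3f}",  # epoch seconds
        "end": f"{end_ts:.3f}",
        "step": f"{step_s}s",
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


def per_client_urls(base_url, pod_name, windows):
    """Map each client id to a query URL restricted to that client's window.

    `windows` is a hypothetical dict: {client_id: (start_ts, end_ts)}.
    """
    # Example metric only; the real metric list is whatever PR #52 collects.
    promql = f'container_memory_working_set_bytes{{pod="{pod_name}"}}'
    return {
        client_id: build_range_query_url(base_url, promql, start, end)
        for client_id, (start, end) in windows.items()
    }


def attach_prom_metrics(payloads, metrics_by_client):
    """Fill the prom_metrics field of each client's payload (hypothetical shape)."""
    for client_id, payload in payloads.items():
        # Clients without collected metrics keep prom_metrics as None (null).
        payload["prom_metrics"] = metrics_by_client.get(client_id)
    return payloads
```

Building the URL separately from sending the request keeps the windowing logic testable without a live Prometheus/Thanos endpoint; the actual HTTP call and response parsing would reuse whatever the single-client path already does.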
Technical Notes from Current Implementation (PR #52 Context)
- Current Single-Client Logic: The existing logic in process_server correctly uses client_start_time and client_end_time to define the time window for the single client.
- Multi-Client Challenge: The multi-client runner likely executes clients sequentially. The process script must iterate through the logs/outputs for each client instance (client.log.<instance>), extract its unique start/end times, and query Prometheus accordingly.
- Metric Scope (Needs Decision):
  - Since all clients run within the same Runner Pod, the primary metric source remains the RUNNER_POD_NAME.
  - The team needs to determine if resource usage should be calculated as:
    - Option A: The pod's total usage during the client's time window.
    - Option B: An attempt to isolate client-specific process metrics (if available) within that time window.
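The per-client log iteration described above could look roughly like the following Python sketch. The log line format (`client_start_time=<epoch seconds>` on its own line) is an assumption made purely for illustration, as are the function names; the real format is whatever the client logs from PR #53 actually emit, and the regular expression would need to be adapted to it.

```python
import re
from pathlib import Path

# ASSUMED log format, one key=value per line, e.g. "client_start_time=1700000000.5".
_TS_RE = re.compile(
    r"^(client_start_time|client_end_time)=(\d+(?:\.\d+)?)\s*$",
    re.MULTILINE,
)


def parse_window(log_text):
    """Return (start, end) in epoch seconds extracted from one client's log text."""
    found = {key: float(value) for key, value in _TS_RE.findall(log_text)}
    try:
        return found["client_start_time"], found["client_end_time"]
    except KeyError as missing:
        raise ValueError(f"log is missing {missing}") from None


def collect_windows(log_dir):
    """Map each client instance to its time window, one client.log.<instance> per client."""
    windows = {}
    for path in sorted(Path(log_dir).glob("client.log.*")):
        instance = path.name[len("client.log."):]
        windows[instance] = parse_window(path.read_text())
    return windows
```

Failing loudly when a timestamp is missing, rather than silently skipping the client, makes it obvious when a client's prom_metrics cannot be computed instead of leaving it null without explanation.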