refactor: make Prometheus metric naming conventions more consistent
- Rename connections_total to current_connections (gauge for active connections)
- Rename client_disconnects_total to disconnected_clients_total (better ordering)
- Rename PROCESSING_TIME_MS_TOTAL to PROCESSING_MS_TOTAL (more concise)
- Apply unit_aggregation pattern: AVG_PROCESSING_MS -> PROCESSING_MS_AVG
- Sync ComponentNatsServerPrometheusMetrics variable names with metric constants
- Update documentation with comprehensive naming transformation rules
- Add units _messages and _connections to naming conventions
- Update all code references, documentation, and test comments consistently
These changes follow Prometheus best practices by distinguishing gauge vs
counter metrics and using consistent {unit}_{aggregation} naming patterns.
Signed-off-by: Keiven Chang <[email protected]>
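The renames listed in the commit message can be sketched as a lookup table. This is only an illustration: the old and new names are copied from the bullets above, while `RENAMES`, `rename_metric`, and `looks_like_counter` are hypothetical helpers that do not come from the repository. The `_total` check is a heuristic assumed for this sketch, following the common Prometheus convention that counters carry a `_total` suffix and gauges do not.

```python
# Old -> new names, copied from the commit message bullets.
RENAMES = {
    "connections_total": "current_connections",
    "client_disconnects_total": "disconnected_clients_total",
    "PROCESSING_TIME_MS_TOTAL": "PROCESSING_MS_TOTAL",
    "AVG_PROCESSING_MS": "PROCESSING_MS_AVG",
}


def rename_metric(name: str) -> str:
    """Return the new name for a renamed metric, or the name unchanged."""
    return RENAMES.get(name, name)


def looks_like_counter(name: str) -> bool:
    """Heuristic (an assumption of this sketch): counter names end in `_total`."""
    return name.lower().endswith("_total")


# The gauge for active connections no longer carries the counter suffix.
new_name = rename_metric("connections_total")
is_counter = looks_like_counter(new_name)
```

After the rename, `current_connections` no longer matches the counter heuristic, which is the gauge vs counter distinction the commit message describes.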
deploy/metrics/README.md
8 additions & 8 deletions
@@ -70,7 +70,7 @@ Some components expose additional metrics specific to their functionality:
When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TRTLLM`), these metrics are automatically exposed with the `dynamo_frontend_*` prefix and include `model` labels containing the model name:
@@ -79,7 +79,7 @@ When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TRTLLM`), th
- `dynamo_frontend_requests_total`: Total LLM requests (counter)
- `dynamo_frontend_time_to_first_token_seconds`: Time to first token (histogram)
-**Note**: The `dynamo_frontend_inflight_requests_total` metric tracks requests from HTTP handler start until the complete response is finished, while `dynamo_frontend_queued_requests_total` tracks requests from HTTP handler start until first token generation begins (including prefill time). HTTP queue time is a subset of inflight time.
+**Note**: The `dynamo_frontend_inflight_requests` metric tracks requests from HTTP handler start until the complete response is finished, while `dynamo_frontend_queued_requests_total` tracks requests from HTTP handler start until first token generation begins (including prefill time). HTTP queue time is a subset of inflight time.
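The relationship described in the note can be illustrated with a toy model. This is a sketch under stated assumptions, not Dynamo's implementation: `RequestGauges` and its method names are hypothetical, and the two integer fields stand in for `dynamo_frontend_inflight_requests` and `dynamo_frontend_queued_requests_total`.

```python
class RequestGauges:
    """Toy in-memory stand-in for the two metrics (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.inflight = 0  # models dynamo_frontend_inflight_requests
        self.queued = 0    # models dynamo_frontend_queued_requests_total

    def on_handler_start(self) -> None:
        # Both intervals begin when the HTTP handler starts.
        self.inflight += 1
        self.queued += 1

    def on_first_token(self) -> None:
        # Queue time ends once first token generation begins.
        self.queued -= 1

    def on_response_complete(self) -> None:
        # Inflight time ends when the complete response is finished.
        self.inflight -= 1


gauges = RequestGauges()
gauges.on_handler_start()      # request A arrives
gauges.on_handler_start()      # request B arrives
gauges.on_first_token()        # A produces its first token
snapshot = (gauges.inflight, gauges.queued)
gauges.on_first_token()        # B produces its first token
gauges.on_response_complete()  # A finishes
```

Because a request leaves the queued count no later than it leaves the inflight count, the queued value never exceeds the inflight value in this model, which matches the statement that HTTP queue time is a subset of inflight time.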
#### Request Processing Flow
@@ -125,10 +125,10 @@ Try launching a frontend and a Mocker backend that allows 3 concurrent requests: