Add dedicated /metrics endpoints for Gateway and LLM services #46
…issue #39)
- Add /metrics endpoint to gateway service with request counts, latencies, error breakdowns, WebSocket connection stats, and LLM integration stats
- Add /metrics endpoint to LLM service with request counts, latencies, error breakdowns, provider-level stats, and token usage tracking
- Update K8s deployments to change prometheus.io/path from /healthz to /metrics for gateway and LLM
- Update ServiceMonitors to scrape /metrics path for gateway and LLM services
- Update Deploy_GEngine_To_Kubernetes.md documentation with health vs. metrics distinction and example metrics for all three services
- Add comprehensive tests for new metrics functionality (9 gateway tests, 11 LLM tests)
Co-authored-by: SorraTheOrc <250240+SorraTheOrc@users.noreply.github.com>
- Extract token usage logic into _extract_token_usage helper function for consistency
- Add documentation clarifying metrics counter relationships in GatewayMetrics
- All 685 tests pass with 90.21% coverage
Co-authored-by: SorraTheOrc <250240+SorraTheOrc@users.noreply.github.com>
@copilot There have been many linting changes in main. Can you merge them into this branch?
I'm unable to merge main into this branch due to sandbox limitations - I can't pull branches from GitHub. You can merge main using either:
git fetch origin main
git merge origin/main --no-edit
git push origin copilot/add-dedicated-metrics-endpoints
Let me know once you've merged main and I can address any conflicts or linting issues that arise.
@copilot Review feedback:

1. The new /metrics endpoints return JSON, but Prometheus expects OpenMetrics/text format. The current implementation will not work with Prometheus ServiceMonitor or annotations as described. Please refactor to use the prometheus_client library to expose metrics in the correct format.
2. The Gateway service's get_metrics endpoint is synchronous (def, not async def) but is accessed from both the event loop and thread pool, which can cause concurrency issues. Please make it async and ensure safe access to shared state.
3. For performance, consider using collections.deque(maxlen=N) for latency samples instead of list+pop(0).

@copilot Please address these issues to ensure Prometheus compatibility and robust metrics handling.
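Points 2 and 3 of the review can be illustrated together with a small sketch. It is not the code from this PR: `LatencyWindow`, `record`, and `snapshot` are hypothetical names, and the lock is just one possible way to guard state shared between the event loop and worker threads. The bounded `collections.deque` discards the oldest sample in O(1), whereas `list.pop(0)` shifts every remaining element.

```python
import threading
from collections import deque


class LatencyWindow:
    """Hypothetical helper: keeps only the most recent latency samples."""

    def __init__(self, max_samples: int = 1000) -> None:
        # deque(maxlen=N) drops the oldest entry automatically once full.
        self._samples = deque(maxlen=max_samples)
        # A plain lock guards access from both event-loop and thread-pool callers.
        self._lock = threading.Lock()

    def record(self, seconds: float) -> None:
        with self._lock:
            self._samples.append(seconds)

    def snapshot(self) -> list:
        # Return a copy so callers never iterate the live buffer.
        with self._lock:
            return list(self._samples)


window = LatencyWindow(max_samples=3)
for value in (0.1, 0.2, 0.3, 0.4):
    window.record(value)
print(window.snapshot())  # [0.2, 0.3, 0.4]
```

If the service moves entirely to `prometheus_client` histograms, a hand-rolled window like this may no longer be needed.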
@copilot see the review feedback in the previous comment and fix it please.
…icated-metrics-endpoints
- Replace custom metrics classes with Prometheus Counter, Gauge, Histogram
- Gateway /metrics now returns OpenMetrics text format
- LLM /metrics now returns OpenMetrics text format
- Make metrics endpoints async for thread safety
- Add prometheus_client>=0.20.0 dependency
- Update documentation with Prometheus format examples
- Update tests to parse Prometheus text format
Addresses review feedback:
1. Use prometheus_client for Prometheus-compatible format
2. Make get_metrics async for thread safety
3. Linting fixes (B904: raise from e)
Co-authored-by: SorraTheOrc <250240+SorraTheOrc@users.noreply.github.com>
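For readers wondering what "parse Prometheus text format" might involve in a test, here is a minimal standard-library sketch. It is an assumption about the approach, not the tests from this PR: `parse_metrics` and the metric names in `SAMPLE` are hypothetical, and the regex only handles simple label values (no escaped quotes or closing braces inside a value).

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text: str) -> dict:
    """Map (metric name, sorted label pairs) to a float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = tuple(sorted(LABEL_PAIR.findall(match.group("labels") or "")))
        samples[(match.group("name"), labels)] = float(match.group("value"))
    return samples


# Illustrative payload; these metric names are made up for the example.
SAMPLE = """\
# HELP gateway_requests_total Total requests handled.
# TYPE gateway_requests_total counter
gateway_requests_total{endpoint="/healthz"} 3.0
gateway_active_websocket_connections 2.0
"""

parsed = parse_metrics(SAMPLE)
print(parsed[("gateway_requests_total", (("endpoint", "/healthz"),))])  # 3.0
```

A test could then assert on individual samples rather than comparing the whole response body, which keeps it robust to metric ordering.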
Addressed all review feedback in commit
Example metrics output now looks like:
Warning: Firewall rules blocked me from connecting to one or more addresses. I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
…_gateway_service.py
… in metrics tests and gateway app
… in dry-run workflow
…esolve NotFound errors
- /metrics endpoint to gateway service
- /metrics endpoint to LLM service
Created from Copilot CLI via the copilot delegate command.