Commit bc2f512

nmveeresh, rakdutta, and crivetimihai authored
Prometheus Metrics Instrumentation (Feature #218) (#1313)
* Add metrics service and unit tests; update deployment and configs

* added /metrics/prometheus

Signed-off-by: Veeresh K <[email protected]>

* fixed lint issue

Signed-off-by: Veeresh K <[email protected]>

* fixed metrics.py

Signed-off-by: Veeresh K <[email protected]>

* fixed metrics.py

Signed-off-by: Veeresh K <[email protected]>

* fixed metrics.py

Signed-off-by: Veeresh K <[email protected]>

* main registration

Signed-off-by: rakdutta <[email protected]>

* setup_metric

Signed-off-by: rakdutta <[email protected]>

* doc

Signed-off-by: rakdutta <[email protected]>

* docs: add Prometheus metrics env vars to .env.example

Add comprehensive documentation for Prometheus metrics configuration
variables to .env.example:
- ENABLE_METRICS: Toggle metrics collection (default: true)
- METRICS_EXCLUDED_HANDLERS: Regex patterns for endpoint exclusion
- METRICS_NAMESPACE: Metrics name prefix (default: "default")
- METRICS_SUBSYSTEM: Secondary metrics prefix
- METRICS_CUSTOM_LABELS: Static labels for app_info gauge

Includes examples, security warnings about high-cardinality labels,
and formatting consistent with existing configuration sections.

Related to PR #1313

Signed-off-by: Mihai Criveti <[email protected]>

* Linting

Signed-off-by: Mihai Criveti <[email protected]>

---------

Signed-off-by: Veeresh K <[email protected]>
Signed-off-by: rakdutta <[email protected]>
Signed-off-by: Mihai Criveti <[email protected]>
Co-authored-by: rakdutta <[email protected]>
Co-authored-by: Mihai Criveti <[email protected]>
1 parent fb2db9e commit bc2f512

11 files changed: +467 -4 lines changed

.env.example

Lines changed: 39 additions & 0 deletions
@@ -765,6 +765,45 @@ OTEL_BSP_MAX_QUEUE_SIZE=2048
 OTEL_BSP_MAX_EXPORT_BATCH_SIZE=512
 OTEL_BSP_SCHEDULE_DELAY=5000

+# Prometheus Metrics Configuration
+# Enable Prometheus-compatible metrics exposition for monitoring and alerting
+# Options: true (default), false
+# When true: Exposes metrics at /metrics/prometheus in Prometheus format
+# When false: Returns HTTP 503 on metrics endpoint
+ENABLE_METRICS=true
+
+# Comma-separated regex patterns for endpoints to exclude from metrics collection
+# Use this to avoid high-cardinality issues with dynamic paths or reduce overhead
+# Examples:
+# - Exclude SSE endpoints: /servers/.*/sse
+# - Exclude static files: /static/.*
+# - Exclude health checks: .*health.*
+# - Multiple patterns: /servers/.*/sse,/static/.*,.*health.*
+# Default: "" (no exclusions)
+METRICS_EXCLUDED_HANDLERS=
+
+# Prometheus metrics namespace (prefix for all metric names)
+# Used to group metrics by application or organization
+# Example: mycompany_gateway_http_requests_total
+# Default: "default"
+METRICS_NAMESPACE=default
+
+# Prometheus metrics subsystem (secondary prefix for metric names)
+# Used for further categorization within namespace
+# Example: mycompany_api_http_requests_total (if subsystem=api)
+# Default: "" (no subsystem)
+METRICS_SUBSYSTEM=
+
+# Custom static labels for app_info gauge metric
+# Format: comma-separated "key=value" pairs (low-cardinality values only)
+# WARNING: Never use high-cardinality values (user IDs, request IDs, timestamps)
+# Examples:
+# - Single label: environment=production
+# - Multiple labels: environment=production,region=us-east-1,team=platform
+# - K8s example: cluster=prod-us-east,namespace=mcp-gateway
+# Default: "" (no custom labels)
+METRICS_CUSTOM_LABELS=
+
 # Plugin Framework Configuration
 # Enable the plugin system for extending gateway functionality
 # Options: true, false (default)
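
To try out a `METRICS_EXCLUDED_HANDLERS` value before deploying it, the comma-splitting and regex compilation can be sketched in a few lines of Python. This is a minimal sketch rather than the gateway's own code path: the helper names `parse_excluded_handlers` and `is_excluded` are hypothetical, and it assumes the patterns are applied with `re.search` semantics against the route template, which is worth confirming against the installed `prometheus-fastapi-instrumentator` version.

```python
import re


def parse_excluded_handlers(raw: str) -> list[re.Pattern[str]]:
    """Split a comma-separated value into compiled regexes, skipping blanks."""
    return [re.compile(p.strip()) for p in raw.split(",") if p.strip()]


def is_excluded(path: str, patterns: list[re.Pattern[str]]) -> bool:
    """Return True if any pattern matches somewhere in the path (assumed re.search)."""
    return any(p.search(path) for p in patterns)


patterns = parse_excluded_handlers("/servers/.*/sse, /static/.*, .*health.*")
for path in ("/servers/{server_id}/sse", "/static/app.css", "/health", "/tools"):
    print(path, is_excluded(path, patterns))
```

Because the value is split on commas, a pattern that itself contains a comma (for example a `{1,3}` quantifier) cannot be expressed in this setting.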

charts/mcp-stack/templates/deployment-mcpgateway.yaml

Lines changed: 17 additions & 0 deletions
@@ -74,6 +74,23 @@ spec:
 - name: REDIS_PORT
   value: "{{ .Values.mcpContextForge.env.redis.port }}"

+# ---------- METRICS ----------
+{{- if .Values.mcpContextForge.metrics.enabled }}
+- name: ENABLE_METRICS
+  value: "{{ .Values.mcpContextForge.metrics.enabled }}"
+{{- if .Values.mcpContextForge.metrics.excludedHandlers }}
+- name: METRICS_EXCLUDED_HANDLERS
+  value: "{{ .Values.mcpContextForge.metrics.excludedHandlers }}"
+{{- end }}
+{{- if .Values.mcpContextForge.metrics.customLabels }}
+- name: METRICS_CUSTOM_LABELS
+  value: "{{ range $key, $value := .Values.mcpContextForge.metrics.customLabels }}{{ $key }}={{ $value }},{{ end }}"
+{{- end }}
+{{- else }}
+- name: ENABLE_METRICS
+  value: "false"
+{{- end }}
+
 # ---------- DERIVED URLS ----------
 # These MUST be placed *after* the concrete vars above so the
 # $(...) placeholders are expanded correctly inside the pod.
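
The `METRICS_CUSTOM_LABELS` template above emits a comma after every pair, so the rendered value ends with a trailing comma. The Python sketch below imitates that rendering and the comma/`=` parsing to show that the empty tail is simply dropped. `render_custom_labels` and `parse_custom_labels` are hypothetical helper names used only for illustration, and the sorted iteration mirrors how Go templates range over map keys.

```python
def render_custom_labels(labels: dict[str, str]) -> str:
    """Imitate the Helm range: emit 'key=value,' for each key in sorted order."""
    return "".join(f"{key}={value}," for key, value in sorted(labels.items()))


def parse_custom_labels(raw: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' pairs; fragments without '=' (such as the empty tail) are dropped."""
    return dict(kv.split("=", 1) for kv in raw.split(",") if "=" in kv)


rendered = render_custom_labels({"region": "us-east-1", "environment": "production"})
print(rendered)
print(parse_custom_labels(rendered))
```

The parser in this sketch splits on the first `=` only, so a label value that itself contains `=` survives intact.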

charts/mcp-stack/values.yaml

Lines changed: 8 additions & 0 deletions
@@ -39,6 +39,14 @@ mcpContextForge:

   containerPort: 4444 # port the app listens on inside the pod

+  # Metrics configuration
+  metrics:
+    enabled: true
+    port: 8000
+    serviceMonitor:
+      enabled: true
+    customLabels: {}
+
   # Health & readiness probes
   probes:
     startup:

docs/docs/manage/observability.md

Lines changed: 113 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
1-
# Observability
1+
## Observability
22

3-
MCP Gateway includes production-grade OpenTelemetry instrumentation for distributed tracing, enabling you to monitor performance, debug issues, and understand request flows.
3+
MCP Gateway includes production-grade OpenTelemetry instrumentation for distributed tracing and Prometheus-compatible metrics exposure.
44

55
## Documentation
66

@@ -23,3 +23,114 @@ mcpgateway
2323
```
2424

2525
View traces at http://localhost:6006
26+
27+
## Prometheus metrics (important)
28+
29+
Note: the metrics exposure is wired from `mcpgateway/main.py` but the HTTP
30+
handler itself is registered by the metrics module. The main application
31+
imports and calls `setup_metrics(app)` from `mcpgateway.services.metrics`. The
32+
`setup_metrics` function instruments the FastAPI app and registers the
33+
Prometheus scrape endpoint using the Prometheus instrumentator; the endpoint
34+
available to Prometheus scrapers is:
35+
36+
- GET /metrics/prometheus
37+
38+
The route is created by `Instrumentator.expose` inside
39+
`mcpgateway/services/metrics.py` (not by manually adding a GET handler in
40+
`main.py`). The endpoint is registered with `include_in_schema=True` (so it
41+
appears in OpenAPI / Swagger) and gzip compression is enabled by default
42+
(`should_gzip=True`) for the exposition handler.
43+
44+
### Env vars / settings that control metrics
45+
46+
- `ENABLE_METRICS` (env) — set to `true` (default) to enable instrumentation; set `false` to disable.
47+
- `METRICS_EXCLUDED_HANDLERS` (env / settings) — comma-separated regexes for endpoints to exclude from instrumentation (useful for SSE/WS or per-request high-cardinality paths). The implementation reads `settings.METRICS_EXCLUDED_HANDLERS` and compiles the patterns.
48+
- `METRICS_CUSTOM_LABELS` (env / settings) — comma-separated `key=value` pairs used as static labels on the `app_info` gauge (low-cardinality values only). When present, a Prometheus `app_info` gauge is created and set to 1 with those labels.
49+
- Additional settings in `mcpgateway/config.py`: `METRICS_NAMESPACE`, `METRICS_SUBSYSTEM`. Note: these config fields exist, but the current `metrics` module does not wire them into the instrumentator by default (they're available for future use/consumption by custom collectors).
50+
51+
### Enable / verify locally
52+
53+
1. Ensure `ENABLE_METRICS=true` in your shell or `.env`.
54+
55+
```bash
56+
export ENABLE_METRICS=true
57+
export METRICS_CUSTOM_LABELS="env=local,team=dev"
58+
export METRICS_EXCLUDED_HANDLERS="/servers/.*/sse,/static/.*"
59+
```
60+
61+
2. Start the gateway (development). By default the app listens on port 4444. The Prometheus endpoint will be:
62+
63+
http://localhost:4444/metrics/prometheus
64+
65+
3. Quick check (get the first lines of exposition text):
66+
67+
```bash
68+
curl -sS http://localhost:4444/metrics/prometheus | head -n 20
69+
```
70+
71+
4. If metrics are disabled, the endpoint returns a small JSON 503 response.
72+
73+
### Prometheus scrape job example
74+
75+
Add the job below to your `prometheus.yml` for local testing:
76+
77+
```yaml
78+
scrape_configs:
79+
- job_name: 'mcp-gateway'
80+
metrics_path: /metrics/prometheus
81+
static_configs:
82+
- targets: ['localhost:4444']
83+
```
84+
85+
If Prometheus runs in Docker, adjust the target host accordingly (host networking
86+
or container host IP). See the repo `docs/manage/scale.md` for examples of
87+
deploying Prometheus in Kubernetes.
88+
89+
### Grafana and dashboards
90+
91+
- Use Grafana to import dashboards for Kubernetes, PostgreSQL and Redis (IDs
92+
suggested elsewhere in the repo). For MCP Gateway app metrics, create panels
93+
for:
94+
- Request rate: `rate(http_requests_total[1m])`
95+
- Error rate: `rate(http_requests_total{status=~"5.."}[5m])`
96+
- P99 latency: `histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))`
97+
98+
### Common pitfalls — short guidance
99+
100+
- High-cardinality labels
101+
- Never add per-request identifiers (user IDs, full URIs, request IDs) as
102+
Prometheus labels. They explode the number of time series and can crash
103+
Prometheus memory.
104+
- Use `METRICS_CUSTOM_LABELS` only for low-cardinality labels (env, region).
105+
106+
- Compression (gzip) vs CPU
107+
- The metrics exposer in `mcpgateway.services.metrics` enables gzip by
108+
default for the `/metrics/prometheus` endpoint. Compressing the payload
109+
reduces network usage but increases CPU on scrape time. On CPU-constrained
110+
nodes consider increasing scrape interval (e.g. 15s→30s) or disabling gzip
111+
at the instrumentor layer.
112+
113+
- Duplicate collectors during reloads/tests
114+
- Instrumentation registers collectors on the global Prometheus registry.
115+
When reloading the app in the same process (tests, interactive sessions)
116+
you may see "collector already registered"; restart the process or clear
117+
the registry in test fixtures.
118+
119+
### Quick checklist
120+
121+
- [ ] `ENABLE_METRICS=true`
122+
- [ ] `/metrics/prometheus` reachable
123+
- [ ] Add scrape job to Prometheus
124+
- [ ] Exclude high-cardinality paths with `METRICS_EXCLUDED_HANDLERS`
125+
- [ ] Use tracing (OTel) for high-cardinality debugging information
126+
127+
## Where to look in the code
128+
129+
- `mcpgateway/main.py` — wiring: imports and calls `setup_metrics(app)` from
130+
`mcpgateway.services.metrics`. The function call instruments the app at
131+
startup; the actual HTTP handler for `/metrics/prometheus` is registered by
132+
the `Instrumentator` inside `mcpgateway/services/metrics.py`.
133+
- `mcpgateway/services/metrics.py` — instrumentation implementation and env-vars.
134+
- `mcpgateway/config.py` — settings defaults and names used by the app.
135+
136+
---
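
The P99 latency panel in the documentation above relies on `histogram_quantile`, which estimates a quantile by linear interpolation inside the bucket that contains the target rank. The Python sketch below illustrates that idea on hand-written cumulative bucket counts. It is a simplified illustration (no NaN inputs, no negative bounds, and it expects a final `+Inf` bucket), not Prometheus's implementation; the function name `estimate_quantile` and the sample counts are made up for the example.

```python
import math


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with a +Inf bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            span = count - prev_count
            if math.isinf(bound) or span == 0:
                # Rank falls in the +Inf bucket (or an empty one): use the lower bound.
                return prev_bound
            # Interpolate linearly between the bucket's lower and upper bounds.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / span
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical request-duration buckets: 100 observations in total.
sample = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]
print(estimate_quantile(0.99, sample))
```

With these sample counts the 99th observation falls in the 0.5 to 1.0 second bucket, so the estimate lands at about 0.95 seconds. The result can never be more precise than the bucket boundaries, which is why bucket layout matters for latency panels.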

mcpgateway/bootstrap_db.py

Lines changed: 1 addition & 2 deletions
@@ -35,10 +35,9 @@
 from typing import Any, cast

 # Third-Party
-from sqlalchemy import create_engine, inspect
-
 from alembic import command
 from alembic.config import Config
+from sqlalchemy import create_engine, inspect

 # First-Party
 from mcpgateway.config import settings

mcpgateway/config.py

Lines changed: 6 additions & 0 deletions
@@ -1418,6 +1418,12 @@ def log_summary(self) -> None:
         summary = self.model_dump(exclude={"database_url", "memcached_url"})
         logger.info(f"Application settings summary: {summary}")

+    ENABLE_METRICS: bool = Field(True, description="Enable Prometheus metrics instrumentation")
+    METRICS_EXCLUDED_HANDLERS: str = Field("", description="Comma-separated regex patterns for paths to exclude from metrics")
+    METRICS_NAMESPACE: str = Field("default", description="Prometheus metrics namespace")
+    METRICS_SUBSYSTEM: str = Field("", description="Prometheus metrics subsystem")
+    METRICS_CUSTOM_LABELS: str = Field("", description='Comma-separated "key=value" pairs for static custom labels')
+

 @lru_cache()
 def get_settings() -> Settings:

mcpgateway/main.py

Lines changed: 4 additions & 0 deletions
@@ -108,6 +108,7 @@
 from mcpgateway.services.import_service import ImportError as ImportServiceError
 from mcpgateway.services.import_service import ImportService, ImportValidationError
 from mcpgateway.services.logging_service import LoggingService
+from mcpgateway.services.metrics import setup_metrics
 from mcpgateway.services.prompt_service import PromptError, PromptNameConflictError, PromptNotFoundError, PromptService
 from mcpgateway.services.resource_service import ResourceError, ResourceNotFoundError, ResourceService, ResourceURIConflictError
 from mcpgateway.services.root_service import RootService
@@ -479,6 +480,9 @@ async def lifespan(_app: FastAPI) -> AsyncIterator[None]:
     default_response_class=ORJSONResponse,  # Use orjson for high-performance JSON serialization
 )

+# Setup metrics instrumentation
+setup_metrics(app)
+

 async def validate_security_configuration():
     """Validate security configuration on startup."""

mcpgateway/services/metrics.py

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
+# -*- coding: utf-8 -*-
+"""
+Location: ./mcpgateway/services/metrics.py
+Copyright 2025
+SPDX-License-Identifier: Apache-2.0
+
+MCP Gateway Metrics Service.
+
+This module provides comprehensive Prometheus metrics instrumentation for the MCP Gateway.
+It configures and exposes HTTP metrics including request counts, latencies, response sizes,
+and custom application metrics.
+
+The service automatically instruments FastAPI applications with standard HTTP metrics
+and provides configurable exclusion patterns for endpoints that should not be monitored.
+Metrics are exposed at the `/metrics/prometheus` endpoint in Prometheus format.
+
+Supported Metrics:
+- http_requests_total: Counter for total HTTP requests by method, endpoint, and status
+- http_request_duration_seconds: Histogram of request processing times
+- http_request_size_bytes: Histogram of incoming request payload sizes
+- http_response_size_bytes: Histogram of outgoing response payload sizes
+- app_info: Gauge with custom static labels for application metadata
+
+Environment Variables:
+- ENABLE_METRICS: Enable/disable metrics collection (default: "true")
+- METRICS_EXCLUDED_HANDLERS: Comma-separated regex patterns for excluded endpoints
+- METRICS_CUSTOM_LABELS: Custom labels for app_info gauge (format: "key1=value1,key2=value2")
+
+Usage:
+    from mcpgateway.services.metrics import setup_metrics
+
+    app = FastAPI()
+    setup_metrics(app)  # Automatically instruments the app
+
+    # Metrics available at: GET /metrics/prometheus
+
+Functions:
+- setup_metrics: Configure Prometheus instrumentation for FastAPI app
+"""
+
+# Standard
+import os
+import re
+
+# Third-Party
+from fastapi import Response, status
+from prometheus_client import Gauge, REGISTRY
+from prometheus_fastapi_instrumentator import Instrumentator
+
+# First-Party
+from mcpgateway.config import settings
+
+
+def setup_metrics(app):
+    """
+    Configure Prometheus metrics instrumentation for a FastAPI application.
+
+    This function sets up comprehensive HTTP metrics collection including request counts,
+    latencies, and payload sizes. It also handles custom application labels and endpoint
+    exclusion patterns.
+
+    Args:
+        app: FastAPI application instance to instrument
+
+    Environment Variables Used:
+        ENABLE_METRICS (str): "true" to enable metrics, "false" to disable (default: "true")
+        METRICS_EXCLUDED_HANDLERS (str): Comma-separated regex patterns for endpoints
+            to exclude from metrics collection
+        METRICS_CUSTOM_LABELS (str): Custom labels in "key1=value1,key2=value2" format
+            for the app_info gauge metric
+
+    Side Effects:
+        - Registers Prometheus metrics collectors with the global registry
+        - Adds middleware to the FastAPI app for request instrumentation
+        - Exposes /metrics/prometheus endpoint for Prometheus scraping
+        - Prints status messages to stdout
+
+    Example:
+        >>> from fastapi import FastAPI
+        >>> from mcpgateway.services.metrics import setup_metrics
+        >>> app = FastAPI()
+        >>> # setup_metrics(app) # Configures Prometheus metrics
+        >>> # Metrics available at GET /metrics/prometheus
+    """
+    enable_metrics = os.getenv("ENABLE_METRICS", "true").lower() == "true"
+
+    if enable_metrics:
+        # Custom labels gauge (split on the first "=" only, so a value may itself contain "=")
+        custom_labels = dict(kv.split("=", 1) for kv in os.getenv("METRICS_CUSTOM_LABELS", "").split(",") if "=" in kv)
+        if custom_labels:
+            app_info_gauge = Gauge(
+                "app_info",
+                "Static labels for the application",
+                labelnames=list(custom_labels.keys()),
+                registry=REGISTRY,
+            )
+            app_info_gauge.labels(**custom_labels).set(1)
+
+        excluded = [pattern.strip() for pattern in (settings.METRICS_EXCLUDED_HANDLERS or "").split(",") if pattern.strip()]
+
+        # Create instrumentator instance
+        instrumentator = Instrumentator(
+            should_group_status_codes=False,
+            should_ignore_untemplated=True,
+            excluded_handlers=[re.compile(p) for p in excluded],
+        )
+
+        # Instrument FastAPI app
+        instrumentator.instrument(app)
+
+        # Expose Prometheus metrics at /metrics/prometheus and include
+        # the endpoint in the OpenAPI schema so it appears in Swagger UI.
+        instrumentator.expose(app, endpoint="/metrics/prometheus", include_in_schema=True, should_gzip=True)
+
+        print("✅ Metrics instrumentation enabled")
+    else:
+        print("⚠️ Metrics instrumentation disabled")
+
+        @app.get("/metrics/prometheus")
+        async def metrics_disabled():
+            return Response(content='{"error": "Metrics collection is disabled"}', media_type="application/json", status_code=status.HTTP_503_SERVICE_UNAVAILABLE)
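
Because the disabled branch still serves `/metrics/prometheus` (with a 503), a probe should inspect the status code rather than treat any response as proof that metrics are on. A minimal standard-library sketch follows. The function name `probe_metrics` is hypothetical, and the default URL assumes the local development port 4444 mentioned in the documentation.

```python
import urllib.error
import urllib.request


def probe_metrics(url: str = "http://localhost:4444/metrics/prometheus", timeout: float = 5.0) -> tuple[int, str]:
    """Return (HTTP status, first line of the body) for a metrics endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the 503 from the disabled handler lands here.
        status, body = err.code, err.read()
    lines = body.decode("utf-8", errors="replace").splitlines()
    return status, (lines[0] if lines else "")


if __name__ == "__main__":
    try:
        print(probe_metrics())
    except urllib.error.URLError as err:
        print(f"gateway not reachable: {err.reason}")
```

A 200 status with exposition text on the first line indicates that instrumentation is active, while a 503 with the JSON error body indicates that `ENABLE_METRICS` resolved to something other than `true`.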

pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -68,6 +68,8 @@ dependencies = [
     "pydantic>=2.12.3",
     "pydantic[email]>=2.12.3",
     "pydantic-settings>=2.11.0",
+    "prometheus_client>=0.16.0",
+    "prometheus-fastapi-instrumentator>=7.0.0",
     "pyjwt>=2.10.1",
     "python-json-logger>=4.0.0",
     "PyYAML>=6.0.3",
