17 changes: 17 additions & 0 deletions charts/mcp-stack/templates/deployment-mcpgateway.yaml
@@ -74,6 +74,23 @@ spec:
- name: REDIS_PORT
  value: "{{ .Values.mcpContextForge.env.redis.port }}"

# ---------- METRICS ----------
{{- if .Values.mcpContextForge.metrics.enabled }}
- name: ENABLE_METRICS
  value: "{{ .Values.mcpContextForge.metrics.enabled }}"
{{- if .Values.mcpContextForge.metrics.excludedHandlers }}
- name: METRICS_EXCLUDED_HANDLERS
  value: {{ .Values.mcpContextForge.metrics.excludedHandlers | quote }}
{{- end }}
{{- if .Values.mcpContextForge.metrics.customLabels }}
- name: METRICS_CUSTOM_LABELS
  value: "{{ range $key, $value := .Values.mcpContextForge.metrics.customLabels }}{{ $key }}={{ $value }},{{ end }}"
{{- end }}
{{- else }}
- name: ENABLE_METRICS
  value: "false"
{{- end }}

# ---------- DERIVED URLS ----------
# These MUST be placed *after* the concrete vars above so the
# $(...) placeholders are expanded correctly inside the pod.
8 changes: 8 additions & 0 deletions charts/mcp-stack/values.yaml
@@ -39,6 +39,14 @@ mcpContextForge:

  containerPort: 4444 # port the app listens on inside the pod

  # Metrics configuration
  metrics:
    enabled: true
    port: 8000
    serviceMonitor:
      enabled: true
    customLabels: {}

  # Health & readiness probes
  probes:
    startup:
115 changes: 113 additions & 2 deletions docs/docs/manage/observability.md
@@ -1,6 +1,6 @@
# Observability
## Observability

MCP Gateway includes production-grade OpenTelemetry instrumentation for distributed tracing, enabling you to monitor performance, debug issues, and understand request flows.
MCP Gateway includes production-grade OpenTelemetry instrumentation for distributed tracing and Prometheus-compatible metrics exposure.

## Documentation

@@ -23,3 +23,114 @@ mcpgateway
```

View traces at http://localhost:6006

## Prometheus metrics (important)

Note: `mcpgateway/main.py` only wires metrics in; the HTTP handler itself is
registered by the metrics module. Right after creating the app, `main.py` calls
`setup_metrics(app)` from `mcpgateway.services.metrics`, which instruments the
FastAPI app with `prometheus-fastapi-instrumentator` and registers the
Prometheus scrape endpoint:

- GET /metrics/prometheus

The route is created by `Instrumentator.expose` inside
`mcpgateway/services/metrics.py`, not by a hand-written GET handler in
`main.py`. It is registered with `include_in_schema=True`, so it appears in
OpenAPI / Swagger, and with `should_gzip=True`, so the exposition payload is
gzip-compressed whenever the scraper sends `Accept-Encoding: gzip`.

### Env vars / settings that control metrics

- `ENABLE_METRICS` (env) — set to `true` (default) to enable instrumentation; set `false` to disable.
- `METRICS_EXCLUDED_HANDLERS` (env / settings) — comma-separated regexes for endpoints to exclude from instrumentation (useful for SSE/WS or per-request high-cardinality paths). The implementation reads `settings.METRICS_EXCLUDED_HANDLERS` and compiles the patterns.
- `METRICS_CUSTOM_LABELS` (env / settings) — comma-separated `key=value` pairs used as static labels on the `app_info` gauge (low-cardinality values only). When present, a Prometheus `app_info` gauge is created and set to 1 with those labels.
- Additional settings in `mcpgateway/config.py`: `METRICS_NAMESPACE`, `METRICS_SUBSYSTEM`. Note: these config fields exist, but the current `metrics` module does not wire them into the instrumentator by default (they're available for future use/consumption by custom collectors).
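To see how the two string settings are interpreted, here is a minimal stdlib sketch that mirrors the parsing in `setup_metrics`. The helper names `parse_custom_labels`, `parse_excluded` and `is_excluded` are hypothetical, and `is_excluded` assumes the instrumentator tests each pattern with `re.search` against the route template (for example `/servers/{server_id}/sse`) rather than the raw request path.

```python
import re


def parse_custom_labels(raw: str) -> dict[str, str]:
    """Turn "env=local,team=dev" into {"env": "local", "team": "dev"}."""
    labels = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # skip empty items, e.g. after a trailing comma
        key, value = pair.split("=", 1)  # split once so values may contain "="
        if key.strip():
            labels[key.strip()] = value.strip()
    return labels


def parse_excluded(raw: str) -> list[re.Pattern]:
    """Compile each non-empty, comma-separated regex."""
    return [re.compile(p.strip()) for p in raw.split(",") if p.strip()]


def is_excluded(route_template: str, patterns: list[re.Pattern]) -> bool:
    """Assumed matching rule: a pattern found anywhere in the route template."""
    return any(p.search(route_template) for p in patterns)


labels = parse_custom_labels("env=local,team=dev,")
patterns = parse_excluded("/servers/.*/sse,/static/.*")
```

Because the setting is split on commas before compiling, a regex that itself contains a comma (such as `\d{1,3}`) cannot be expressed in `METRICS_EXCLUDED_HANDLERS`.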

### Enable / verify locally

1. Ensure `ENABLE_METRICS=true` in your shell or `.env`.

```bash
export ENABLE_METRICS=true
export METRICS_CUSTOM_LABELS="env=local,team=dev"
export METRICS_EXCLUDED_HANDLERS="/servers/.*/sse,/static/.*"
```

2. Start the gateway (development). By default the app listens on port 4444, so the Prometheus endpoint is `http://localhost:4444/metrics/prometheus`.

3. Quick check (get the first lines of exposition text):

```bash
curl -sS http://localhost:4444/metrics/prometheus | head -n 20
```

4. If metrics are disabled (`ENABLE_METRICS=false`), the endpoint returns HTTP 503 with the JSON body `{"error": "Metrics collection is disabled"}`.
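The same check can be scripted. This is a minimal stdlib sketch, assuming the default `http://localhost:4444` address; `fetch_metrics` and `metric_families` are hypothetical helpers, and the `sample` text is hand-written rather than captured from a gateway. `urllib` does not advertise gzip support, so the gateway should answer with plain text, and the 503 "disabled" reply is returned instead of raised.

```python
import urllib.error
import urllib.request


def fetch_metrics(url: str = "http://localhost:4444/metrics/prometheus") -> tuple[int, str]:
    """Return (status, body); HTTP errors such as the 503 reply are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")


def metric_families(exposition: str) -> dict[str, str]:
    """Map metric family name to type, read from '# TYPE <name> <type>' lines."""
    families = {}
    for line in exposition.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[:2] == ["#", "TYPE"]:
            families[parts[2]] = parts[3]
    return families


# Hand-written sample in the Prometheus text exposition format
sample = """# TYPE http_requests_total counter
http_requests_total{handler="/ping",method="GET",status="200"} 1.0
# TYPE app_info gauge
app_info{env="local",team="dev"} 1.0
"""
```

With the gateway running, `status, body = fetch_metrics()` followed by `metric_families(body)` lists which families are exported; a `503` status means `ENABLE_METRICS` is off.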

### Prometheus scrape job example

Add the job below to your `prometheus.yml` for local testing:

```yaml
scrape_configs:
  - job_name: 'mcp-gateway'
    metrics_path: /metrics/prometheus
    static_configs:
      - targets: ['localhost:4444']
```

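If Prometheus runs in Docker, `localhost` resolves to the Prometheus container
itself, so point the target at the host instead (host networking, the host's IP
address, or `host.docker.internal:4444` on Docker Desktop). See the repo
`docs/manage/scale.md` for examples of deploying Prometheus in Kubernetes.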

### Grafana and dashboards

- Use Grafana to import dashboards for Kubernetes, PostgreSQL and Redis (IDs
  suggested elsewhere in the repo). For MCP Gateway app metrics, create panels
  for:
  - Request rate: `rate(http_requests_total[1m])`
  - Error rate: `rate(http_requests_total{status=~"5.."}[5m])`
  - P99 latency: `histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))`
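To make the P99 panel less of a black box, here is a small stdlib re-implementation of the linear interpolation PromQL's `histogram_quantile` applies to cumulative `le` buckets. It is a simplified sketch: it skips some input cleanup Prometheus performs (for example, forcing bucket counts to be monotonic), and the bucket counts in the example are made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    The buckets must be cumulative and include +Inf, like *_bucket series.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank
    b = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if b == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound
        return buckets[-2][0]
    upper, count = buckets[b]
    lower, below = (0.0, 0.0) if b == 0 else buckets[b - 1]
    if count == below:
        return lower  # degenerate empty bucket, nothing to interpolate
    # Assume observations are spread evenly inside the bucket
    return lower + (upper - lower) * (rank - below) / (count - below)


# Made-up cumulative counts for http_request_duration_seconds_bucket
example = [(0.1, 50.0), (0.25, 90.0), (0.5, 98.0), (1.0, 100.0), (math.inf, 100.0)]
p99 = histogram_quantile(0.99, example)  # rank 99 sits halfway through the 0.5 to 1.0 bucket
```

Because the estimate assumes an even spread inside each bucket, coarse buckets give coarse percentiles: here any latency between 0.5 s and 1.0 s is reported by interpolation, giving 0.75 s.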

### Common pitfalls — short guidance

- High-cardinality labels
  - Never add per-request identifiers (user IDs, full URIs, request IDs) as
    Prometheus labels. Every distinct label value creates a new time series,
    which can exhaust Prometheus memory.
  - Use `METRICS_CUSTOM_LABELS` only for low-cardinality labels (env, region).

- Compression (gzip) vs CPU
  - The metrics exposer in `mcpgateway.services.metrics` enables gzip
    (`should_gzip=True`) for the `/metrics/prometheus` endpoint. Compressing
    the payload reduces network usage but costs CPU at scrape time. On
    CPU-constrained nodes, consider a longer scrape interval (e.g. 15s→30s) or
    passing `should_gzip=False` to `Instrumentator.expose`.

- Duplicate collectors during reloads/tests
  - Instrumentation registers collectors on the global Prometheus registry.
    When `setup_metrics` runs again in the same process (tests, interactive
    sessions), registering a metric name twice, such as the `app_info` gauge,
    fails with `ValueError: Duplicated timeseries in CollectorRegistry`.
    Restart the process or unregister the collectors
    (`REGISTRY.unregister(...)`) in test fixtures.
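The cost of a label is multiplicative: the worst-case number of series for one metric is the product of the distinct values of each of its labels. A back-of-the-envelope sketch; the cardinalities below are illustrative, not measured from the gateway, and `user_id` is a hypothetical label shown only as the anti-pattern.

```python
from math import prod


def worst_case_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on time series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Illustrative value counts for the labels of http_requests_total
base = {"method": 5, "status": 10, "handler": 50}
with_user = {**base, "user_id": 10_000}  # hypothetical per-user label: never do this

base_series = worst_case_series(base)  # 2,500 series
user_series = worst_case_series(with_user)  # 25,000,000 series
```

Histograms multiply this again by their number of buckets (plus the `_sum` and `_count` series), which is why per-request labels on latency metrics are especially expensive.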

### Quick checklist

- [ ] `ENABLE_METRICS=true`
- [ ] `/metrics/prometheus` reachable
- [ ] Add scrape job to Prometheus
- [ ] Exclude high-cardinality paths with `METRICS_EXCLUDED_HANDLERS`
- [ ] Use tracing (OTel) for high-cardinality debugging information

## Where to look in the code

- `mcpgateway/main.py` — wiring: imports and calls `setup_metrics(app)` from
`mcpgateway.services.metrics`. The function call instruments the app at
startup; the actual HTTP handler for `/metrics/prometheus` is registered by
the `Instrumentator` inside `mcpgateway/services/metrics.py`.
- `mcpgateway/services/metrics.py` — instrumentation implementation and env-vars.
- `mcpgateway/config.py` — settings defaults and names used by the app.
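When testing the `app_info` logic from `mcpgateway/services/metrics.py` in isolation, one way to avoid the duplicate-registration error is to register collectors on a private `CollectorRegistry` instead of the global `REGISTRY`. This is a minimal sketch using `prometheus_client` only; the `env`/`team` labels mirror the `METRICS_CUSTOM_LABELS` example, and the current `setup_metrics` does not accept an injected registry, so this exercises the library behavior rather than the gateway function itself.

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

# A private registry: nothing here touches prometheus_client.REGISTRY
registry = CollectorRegistry()
labels = {"env": "local", "team": "dev"}
app_info = Gauge("app_info", "Static labels for the application", labelnames=list(labels), registry=registry)
app_info.labels(**labels).set(1)
before = generate_latest(registry).decode("utf-8")

# Registering the same metric name twice on one registry is rejected
try:
    Gauge("app_info", "Static labels for the application", labelnames=list(labels), registry=registry)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

# Unregistering frees the name, as a test fixture would do on teardown
registry.unregister(app_info)
Gauge("app_info", "Static labels for the application", labelnames=list(labels), registry=registry)
```

A fresh `CollectorRegistry` per test is usually simpler than unregistering from the global one, because nothing from a previous test can leak into the next scrape.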

---
6 changes: 6 additions & 0 deletions mcpgateway/config.py
@@ -1373,6 +1373,12 @@ def log_summary(self):
        summary = self.model_dump(exclude={"database_url", "memcached_url"})
        logger.info(f"Application settings summary: {summary}")

    ENABLE_METRICS: bool = Field(True, description="Enable Prometheus metrics instrumentation")
    METRICS_EXCLUDED_HANDLERS: str = Field("", description="Comma-separated regex patterns for paths to exclude from metrics")
    METRICS_NAMESPACE: str = Field("default", description="Prometheus metrics namespace")
    METRICS_SUBSYSTEM: str = Field("", description="Prometheus metrics subsystem")
    METRICS_CUSTOM_LABELS: str = Field("", description='Comma-separated "key=value" pairs for static custom labels')


def extract_using_jq(data, jq_filter=""):
    """
4 changes: 4 additions & 0 deletions mcpgateway/main.py
@@ -106,6 +106,7 @@
from mcpgateway.services.import_service import ImportError as ImportServiceError
from mcpgateway.services.import_service import ImportService, ImportValidationError
from mcpgateway.services.logging_service import LoggingService
from mcpgateway.services.metrics import setup_metrics
from mcpgateway.services.prompt_service import PromptError, PromptNameConflictError, PromptNotFoundError, PromptService
from mcpgateway.services.resource_service import ResourceError, ResourceNotFoundError, ResourceService, ResourceURIConflictError
from mcpgateway.services.root_service import RootService
@@ -408,6 +409,9 @@ async def lifespan(_app: FastAPI) -> AsyncIterator[None]:
    default_response_class=ORJSONResponse, # Use orjson for high-performance JSON serialization
)

# Setup metrics instrumentation
setup_metrics(app)


async def validate_security_configuration():
"""Validate security configuration on startup."""
121 changes: 121 additions & 0 deletions mcpgateway/services/metrics.py
@@ -0,0 +1,121 @@
# -*- coding: utf-8 -*-
"""
Location: ./mcpgateway/services/metrics.py
Copyright 2025
SPDX-License-Identifier: Apache-2.0

MCP Gateway Metrics Service.

This module provides comprehensive Prometheus metrics instrumentation for the MCP Gateway.
It configures and exposes HTTP metrics including request counts, latencies, response sizes,
and custom application metrics.

The service automatically instruments FastAPI applications with standard HTTP metrics
and provides configurable exclusion patterns for endpoints that should not be monitored.
Metrics are exposed at the `/metrics/prometheus` endpoint in Prometheus format.

Supported Metrics:
- http_requests_total: Counter of HTTP requests by method, handler (route template), and status
- http_request_duration_seconds: Histogram of request processing times by method and handler
- http_request_duration_highr_seconds: Higher-resolution latency histogram without labels
- http_request_size_bytes: Summary of incoming request payload sizes
- http_response_size_bytes: Summary of outgoing response payload sizes
- app_info: Gauge with custom static labels for application metadata

Environment Variables:
- ENABLE_METRICS: Enable/disable metrics collection (default: "true")
- METRICS_EXCLUDED_HANDLERS: Comma-separated regex patterns for excluded endpoints
- METRICS_CUSTOM_LABELS: Custom labels for app_info gauge (format: "key1=value1,key2=value2")

Usage:
from mcpgateway.services.metrics import setup_metrics

app = FastAPI()
setup_metrics(app) # Automatically instruments the app

# Metrics available at: GET /metrics/prometheus

Functions:
- setup_metrics: Configure Prometheus instrumentation for FastAPI app
"""

# Third-Party
from fastapi import FastAPI, Response, status
from prometheus_client import Gauge, REGISTRY
from prometheus_fastapi_instrumentator import Instrumentator

# First-Party
from mcpgateway.config import settings


def setup_metrics(app: FastAPI) -> None:
    """
    Configure Prometheus metrics instrumentation for a FastAPI application.

    This function sets up HTTP metrics collection including request counts,
    latencies, and payload sizes. It also handles custom application labels and
    endpoint exclusion patterns.

    Args:
        app: FastAPI application instance to instrument

    Settings Used (read from the environment or `.env` via `mcpgateway.config.settings`):
        ENABLE_METRICS (bool): True to enable metrics, False to disable (default: True)
        METRICS_EXCLUDED_HANDLERS (str): Comma-separated regex patterns for endpoints
            to exclude from metrics collection
        METRICS_CUSTOM_LABELS (str): Custom labels in "key1=value1,key2=value2" format
            for the app_info gauge metric

    Side Effects:
        - Registers Prometheus metrics collectors with the global registry
        - Adds middleware to the FastAPI app for request instrumentation
        - Exposes the /metrics/prometheus endpoint for Prometheus scraping
          (or a 503 stub on the same path when metrics are disabled)
        - Prints status messages to stdout

    Example:
        >>> from fastapi import FastAPI
        >>> from mcpgateway.services.metrics import setup_metrics
        >>> app = FastAPI()
        >>> # setup_metrics(app)  # Configures Prometheus metrics
        >>> # Metrics available at GET /metrics/prometheus
    """
    if settings.ENABLE_METRICS:
        # Parse "key1=value1,key2=value2". Split on the first "=" only so values
        # may contain "=", and skip empty items (e.g. from a trailing comma).
        pairs = (pair.split("=", 1) for pair in (settings.METRICS_CUSTOM_LABELS or "").split(",") if "=" in pair)
        custom_labels = {key.strip(): value.strip() for key, value in pairs if key.strip()}
        if custom_labels:
            # Info-style gauge: the value is always 1; the metadata lives in the labels
            app_info_gauge = Gauge(
                "app_info",
                "Static labels for the application",
                labelnames=list(custom_labels.keys()),
                registry=REGISTRY,
            )
            app_info_gauge.labels(**custom_labels).set(1)

        # Regex strings; the instrumentator compiles them itself
        excluded = [pattern.strip() for pattern in (settings.METRICS_EXCLUDED_HANDLERS or "").split(",") if pattern.strip()]

        # Create instrumentator instance
        instrumentator = Instrumentator(
            should_group_status_codes=False,  # keep exact codes (200, 404, 503) in the status label
            should_ignore_untemplated=True,  # skip requests that match no route to bound label cardinality
            excluded_handlers=excluded,
        )

        # Add the request-instrumentation middleware
        instrumentator.instrument(app)

        # Expose Prometheus metrics at /metrics/prometheus and include
        # the endpoint in the OpenAPI schema so it appears in Swagger UI.
        instrumentator.expose(app, endpoint="/metrics/prometheus", include_in_schema=True, should_gzip=True)

        print("✅ Metrics instrumentation enabled")
    else:
        print("⚠️ Metrics instrumentation disabled")

        @app.get("/metrics/prometheus")
        async def metrics_disabled() -> Response:
            """Tell scrapers that metrics collection is switched off."""
            return Response(
                content='{"error": "Metrics collection is disabled"}',
                media_type="application/json",
                status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
            )
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -68,6 +68,8 @@ dependencies = [
    "pydantic>=2.12.3",
    "pydantic[email]>=2.12.3",
    "pydantic-settings>=2.11.0",
    "prometheus_client>=0.16.0",
    "prometheus-fastapi-instrumentator>=7.0.0",
    "pyjwt>=2.10.1",
    "python-json-logger>=4.0.0",
    "PyYAML>=6.0.3",