@codefromthecrypt
Contributor

@codefromthecrypt codefromthecrypt commented Sep 16, 2025

Description

This extends the OTel environment-variable configuration to metrics. When an OTel metrics exporter is enabled, metrics are exported through it instead of being served by a Prometheus listener. This makes metrics available in OTel-native systems such as Elastic Stack and otel-tui without a Prometheus pump. It also allows ad-hoc configuration via the `console` config.

Here's an example of standalone using otel-tui https://github.com/ymtdzzz/otel-tui:
[Screenshots: tui-metrics, tui-trace]

Updated cmd/aigw examples:

  • Add otel-tui service to show native otel w/o prometheus
  • Use profile-based env file selection (.env.otel.${COMPOSE_PROFILES:-console})
  • Set console as default for immediate debugging without external dependencies

Documentation:

  • Document separate OTLP endpoints for traces/metrics and exporter types
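
To illustrate how separate endpoints per signal can resolve, here is a minimal Go sketch, not the code from this PR. It assumes the OTLP/HTTP convention from the OpenTelemetry specification: a signal-specific variable such as `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` is used as-is, otherwise `/v1/<signal>` is appended to `OTEL_EXPORTER_OTLP_ENDPOINT`. The function name `resolveEndpoint` and the host names are hypothetical, and returning an empty string when nothing is set is a simplification.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveEndpoint returns the OTLP/HTTP URL for one signal ("traces" or
// "metrics"). Hypothetical helper for illustration only.
func resolveEndpoint(getenv func(string) string, signal string) string {
	// A signal-specific endpoint wins and is used exactly as given.
	specific := getenv("OTEL_EXPORTER_OTLP_" + strings.ToUpper(signal) + "_ENDPOINT")
	if specific != "" {
		return specific
	}
	// Otherwise fall back to the generic endpoint plus the signal path.
	base := getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
	if base == "" {
		return "" // simplification: a real SDK applies its own default here
	}
	return strings.TrimRight(base, "/") + "/v1/" + signal
}

func main() {
	// Hypothetical values: traces use the generic endpoint, metrics go elsewhere.
	env := map[string]string{
		"OTEL_EXPORTER_OTLP_ENDPOINT":         "http://localhost:4318",
		"OTEL_EXPORTER_OTLP_METRICS_ENDPOINT": "http://otel-tui:4318/v1/metrics",
	}
	getenv := func(key string) string { return env[key] }

	fmt.Println(resolveEndpoint(getenv, "traces"))  // http://localhost:4318/v1/traces
	fmt.Println(resolveEndpoint(getenv, "metrics")) // http://otel-tui:4318/v1/metrics
}
```

Passing `getenv` as a function keeps the sketch testable without touching the real process environment; `os.Getenv` fits the same signature.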

Phoenix users: set `OTEL_METRICS_EXPORTER=none`, as Phoenix only supports traces.
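
As a rough picture of the exporter selection described here, the following Go sketch maps an `OTEL_METRICS_EXPORTER` value to a behavior. It is not the implementation in `internal/metrics/metrics.go`. The function name `metricsMode`, the returned strings, and the exact set of accepted values are assumptions. In particular, it assumes an unset value keeps the Prometheus listener and that `none` turns metrics export off.

```go
package main

import (
	"fmt"
	"strings"
)

// metricsMode describes what a given OTEL_METRICS_EXPORTER value would do.
// Hypothetical helper: the accepted values and fallbacks are assumptions.
func metricsMode(value string) (string, error) {
	switch strings.ToLower(strings.TrimSpace(value)) {
	case "", "prometheus":
		// Assumed default: behave as before and keep the Prometheus listener.
		return "prometheus listener", nil
	case "console":
		return "print metrics to the console", nil
	case "otlp":
		return "push metrics over OTLP", nil
	case "none":
		// What a traces-only backend such as Phoenix would be configured with.
		return "metrics export disabled", nil
	default:
		return "", fmt.Errorf("unsupported OTEL_METRICS_EXPORTER value %q", value)
	}
}

func main() {
	for _, v := range []string{"", "console", "otlp", "none", "bogus"} {
		mode, err := metricsMode(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %s\n", v, mode)
	}
}
```

Rejecting unknown values instead of silently falling back is a design choice of this sketch, made so that a typo in the variable surfaces early.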

Fixes #1100

@codecov-commenter

codecov-commenter commented Sep 16, 2025

Codecov Report

❌ Patch coverage is 82.08955% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.36%. Comparing base (0b97426) to head (84fbab7).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/metrics/metrics.go | 77.77% | 8 Missing and 4 partials ⚠️ |

❌ Your project status has failed because the head coverage (79.36%) is below the target coverage (86.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1204      +/-   ##
==========================================
- Coverage   79.38%   79.36%   -0.03%     
==========================================
  Files          88       89       +1     
  Lines       10067    10131      +64     
==========================================
+ Hits         7992     8040      +48     
- Misses       1712     1725      +13     
- Partials      363      366       +3     

@codefromthecrypt
Contributor Author

Tested with Elastic Stack here, so I think we're good: elastic/observability-examples#86

@codefromthecrypt
Contributor Author

Triple-checked the README; everything is smooth.

@codefromthecrypt
Contributor Author

`make test-extproc` is flaky; trying to figure out why (it sometimes passes).

@codefromthecrypt codefromthecrypt marked this pull request as draft September 16, 2025 06:18
@codefromthecrypt
Contributor Author

Moving this to draft because I think it isn't exposing Prometheus by default as it did before. Even if things work, I suspect we want that to stay the default, and to make sure everything works exactly as before unless you set an override.

@codefromthecrypt
Contributor Author

Hopefully the last change fixes the flake, where the test that flaked only passed if other tests had already made a request (a race condition).

@codefromthecrypt
Contributor Author

Something else is flaky; not sure if I'll be able to solve it in this PR.

@codefromthecrypt codefromthecrypt marked this pull request as ready for review September 16, 2025 07:53
Member

@mathetake mathetake left a comment

Reviewed the code; it seems straightforward relative to the new use case it unlocks, which is exciting!

@mathetake
Member

2025/09/16 07:22:47 httptest.Server blocked in Close after 5 seconds, waiting for connections:
  *net.TCPConn 0xc000610650 127.0.0.1:40452 in state active
panic: test timed out after 10m0s
	running tests:
		TestOtelOpenAIChatCompletions_propagation (9m37s)

this seems flaky @codefromthecrypt
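
For context on the log line, `httptest.Server.Close` blocks until outstanding requests have completed. The Go sketch below is not the test from this PR and does not claim to be the cause of this flake. It only shows one way such a hang can arise, assuming a handler that keeps streaming until the client goes away. The function name `closesPromptly` and its parameters are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// closesPromptly starts a test server whose handler streams until the client
// disconnects, then reports whether srv.Close returned within waitMillis.
// When release is true the client cancels its request before Close is called.
func closesPromptly(release bool, waitMillis int) bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		if f, ok := w.(http.Flusher); ok {
			f.Flush() // send headers so the client call returns
		}
		<-r.Context().Done() // keep the request active until the client leaves
	}))

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // always release eventually so the sketch does not leak
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
	if err != nil {
		panic(err)
	}
	resp, err := srv.Client().Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	if release {
		cancel() // drops the connection, which ends the handler
	}

	done := make(chan struct{})
	go func() {
		srv.Close() // blocks while the request is still outstanding
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(time.Duration(waitMillis) * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("request released first:", closesPromptly(true, 3000))
	fmt.Println("request left active:", closesPromptly(false, 300))
}
```

In a real test, registering the cancel and body close with `t.Cleanup` before the server's own cleanup runs would give the same ordering.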

@codefromthecrypt
Contributor Author

@mathetake I refactored the Prometheus metrics test so it is no longer flaky. As for TestOtelOpenAIChatCompletions_propagation, I'm trying to reproduce the race locally, and if I find something I'll do another PR.

@codefromthecrypt
Contributor Author

ok 🤞

}

// TestNewMetricsFromEnv_NetworkExporters tests OTLP and other network-based exporters.
// We CANNOT use synctest here because it creates a "bubble" where goroutines are isolated
Contributor Author

P.S. I intentionally left this in, as it's something I will discuss at the Golang meetup tomorrow ;)

Signed-off-by: Adrian Cole <adrian@tetrate.io>
// TestPrometheusMetrics verifies that metrics are properly exported via Prometheus
// when processing a chat completion request. This test uses the default configuration
// which exposes metrics on the /metrics endpoint.
func TestPrometheusMetrics(t *testing.T) {
Contributor Author

@hustxiayang FYI: once this lands, feel free to refactor it or rename it to TestPromOpenAIChatCompletions, so you can add a second test, TestPromOpenAIEmbeddings, that verifies the token things aren't there (or remain absent). Another way is to make this a table test, but the parameters may be difficult enough to warrant a separate test for embeddings. Not sure. Anyway, hope this base test helps!

@mathetake mathetake merged commit 58a130b into envoyproxy:main Sep 17, 2025
28 checks passed
missBerg pushed a commit to missBerg/ai-gateway that referenced this pull request Dec 20, 2025
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Signed-off-by: Erica Hughberg <erica.sundberg.90@gmail.com>
Development

Successfully merging this pull request may close these issues.

otel native metrics export

3 participants