
Conversation

@hustxiayang
Contributor

Description

It does not make sense to include firstTokenLatency and outputTokenLatency metrics for embedding models. Embedding models are not generative, so their metrics need to be separated from those of generative models.

Signed-off-by: yxia216 <yxia216@bloomberg.net>
@hustxiayang hustxiayang requested a review from a team as a code owner September 16, 2025 22:21
@hustxiayang hustxiayang changed the title from "init" to "refactor: separate embedding metrics to remove metrics which does not make sense" on Sep 16, 2025
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.37%. Comparing base (0b97426) to head (3e1e032).

❌ Your project status has failed because the head coverage (79.37%) is below the target coverage (86.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1206      +/-   ##
==========================================
- Coverage   79.38%   79.37%   -0.02%     
==========================================
  Files          88       88              
  Lines       10067    10080      +13     
==========================================
+ Hits         7992     8001       +9     
- Misses       1712     1717       +5     
+ Partials      363      362       -1     



@hustxiayang hustxiayang changed the title from "refactor: separate embedding metrics to remove metrics which does not make sense" to "refactor: separate embedding metrics to remove metrics which does not make sense for embeddings" on Sep 16, 2025
@codefromthecrypt
Contributor

I think this PR is no longer necessary: even if the metrics are there, they are only used in chatCompletion, not embeddings. Double-check me if I'm wrong. Regardless, we can test this better soon.

@codefromthecrypt
Contributor

After #1204 we have an easy way to ad-hoc test this (though without it we can still hit the /metrics endpoint and look). If you run the CLI (standalone mode), you can do OTEL_METRICS_EXPORTER=console OTEL_METRIC_EXPORT_INTERVAL=100 aigw ... and then make whatever requests you want; the metrics will show up on the console after 100ms.
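
For anyone curious what those env vars do under the hood, here is a minimal Go sketch (using the OTel Go SDK's stdout exporter and a periodic reader; this is an illustration, not the actual aigw startup code) of roughly what OTEL_METRICS_EXPORTER=console with OTEL_METRIC_EXPORT_INTERVAL=100 configures:

```go
package main

import (
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdoutmetric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// Console exporter: prints metric data to stdout, like OTEL_METRICS_EXPORTER=console.
	exp, err := stdoutmetric.New()
	if err != nil {
		panic(err)
	}

	// Periodic reader flushing every 100ms, like OTEL_METRIC_EXPORT_INTERVAL=100.
	provider := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(
			sdkmetric.NewPeriodicReader(exp, sdkmetric.WithInterval(100*time.Millisecond)),
		),
	)
	otel.SetMeterProvider(provider)
}
```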

@hustxiayang
Contributor Author

@codecov-commenter When we deployed embedding models, we got metrics like "prompt_tokens":7,"total_tokens":7,"completion_tokens":0,"prompt_tokens_details":null, and our users complained that the metrics are not correct: it does not make any sense to see completion_tokens:0, since embedding models are not even generative. That's the motivation for this PR.
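
For reference, a minimal Go sketch of the OpenAI-style usage block quoted above (field names come from that JSON; the struct itself is illustrative, not the gateway's actual type):

```go
// Usage mirrors the OpenAI-compatible usage block quoted above.
// For embeddings, CompletionTokens is always 0 because nothing is generated,
// which is the value users found confusing.
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}
```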

@codefromthecrypt
Contributor

@hustxiayang so I think the main way to get on the same page is to discuss things in terms of what code is in main right now, not what 0.3 did.

As far as I can tell, there is no "first token" latency metric for embeddings in main any more, which is part of what this PR targets. If tests were added to this PR, you would be able to verify whether what I'm saying is true or false.

Also, "total" is removed now, so I think the discussion above is out of date on that point as well.

Separately, there seems to be a concern about whether tokens should be a metric for the embeddings endpoint. Perhaps the naming should change, but token count is a billing and rate-limiting metric regardless of whether we are talking about LLMs or embeddings. For example, if we remove tokens completely from the embeddings endpoint, we lose the ability to observe or control the thing folks are actually billed on: https://docs.langdock.com/api-endpoints/embedding/openai-embedding

I would suggest first rebasing this on main, then using tests to prove the before-and-after behavior of what this PR aims to do. Without that, we get stuck in interpretations which may no longer be valid.

Make sense?

@codefromthecrypt
Contributor

Meanwhile, I will refactor the tests so that embeddings metrics are tested. It is possible we aren't setting the operation name correctly per https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-metrics/#metric-gen_aiclienttokenusage

So for chat completions gen_ai.operation.name=chat, and for embeddings gen_ai.operation.name=embeddings.

If this isn't set correctly, the metrics would be conflated and nonsensical, so if that's the case it is very much a bug!
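
To make that concrete, here is a hedged Go sketch of recording gen_ai.client.token.usage with the gen_ai.operation.name attribute so chat and embeddings measurements stay separate. The metric and attribute names follow the semconv page linked above; the meter name and helper function are made up for illustration:

```go
package example

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// recordInputTokens is an illustrative helper, not gateway code.
func recordInputTokens(ctx context.Context, operation string, inputTokens int64) error {
	meter := otel.Meter("aigw-example") // hypothetical meter name

	// gen_ai.client.token.usage is defined by semconv as a histogram with unit "{token}".
	tokenUsage, err := meter.Int64Histogram(
		"gen_ai.client.token.usage",
		metric.WithUnit("{token}"),
		metric.WithDescription("Number of tokens used by the client operation"),
	)
	if err != nil {
		return err
	}

	// Tagging with gen_ai.operation.name ("chat" vs "embeddings") keeps the
	// series separate, so embeddings never report output tokens they don't have.
	tokenUsage.Record(ctx, inputTokens,
		metric.WithAttributes(
			attribute.String("gen_ai.operation.name", operation),
			attribute.String("gen_ai.token.type", "input"),
		),
	)
	return nil
}
```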

@codefromthecrypt
Contributor

#1219 verifies the old code indeed split metrics on op name.

@mathetake
Member

Let's reopen if this needs to be revived from the conflicts!

@mathetake mathetake closed this Oct 10, 2025