refactor: separate embedding metrics to remove metrics which do not make sense for embeddings #1206
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.
❌ Your project status has failed because the head coverage (79.37%) is below the target coverage (86.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1206      +/-   ##
==========================================
- Coverage   79.38%   79.37%   -0.02%
==========================================
  Files          88       88
  Lines       10067    10080      +13
==========================================
+ Hits         7992     8001       +9
- Misses       1712     1717       +5
+ Partials      363      362       -1
```

☔ View full report in Codecov by Sentry.
I think this PR is no longer necessary: even if the metrics are there, they are only used in chat completion, not embeddings. Double-check me if I'm wrong. Regardless, we can test this better soon.
After #1204 we have an easy way to ad-hoc test this (though without it we can still hit the /metrics endpoint and look). If you run the CLI (standalone mode), you can do
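For the "hit the /metrics endpoint and look" option, here is a hypothetical Go sketch of the check; the listen address is an assumption, not the CLI's actual default, so substitute whatever the standalone mode exposes.

```go
// Hypothetical ad-hoc check: fetch /metrics and print only the gen_ai series.
// The address below is a placeholder; use whatever the standalone CLI listens on.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	resp, err := http.Get("http://localhost:9090/metrics") // placeholder address
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, "gen_ai") {
			fmt.Println(line)
		}
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}
}
```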
@codecov-commenter When we deploy the embedding models, we get metrics like
@hustxiayang So I think the main way to get on the same page is to discuss things in terms of what code is in main right now, not what 0.3 did. As far as I can tell, there are no "first token" latency metrics for embeddings in main any more, which is part of what this PR addresses. If tests were added to this PR, you would be able to verify whether what I'm saying is true or false. Also, "total" is removed now, so the discussion above is out of date on that point as well.

Separately, there seems to be a concern about whether tokens should be a metric for the embeddings endpoint. Perhaps the naming should change, but token count is a billing and rate-limiting metric regardless of whether we are talking about LLMs or embeddings. For example, if we remove tokens completely from the embeddings endpoint, we lose the ability to know, or control, what folks are billed on: https://docs.langdock.com/api-endpoints/embedding/openai-embedding

I would suggest first rebasing this on main, then using tests to prove the before and after behavior of what this PR aims to do. Without that, we get stuck in interpretations which may no longer be valid. Make sense?
Meanwhile, I will refactor the tests so that embeddings metrics are tested. It is possible we aren't setting the operation name correctly per https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-metrics/#metric-gen_aiclienttokenusage: for chat completions gen_ai.operation.name=chat, and for embeddings gen_ai.operation.name=embeddings. If this isn't set correctly, then the metrics would be conflated and nonsense, so if that's the case it is very much a bug!
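To illustrate the point about gen_ai.operation.name, here is a minimal sketch using the OpenTelemetry Go metrics API. This is not the gateway's actual code; the instrumentation scope name and token counts are placeholders. Recording both endpoints on the same histogram with different operation-name attributes keeps their series separate:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	ctx := context.Background()
	// Without an SDK configured this uses the global no-op provider; the point
	// here is only the attribute shape, not the exporter setup.
	meter := otel.Meter("ai-gateway-example") // scope name is illustrative

	tokenUsage, err := meter.Int64Histogram(
		"gen_ai.client.token.usage",
		metric.WithDescription("Number of tokens used per request"),
		metric.WithUnit("{token}"),
	)
	if err != nil {
		panic(err)
	}

	// Chat completion request: counted under gen_ai.operation.name=chat.
	tokenUsage.Record(ctx, 128, metric.WithAttributes(
		attribute.String("gen_ai.operation.name", "chat"),
		attribute.String("gen_ai.token.type", "input"),
	))

	// Embeddings request: counted under gen_ai.operation.name=embeddings, so the
	// two endpoints never share a time series even though the metric name is the same.
	tokenUsage.Record(ctx, 42, metric.WithAttributes(
		attribute.String("gen_ai.operation.name", "embeddings"),
		attribute.String("gen_ai.token.type", "input"),
	))
}
```

If the operation name were missing or identical on both paths, the series really would be conflated, which is the bug scenario described above.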
#1219 verifies that the old code indeed split metrics on operation name.
Let's reopen if this needs to be revived from the conflicts!
Description
It does not make sense to include firstTokenLatency and outputTokenLatency metrics for embedding models. Embedding models are not generative, so their metrics need to be separated from the chat completion metrics.
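As a rough illustration of the intended split (the type and field names below are placeholders, not the gateway's actual identifiers), the embeddings side would simply drop the token-latency histograms that only make sense for generated output:

```go
// Hypothetical sketch of per-endpoint metric sets; not the gateway's real code.
package metrics

import "go.opentelemetry.io/otel/metric"

// chatCompletionMetrics keeps the generation-specific latencies.
type chatCompletionMetrics struct {
	requestLatency     metric.Float64Histogram // end-to-end request latency
	firstTokenLatency  metric.Float64Histogram // time to first streamed token
	outputTokenLatency metric.Float64Histogram // latency per generated token
	tokenUsage         metric.Int64Histogram   // input/output token counts
}

// embeddingsMetrics omits firstTokenLatency and outputTokenLatency: an
// embeddings response is a single vector, so those latencies have no meaning.
type embeddingsMetrics struct {
	requestLatency metric.Float64Histogram // end-to-end request latency
	tokenUsage     metric.Int64Histogram   // input token counts (billing/limits)
}
```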