Extract Input/Output token usage from request. #111
Conversation
Currently only the total_tokens usage from the response body is pushed to dynamicMetadata. This PR updates that logic to include input and output token usage as well. Signed-off-by: Sukumar Gaonkar <[email protected]>
…useage

# Conflicts:
#	internal/extproc/translator/openai_awsbedrock.go
#	internal/extproc/translator/openai_openai.go

Signed-off-by: Sukumar Gaonkar <[email protected]>
```go
// MonitorContinuousUsageStats controls whether the external processor monitors every
// response-body chunk for usage stats.
// When true, it monitors for token usage metadata in every response-body chunk received
// during a streaming request; compatible with vllm's 'continuous_usage_stats' flag.
// When false, it stops monitoring after finding token usage metadata for the first time;
// compatible with OpenAI's streaming response
// (https://platform.openai.com/docs/api-reference/chat/streaming#chat/streaming-usage).
// Only affects requests in streaming mode.
MonitorContinuousUsageStats bool `yaml:"monitorContinuousUsageStats,omitempty"`
```
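For reference, a minimal sketch of how a configuration carrying this flag could be loaded. Only the field name and its yaml tag come from the diff above; the surrounding `extProcConfig` struct and the loader are illustrative assumptions, not code from the PR.

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

// extProcConfig is a hypothetical container for the flag; only the field
// name and its yaml tag are taken from the diff above.
type extProcConfig struct {
	MonitorContinuousUsageStats bool `yaml:"monitorContinuousUsageStats,omitempty"`
}

func main() {
	raw := []byte("monitorContinuousUsageStats: true\n")

	var cfg extProcConfig
	if err := yaml.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	fmt.Println(cfg.MonitorContinuousUsageStats) // prints: true
}
```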
could you remove the change related to this? I think this is a separate issue, and the metadata is not cumulative, so it would basically override previous values if it's emitted mid-stream.
```diff
@@ -65,7 +71,7 @@ type Translator interface {
 	ResponseBody(body io.Reader, endOfStream bool) (
 		headerMutation *extprocv3.HeaderMutation,
 		bodyMutation *extprocv3.BodyMutation,
-		usedToken uint32,
+		tokenUsage *TokenUsage,
```
i don't think these three uint32s would be worth the allocation, so could you return the value instead?

```diff
-		tokenUsage *TokenUsage,
+		tokenUsage TokenUsage,
```
I kept it as a pointer so that a nil value can indicate the absence of tokenUsage data in the responseBody:

```go
if tokenUsage == nil {...}
```

With a plain struct we would have to check the zero value of one of its members:

```go
if tokenUsage.TotalTokens == 0 {...}
```

thoughts?
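To make the trade-off concrete, here is a self-contained sketch of the two conventions under discussion. `withPointer` and `withValue` are hypothetical stand-ins for the translator's ResponseBody return, not code from the PR.

```go
package main

import "fmt"

// TokenUsage mirrors the struct from the diff below.
type TokenUsage struct {
	InputTokens  uint32
	OutputTokens uint32
	TotalTokens  uint32
}

// withPointer models the pointer convention: absence is an explicit nil.
func withPointer(found bool) *TokenUsage {
	if !found {
		return nil
	}
	return &TokenUsage{InputTokens: 3, OutputTokens: 5, TotalTokens: 8}
}

// withValue models the value convention: absence collapses into the zero value.
func withValue(found bool) TokenUsage {
	if !found {
		return TokenUsage{}
	}
	return TokenUsage{InputTokens: 3, OutputTokens: 5, TotalTokens: 8}
}

func main() {
	if u := withPointer(false); u == nil {
		fmt.Println("pointer: no usage data in this chunk")
	}
	if u := withValue(false); u.TotalTokens == 0 {
		// Ambiguous: the zero value could also be a genuine zero-token response.
		fmt.Println("value: no usage data (or zero tokens) in this chunk")
	}
}
```

The pointer makes "no usage data" distinguishable from a genuine zero-token response, at the cost of one heap allocation per response chunk; the value return avoids the allocation but folds both cases into the zero value.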
```go
type TokenUsage struct {
	InputTokens  uint32
	OutputTokens uint32
	TotalTokens  uint32
}
```
Any public structs/fields/methods would need comments.
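A sketch of what the requested doc comments could look like; the exact wording is an assumption, not from the PR.

```go
// TokenUsage holds the token counts extracted from a response body.
// NOTE: the comment wording here is illustrative, not from the PR.
type TokenUsage struct {
	// InputTokens is the number of tokens consumed by the request prompt.
	InputTokens uint32
	// OutputTokens is the number of tokens generated in the response.
	OutputTokens uint32
	// TotalTokens is the total reported by the upstream model, typically
	// the sum of InputTokens and OutputTokens.
	TotalTokens uint32
}
```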
```diff
@@ -29,3 +29,4 @@
 site/yarn-debug.log*
 site/yarn-error.log*
 site/static/.DS_Store
 site/temp
+/.idea/
```
unnecessary change - already in here
also, you can have a global gitignore in your OS... that's where people usually place ignore rules for files that aren't generated by the project itself.
sorry, this feels like a conflict with #103. I would appreciate it if you could hold off until that lands - I think that PR will supersede this one, besides the vllm stuff.
@sukumargaonkar thank you for waiting - #103 has landed, so could you rework the PR and focus on the vllm stuff?
ping |
Currently only the total_tokens usage from the response body is pushed to dynamicMetadata. This PR updates that logic to include input and output token usage as well.

This PR also introduces a monitorContinuousUsageStats flag in the config for the external process. The flag controls whether the external process monitors every response-body chunk for usage stats (a sketch of the two behaviours follows this list):

- when true, it monitors for token usage metadata in every response-body chunk received during a streaming request (compatible with vllm's 'continuous_usage_stats' flag)
- when false, it stops monitoring after finding token usage metadata for the first time (compatible with OpenAI's streaming response: https://platform.openai.com/docs/api-reference/chat/streaming#chat/streaming-usage)

The flag only affects requests in streaming mode.
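A minimal sketch of the two behaviours the flag selects between, assuming hypothetical extractTokenUsage and publishToDynamicMetadata helpers; this illustrates the semantics described above, not the PR's actual implementation.

```go
package main

import "fmt"

// TokenUsage mirrors the struct from the diff above.
type TokenUsage struct {
	InputTokens, OutputTokens, TotalTokens uint32
}

// extractTokenUsage is a hypothetical stand-in for the processor's real
// response-body parsing; here any non-empty chunk "contains" usage data.
func extractTokenUsage(chunk []byte) (TokenUsage, bool) {
	if len(chunk) == 0 {
		return TokenUsage{}, false
	}
	return TokenUsage{InputTokens: 3, OutputTokens: 5, TotalTokens: 8}, true
}

// publishToDynamicMetadata is a hypothetical stand-in for pushing the
// usage counts to Envoy's dynamic metadata.
func publishToDynamicMetadata(u TokenUsage) {
	fmt.Printf("input=%d output=%d total=%d\n", u.InputTokens, u.OutputTokens, u.TotalTokens)
}

// handleChunk shows the branch the flag controls in streaming mode.
func handleChunk(monitorContinuous bool, chunk []byte, seenUsage *bool) {
	if *seenUsage && !monitorContinuous {
		// OpenAI-style streaming: usage is emitted once, so stop scanning
		// further chunks after the first hit.
		return
	}
	if usage, ok := extractTokenUsage(chunk); ok {
		// vllm's continuous_usage_stats emits usage on every chunk, so with
		// monitorContinuous=true this keeps updating the metadata.
		publishToDynamicMetadata(usage)
		*seenUsage = true
	}
}

func main() {
	seen := false
	for _, chunk := range [][]byte{[]byte("chunk-1"), []byte("chunk-2")} {
		handleChunk(false, chunk, &seen) // with true, every chunk is scanned
	}
}
```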