Bug Description
The gateway uses a naive token estimate based on character count:

```python
prompt_tokens = len(input_str) // 4
```
This is a medium severity issue because:
- Inaccurate counting: Different characters/words have varying token lengths
- Cost calculation errors: Billing estimates will be incorrect
- Model compatibility: Different models use different tokenization
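To illustrate the inaccuracy, the current heuristic can be exercised directly (a minimal sketch; `naive_estimate` is a hypothetical name mirroring the gateway's `len(input_str) // 4` logic):

```python
def naive_estimate(text: str) -> int:
    # Mirrors the gateway's current heuristic: ~4 characters per token
    return len(text) // 4

# Plausible for short English prose...
print(naive_estimate("The quick brown fox"))  # 4
# ...but a single CJK character is estimated at 0 tokens, even though
# real tokenizers emit at least one token for it
print(naive_estimate("猫"))  # 0
```

The estimate is only loosely correlated with real tokenizer output, so both usage reporting and billing drift with the input language.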
Severity
- Severity: Medium
- Impact: Incorrect token usage reporting and billing
Recommended Fix
- Use the `tiktoken` library for accurate token counting; implement proper tokenization:
```python
import tiktoken

# cl100k_base is the encoding used by the GPT-3.5/GPT-4 model family
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

prompt_tokens = count_tokens(input_str)
```
- Fallback for when tiktoken is unavailable:
```python
def is_cjk(text: str) -> bool:
    # Heuristic: treat the text as CJK when most characters fall in
    # the CJK Unified Ideographs range
    cjk_chars = sum("\u4e00" <= ch <= "\u9fff" for ch in text)
    return cjk_chars > len(text) // 2

def estimate_tokens(text: str) -> int:
    # Rough approximation: ~4 characters per token for English,
    # ~2 characters per token for CJK text
    if is_cjk(text):
        return len(text) // 2
    return len(text) // 4
```
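The two paths can be combined into a single helper that prefers `tiktoken` and degrades gracefully (a sketch; `count_tokens_safe` is a hypothetical name, and the broad `except` is deliberate so that a missing package or an unreachable encoding file both trigger the fallback):

```python
def count_tokens_safe(text: str) -> int:
    # Prefer exact counting via tiktoken; fall back to the
    # character-based heuristic when it is unavailable.
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except Exception:  # tiktoken missing or encoding data unreachable
        return max(1, len(text) // 4)  # crude English-biased fallback

print(count_tokens_safe("hello world"))  # 2 on either path
```

Keeping the fallback behind one function means the rest of the gateway never needs to know which counting strategy was used.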
Files Affected
`nvidia-ai-gateway.py` (lines ~330-350)