Description
System Info
Common to all platforms.
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Server:
text-generation-launcher --model-id=llava-hf/llava-v1.6-mistral-7b-hf --max-input-tokens 4096 --max-batch-prefill-tokens 16384 --max-total-tokens 8192 --max-batch-size 4
Client (Python, using the openai package):
from openai import OpenAI

client = OpenAI(base_url="http://localhost:80/v1", api_key="-")

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
                    },
                },
                {"type": "text", "text": "Whats in this image?"},
            ],
        },
    ],
    max_tokens=50,
    temperature=0.0,
    stream=False,
)
print(chat_completion)
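To rule out a problem with fetching the remote image URL, the same request could be retried with the image inlined as a base64 data URL. The sketch below only builds the `messages` payload. The helper name `build_image_messages` is hypothetical, and it assumes the server accepts `data:image/png;base64,...` URLs in `image_url`, as the OpenAI chat format allows.

```python
import base64


def build_image_messages(image_bytes: bytes, question: str, mime: str = "image/png") -> list:
    """Build an OpenAI-style chat `messages` list with the image inlined as a data URL.

    Hypothetical helper for this reproduction; not part of TGI or the openai client.
    """
    # Encode the raw image bytes as ASCII base64 text for the data URL.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text", "text": question},
            ],
        }
    ]
```

Usage: read a local copy of rabbit.png as bytes, pass them to `build_image_messages(..., "Whats in this image?")`, and hand the result to `client.chat.completions.create(model="tgi", messages=..., max_tokens=50)` in place of the inline `messages` list. If the reply is still unrelated to the image, remote fetching is probably not the cause.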
Expected behavior
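The model should describe what is in the image. Instead, it returns a generic reply that does not mention the image at all: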
ChatCompletion(id='', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=" I'm sorry, but I'm not sure what you're asking. Can you please provide more context or information about what you're looking for? ", refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None))], created=1749197214, model='llava-hf/llava-v1.6-mistral-7b-hf', object='chat.completion', service_tier=None, system_fingerprint='3.3.1-dev0-native', usage=CompletionUsage(completion_tokens=35, prompt_tokens=8, total_tokens=43, completion_tokens_details=None, prompt_tokens_details=None))
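The `usage` field in this response reports `prompt_tokens=8`, which is roughly what the text question alone would take. Assuming llava-v1.6-mistral-7b-hf uses a CLIP ViT-L/14 vision encoder at 336x336 resolution, the base image view alone yields (336 / 14)^2 = 24 x 24 = 576 patch features, so a prompt that actually contains the image should be much longer. A minimal triage check along these lines could be added to a regression test; the function name and the 576-token floor are illustrative assumptions, not TGI behavior.

```python
# Lower bound on image tokens for one image, assuming a CLIP ViT-L/14 encoder at
# 336x336 (base view only; LLaVA-NeXT's extra high-resolution tiles add more).
PATCH_SIZE = 14
IMAGE_SIZE = 336
MIN_IMAGE_TOKENS = (IMAGE_SIZE // PATCH_SIZE) ** 2  # 24 * 24 = 576


def image_likely_dropped(prompt_tokens: int, num_images: int = 1) -> bool:
    """Return True if the prompt is too short to contain `num_images` images.

    Hypothetical triage helper; the threshold is a lower bound, not an exact count.
    """
    return prompt_tokens < num_images * MIN_IMAGE_TOKENS
```

With the response above, `image_likely_dropped(chat_completion.usage.prompt_tokens)` returns True, since 8 is far below 576.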