tgi server :: tool_choice="auto" behaves like tool_choice="required" from OpenAI spec

### System Info

tgi version : 2.3.0
model : Meta-Llama-3-8B-Instruct

### Information

- [X] Docker
- [ ] The CLI directly

### Tasks

- [X] An officially supported command
- [ ] My own modifications

### Reproduction

**0. tool definition to use for reproduction**

```python3
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a specified city with specified measure",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, always seoul"
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use."
                }
            },
            "required": ["location", "format"]
        }
    }
}
```

**1. Using OpenAI with tool_choice="auto"**

```python3
api_key="[OPENAI_API_KEY]"

client = OpenAI(
    api_key=api_key
)

messages = [
    {
        "role": "system",
        "content": "You're a helpful assistant! Use tools if necessary",
    },
    {
        "role": "user",
        "content": "just respond with a warm greeting"
    }
]


chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)
print(chat_response)
```
```
ChatCompletionMessage(content='Hello there! 🌞 How are you today?', refusal=None, role='assistant', function_call=None, tool_calls=None)
```
=> responds with normal chat message since prompt does not need tool_call

**2. Using tgi with tool_choice="auto"  (model = llama)**
```python3
client = OpenAI(
    base_url="http://127.0.0.1:8080/v1/",
    api_key="dummy_key"
)

messages = [
    {
        "role": "system",
        "content": "You're a helpful assistant! Use tools if necessary",
    },
    {
        "role": "user",
        "content": "just respond with a warm greeting"
    }
]


chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)
print(chat_response.choices[0].message)
```
```
ChatCompletionMessage(content=None, refusal=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments={'format': 'celsius', 'location': 'Seoul'}, name='get_current_weather', description=None), type='function')])
```
=> tries to call a function anyway

**3. Using OpenAI with tool_choice="required"**
```python3
api_key="[OPENAI_API_KEY]"
client = OpenAI(
    api_key=api_key
)


messages = [
    {
        "role": "system",
        "content": "You're a helpful assistant! Use tools if necessary",
    },
    {
        "role": "user",
        "content": "just respond with a warm greeting"
    }
]


chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=[weather_tool],
    tool_choice="required",
    stream=False
)
print(chat_response.choices[0].message)
```
```
ChatCompletionMessage(content=None, refusal=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_0ZyaXEb9hIIQbJybYNlPjRVe', function=Function(arguments='{"location": "seoul", "format": "celsius"}', name='get_current_weather'), type='function')])
```
=> tries to call a function anyway

### Expected behavior

When consuming tgi, I expect the server to be able to respond both with and without tool_call, when provided with tool definitions. As of now, application needs to be aware that tool calling is required before calling tgi, which In my opinion is not something LLM applications should aim for.

I am curious if the above behavior is intended. I have found that someone has raised this issue, (https://github.com/huggingface/text-generation-inference/pull/1587#issuecomment-1979185339) but it wasn't addressed anywhere.

Maybe something can be done with tool prompt, ToolType enum and chat_completions logic in server.rs ?
If this behavior is not intended and needs fixing, **I would love to give it a shot !**
Thank you :)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

tgi server :: tool_choice="auto" behaves like tool_choice="required" from OpenAI spec #2549

System Info

Information

Tasks

Reproduction

Expected behavior

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

tgi server :: tool_choice="auto" behaves like tool_choice="required" from OpenAI spec #2549

Description

System Info

Information

Tasks

Reproduction

Expected behavior

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions