tgi server :: tool_choice="auto" behaves like tool_choice="required" from OpenAI spec #2549

Open
2 of 4 tasks
mottoslo opened this issue Sep 23, 2024 · 6 comments
@mottoslo

mottoslo commented Sep 23, 2024

System Info

tgi version : 2.3.0
model : Meta-Llama-3-8B-Instruct

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

0. tool definition to use for reproduction

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a specified city with specified measure",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, always seoul"
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use."
                }
            },
            "required": ["location", "format"]
        }
    }
}

1. Using OpenAI with tool_choice="auto"

from openai import OpenAI

api_key="[OPENAI_API_KEY]"

client = OpenAI(
    api_key=api_key
)

messages = [
    {
        "role": "system",
        "content": "You're a helpful assistant! Use tools if necessary",
    },
    {
        "role": "user",
        "content": "just respond with a warm greeting"
    }
]


chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)
print(chat_response.choices[0].message)
ChatCompletionMessage(content='Hello there! 🌞 How are you today?', refusal=None, role='assistant', function_call=None, tool_calls=None)

=> responds with normal chat message since prompt does not need tool_call

2. Using tgi with tool_choice="auto" (model = llama)

client = OpenAI(
    base_url="http://127.0.0.1:8080/v1/",
    api_key="dummy_key"
)

messages = [
    {
        "role": "system",
        "content": "You're a helpful assistant! Use tools if necessary",
    },
    {
        "role": "user",
        "content": "just respond with a warm greeting"
    }
]


chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=[weather_tool],
    tool_choice="auto",
    stream=False
)
print(chat_response.choices[0].message)
ChatCompletionMessage(content=None, refusal=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments={'format': 'celsius', 'location': 'Seoul'}, name='get_current_weather', description=None), type='function')])

=> tries to call a function anyway

3. Using OpenAI with tool_choice="required"

api_key="[OPENAI_API_KEY]"
client = OpenAI(
    api_key=api_key
)


messages = [
    {
        "role": "system",
        "content": "You're a helpful assistant! Use tools if necessary",
    },
    {
        "role": "user",
        "content": "just respond with a warm greeting"
    }
]


chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=[weather_tool],
    tool_choice="required",
    stream=False
)
print(chat_response.choices[0].message)
ChatCompletionMessage(content=None, refusal=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_0ZyaXEb9hIIQbJybYNlPjRVe', function=Function(arguments='{"location": "seoul", "format": "celsius"}', name='get_current_weather'), type='function')])

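=> calls a function even though the prompt does not need one, as expected with tool_choice="required"; this matches the tgi output in step 2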

Expected behavior

When consuming tgi, I expect the server to be able to respond either with or without a tool_call when it is given tool definitions. As of now, the application needs to know whether a tool call is required before calling tgi, which in my opinion is not something LLM applications should aim for.
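To make the expectation concrete, here is a minimal sketch of client-side handling that accepts both response shapes. It also covers a difference visible in the outputs above: OpenAI returned the tool arguments as a JSON string, while tgi returned them as a dict. The helper names (parse_tool_arguments, summarize_message) and the SimpleNamespace stand-ins are hypothetical, used only for illustration; they are not part of the openai client or tgi.

```python
import json
from types import SimpleNamespace


def parse_tool_arguments(arguments):
    """Return tool-call arguments as a dict, whether they arrive as a JSON string or a dict."""
    if isinstance(arguments, str):
        return json.loads(arguments)
    return dict(arguments)


def summarize_message(message):
    """Classify an assistant message as ("text", content) or ("tool_calls", [(name, args), ...])."""
    tool_calls = getattr(message, "tool_calls", None)
    if not tool_calls:
        return ("text", message.content)
    calls = [
        (call.function.name, parse_tool_arguments(call.function.arguments))
        for call in tool_calls
    ]
    return ("tool_calls", calls)


def _fake_call(name, arguments):
    # Hypothetical stand-in shaped like ChatCompletionMessageToolCall.
    return SimpleNamespace(function=SimpleNamespace(name=name, arguments=arguments))


# Stand-ins shaped like the three messages printed in the reproduction.
plain_text = SimpleNamespace(content="Hello there!", tool_calls=None)
openai_style = SimpleNamespace(
    content=None,
    tool_calls=[_fake_call("get_current_weather", '{"location": "seoul", "format": "celsius"}')],
)
tgi_style = SimpleNamespace(
    content=None,
    tool_calls=[_fake_call("get_current_weather", {"format": "celsius", "location": "Seoul"})],
)

for msg in (plain_text, openai_style, tgi_style):
    print(summarize_message(msg))
```

With a real client, the same function would be called on chat_response.choices[0].message.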

I am curious whether the above behavior is intended. I found that someone has already raised this (#1587 (comment)), but it wasn't addressed anywhere.

Maybe something can be done with the tool prompt, the ToolType enum, and the chat_completions logic in server.rs?
If this behavior is not intended and needs fixing, I would love to give it a shot!
Thank you :)

@mottoslo
Author

Gentle ping @drbh
Is this issue being handled internally? Any feedback would be great!

@Simon-Stone

I am running into this issue as well. I am not knowledgeable enough in Rust to deal with this, but I would very much appreciate it if you took this on, @mottoslo!

@mottoslo
Author

mottoslo commented Oct 25, 2024

I am running into this issue as well. I am not knowledgeable enough in Rust to deal with this, but I would very much appreciate it if you took this on, @mottoslo!

I think handling this issue may involve (breaking) changes to the feature and needs to be discussed beforehand, so I do not know where to start.
However, some pull requests that I think are related have been opened since (#2645, #2614, ...), so I assume there's an internal consensus on how things should be done?

@Simon-Stone

Simon-Stone commented Oct 25, 2024

Either way, it would be a huge improvement. As it stands, we can't easily build agents based on models deployed with TGI because of this. At least not using the Messages API. I tried manually applying the chat template and using the generate endpoints, and the model appears to be able to choose not to use a tool. The downside of this approach is that the manual chat template handling makes it much harder to integrate in existing frameworks. Being able to use TGI as a drop-in replacement for OpenAI models would be fantastic.
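For anyone who wants to try the workaround described above, here is a rough sketch: render the chat template locally, then send the resulting prompt to tgi's /generate endpoint. It assumes a transformers version whose apply_chat_template accepts a tools argument and a model whose chat template actually uses it; whether a given model's template does is not something this thread confirms. The function names, base URL, and max_new_tokens value are illustrative assumptions.

```python
import json
import urllib.request


def render_prompt(model_id, messages, tools):
    """Render the chat template (including tool definitions) to a plain prompt string."""
    # Imported lazily so the request-building part works without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return tokenizer.apply_chat_template(
        messages, tools=tools, add_generation_prompt=True, tokenize=False
    )


def build_generate_request(base_url, prompt, max_new_tokens=256):
    """Build (without sending) a POST request for the /generate endpoint."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, max_new_tokens=256):
    """Send the prompt and return the generated text from the JSON response."""
    request = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["generated_text"]
```

The returned text is raw model output, so any tool call in it has to be parsed by the caller, which is the integration cost mentioned above.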

@Simon-Stone

#2614 has been merged into main and is part of the latest release. Has anyone already had a chance to test if this solves the issue?

@Johnno1011

Hey guys, I've seen the PRs related to this that were in the recent release. It doesn't look like they have fixed the issue for me. I'm using HuggingFaceEndpoint wrapped inside ChatHuggingFace to try to do tool calling with langchain. I am finding that the LLM will always call that one tool and never stop to respond once it has the information it needs. So it appears to me that it's still behaving as if tool_choice='required'.
Has anyone else had any success with this, or is anyone experiencing the same?

A general side note: I find using langchain for anything absolutely horrific, but it's the easiest way out there. I would have preferred to suffer less and use the OpenAI classes from langchain to do this (rather than the ChatHuggingFace and HuggingFaceEndpoint classes), but it seems that the TGI API is not yet fully aligned with the one OpenAI uses.
Cheers.
